(quick-start-distillation)=

# Quick Start - Distillation

```{image} https://colab.research.google.com/assets/colab-badge.svg
:target: https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/distillation.ipynb
```

This guide demonstrates how to pretrain a model on **unlabeled data** with distillation. Distillation is a special form of pretraining where a large, pretrained teacher model, like DINOv2 or DINOv3, is used to guide the training of a smaller student model. This is the ideal starting point if you want to improve the performance of any model that is not already a large vision foundation model, such as YOLO, ConvNets, or specialized transformer architectures.

The quick start covers the following steps:

1. Install Lightly**Train**
1. Prepare your unlabeled dataset
1. Pretrain a model with distillation
1. Fine-tune the pretrained model on a downstream task

## Installation

```bash
pip install lightly-train
```

```{important}
Lightly**Train** is officially supported on:

- Linux: CPU or CUDA
- MacOS: CPU only
- Windows (experimental): CPU or CUDA

We are planning to support MPS for MacOS.

Check the [installation instructions](installation.md#installation) for more details.
```

## Prepare Data

You can use any image dataset for training. No labels are required, and the dataset can be structured in any way, including subdirectories. If you don't have a dataset at hand, you can download an example dataset:

```bash
wget https://github.com/lightly-ai/coco128_unlabeled/releases/download/v0.0.1/coco128_unlabeled.zip && unzip -q coco128_unlabeled.zip
```

See the [data guide](pretrain-data) for more information on supported data formats.

In this example, the dataset looks like this:

```text
coco128_unlabeled
└── images
    ├── 000000000009.jpg
    ├── 000000000025.jpg
    ├── ...
    └── 000000000650.jpg
```

## Pretrain with Distillation

Once the data is ready, you can pretrain the model like this:

```python
import lightly_train

# Pretrain the model
lightly_train.pretrain(
    out="out/my_experiment",         # Output directory
    data="coco128_unlabeled",        # Directory with images
    model="dinov3/vitt16",           # Model to train
    method="distillation",           # Pretraining method
    method_args={
        "teacher": "dinov3/vits16"   # Teacher model for distillation
    },
    epochs=5,                        # Small number of epochs for demonstration
    batch_size=32,                   # Small batch size for demonstration
)
```

```{note}
This is a minimal example for illustration purposes. In practice you would want to use a larger dataset (>=10'000 images), more epochs (>=100, ideally ~1000), and a larger batch size (>=128). For pretraining larger models than `dinov3/vitt16` we also recommend using a larger teacher model and setting `method="distillationv1"`.
```

```{tip}
Lightly**Train** supports many [popular models](pretrain_distill/models/index.md) out of the box.
```

This pretrains a tiny DINOv3 ViT model using images from `coco128_unlabeled`. All training logs, model exports, and checkpoints are saved to the output directory at `out/my_experiment`.

Once the training is complete, the `out/my_experiment` directory contains the following files:

```text
out/my_experiment
├── checkpoints
│   ├── epoch=03-step=123.ckpt   # Intermediate checkpoint
│   └── last.ckpt                # Last checkpoint
├── events.out.tfevents.123.0    # Tensorboard logs
├── exported_models
│   └── exported_last.pt         # Final model exported
├── metrics.jsonl                # Training metrics
└── train.log                    # Training logs
```

The final model is exported to `out/my_experiment/exported_models/exported_last.pt` in the default format of the used library. It can be used directly for fine-tuning. See [export format](pretrain_distill/export.md#format) for more information on how to export models to other formats or how to export intermediate checkpoints.
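If you want a quick look at how the pretraining progressed without starting TensorBoard, you can read the metrics file directly. The snippet below is a minimal sketch, assuming `metrics.jsonl` contains one JSON record per line; the exact keys depend on the pretraining method and the LightlyTrain version.

```python
import json

# Print the metrics LightlyTrain logged during pretraining.
# Assumes one JSON record per line in metrics.jsonl.
with open("out/my_experiment/metrics.jsonl") as f:
    for line in f:
        print(json.loads(line))
```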
While the trained model has already learned good representations of the images, it cannot yet make any predictions for tasks such as classification, detection, or segmentation. To solve these tasks, the model needs to be fine-tuned on a labeled dataset.

## Fine-Tune

Now the model is ready for fine-tuning! You can use your favorite library for this step. We'll use Lightly**Train**'s built-in fine-tuning for object detection as an example.

### Prepare Labeled Data

A labeled dataset is required for fine-tuning. You can download an example dataset from here:

```bash
wget https://github.com/lightly-ai/coco128_yolo/releases/download/v0.0.1/coco128_yolo.zip && unzip -q coco128_yolo.zip
```

The dataset looks like this after the download completes:

```text
coco128_yolo
├── images
│   ├── train2017
│   │   ├── 000000000009.jpg
│   │   ├── 000000000025.jpg
│   │   ├── ...
│   │   └── 000000000650.jpg
│   └── val2017
│       ├── 000000000139.jpg
│       ├── 000000000285.jpg
│       ├── ...
│       └── 000000013201.jpg
└── labels
    ├── train2017
    │   ├── 000000000009.txt
    │   ├── 000000000025.txt
    │   ├── ...
    │   └── 000000000659.txt
    └── val2017
        ├── 000000000139.txt
        ├── 000000000285.txt
        ├── ...
        └── 000000013201.txt
```

### Fine-Tune the Pretrained Model

Once the dataset is ready, you can fine-tune the pretrained model like this:

```python
import lightly_train

lightly_train.train_object_detection(
    out="out/my_finetune_experiment",
    model="dinov3/vitt16-ltdetr",
    model_args={
        # Load the pretrained weights.
        "backbone_weights": "out/my_experiment/exported_models/exported_last.pt",
    },
    steps=100,     # Small number of steps for demonstration, default is 90_000.
    batch_size=4,  # Small batch size for demonstration, default is 16.
    data={
        "path": "coco128_yolo",
        "train": "images/train2017",
        "val": "images/val2017",
        "names": {
            0: "person", 1: "bicycle", 2: "car", 3: "motorcycle", 4: "airplane",
            5: "bus", 6: "train", 7: "truck", 8: "boat", 9: "traffic light",
            10: "fire hydrant", 11: "stop sign", 12: "parking meter", 13: "bench",
            14: "bird", 15: "cat", 16: "dog", 17: "horse", 18: "sheep", 19: "cow",
            20: "elephant", 21: "bear", 22: "zebra", 23: "giraffe", 24: "backpack",
            25: "umbrella", 26: "handbag", 27: "tie", 28: "suitcase", 29: "frisbee",
            30: "skis", 31: "snowboard", 32: "sports ball", 33: "kite",
            34: "baseball bat", 35: "baseball glove", 36: "skateboard", 37: "surfboard",
            38: "tennis racket", 39: "bottle", 40: "wine glass", 41: "cup", 42: "fork",
            43: "knife", 44: "spoon", 45: "bowl", 46: "banana", 47: "apple",
            48: "sandwich", 49: "orange", 50: "broccoli", 51: "carrot", 52: "hot dog",
            53: "pizza", 54: "donut", 55: "cake", 56: "chair", 57: "couch",
            58: "potted plant", 59: "bed", 60: "dining table", 61: "toilet", 62: "tv",
            63: "laptop", 64: "mouse", 65: "remote", 66: "keyboard", 67: "cell phone",
            68: "microwave", 69: "oven", 70: "toaster", 71: "sink", 72: "refrigerator",
            73: "book", 74: "clock", 75: "vase", 76: "scissors", 77: "teddy bear",
            78: "hair drier", 79: "toothbrush",
        },
    },
)
```

This will load the pretrained model from `out/my_experiment/exported_models/exported_last.pt` and fine-tune it on a subset of the labeled COCO dataset for 100 steps.
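The `names` mapping in the example above is long. To keep the training script compact, you could store the class names in a separate JSON file and load them before calling `train_object_detection`. This is just a convenience sketch, not part of the LightlyTrain API; `coco_names.json` is a hypothetical file you would create yourself.

```python
import json

# Hypothetical helper file: coco_names.json maps class ids to names,
# e.g. {"0": "person", "1": "bicycle", ..., "79": "toothbrush"}.
with open("coco_names.json") as f:
    names = {int(class_id): name for class_id, name in json.load(f).items()}

data = {
    "path": "coco128_yolo",
    "train": "images/train2017",
    "val": "images/val2017",
    "names": names,  # Same structure as the inline mapping above
}
```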
Congratulations! You've just trained and fine-tuned a model using Lightly**Train**!

## Generate Embeddings

Instead of fine-tuning the model, you can also use it to generate image embeddings. This is useful for clustering, retrieval, or visualization tasks; a small retrieval example is sketched at the end of this page. The `embed` command generates embeddings for all images in a directory:

```python
import lightly_train

lightly_train.embed(
    out="my_embeddings.pth",                               # Exported embeddings
    checkpoint="out/my_experiment/checkpoints/last.ckpt",  # LightlyTrain checkpoint
    data="coco128_unlabeled",                              # Directory with images
)
```

The embeddings are saved to `my_embeddings.pth`. Let's load them and take a look:

```python
import torch

embeddings = torch.load("my_embeddings.pth")

print("First five filenames:")
print(embeddings["filenames"][:5])  # Print first five filenames

print("\nEmbedding tensor shape:")
print(embeddings["embeddings"].shape)  # Tensor with embeddings with shape (num_images, embedding_dim)
```

## Next Steps

- [Object Detection Quick Start](quick-start-object-detection): If you want to learn more about fine-tuning and how to use the fine-tuned model for inference.
- [Distillation Guide](pretrain-distill): If you want to learn more about distillation and how to pretrain any model with it.
- [DINOv2 Pretraining](methods-dinov2): If you want to learn how to pretrain foundation models.
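## Example: Retrieval with the Generated Embeddings

The embeddings section above mentions retrieval as one use case. The following is a minimal sketch of nearest-neighbor retrieval with cosine similarity, assuming `my_embeddings.pth` was created with the `embed` command as shown earlier; it is an illustration, not part of the LightlyTrain API.

```python
import torch
import torch.nn.functional as F

# Load the embeddings generated above and L2-normalize them for cosine similarity.
embeddings = torch.load("my_embeddings.pth")
filenames = embeddings["filenames"]
vectors = F.normalize(embeddings["embeddings"], dim=1)

# Use the first image as the query and print its five nearest neighbors
# (the query itself will appear with a score of 1.0).
query = vectors[0]
scores = vectors @ query
top = torch.topk(scores, k=5)
for index, score in zip(top.indices.tolist(), top.values.tolist()):
    print(f"{filenames[index]}: {score:.3f}")
```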