Quick Start - Distillation

This guide demonstrates how to pretrain a model on unlabeled data with distillation. Distillation is a special form of pretraining in which a large, pretrained teacher model, such as DINOv2 or DINOv3, guides the training of a smaller student model. This is the ideal starting point if you want to improve the performance of any model that is not already a large vision foundation model, such as a YOLO model, a ConvNet, or a custom transformer architecture.
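
To give an intuition for what happens during distillation, here is a conceptual sketch (not LightlyTrain’s exact objective): a frozen teacher and a trainable student embed the same unlabeled images, and the student is trained to match the teacher’s features.

import torch
import torch.nn.functional as F

def distillation_loss(student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
    # Normalize both feature sets and maximize their cosine similarity.
    student_feats = F.normalize(student_feats, dim=-1)
    teacher_feats = F.normalize(teacher_feats, dim=-1)
    return 1 - (student_feats * teacher_feats).sum(dim=-1).mean()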

The quick start covers the following steps:

  1. Install LightlyTrain

  2. Prepare your unlabeled dataset

  3. Pretrain a model with distillation

  4. Fine-tune the pretrained model on a downstream task

Installation

pip install lightly-train

Important

LightlyTrain is officially supported on:

  • Linux: CPU or CUDA

  • macOS: CPU only

  • Windows (experimental): CPU or CUDA

We are planning to support MPS on macOS.

Check the installation instructions for more details.

Prepare Data

You can use any image dataset for training. No labels are required, and the dataset can be structured in any way, including subdirectories. If you don’t have a dataset at hand, you can download an example dataset:

wget https://github.com/lightly-ai/coco128_unlabeled/releases/download/v0.0.1/coco128_unlabeled.zip && unzip -q coco128_unlabeled.zip

See the data guide for more information on supported data formats.

In this example, the dataset looks like this:

coco128_unlabeled
└── images
    ├── 000000000009.jpg
    ├── 000000000025.jpg
    ├── ...
    └── 000000000650.jpg
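
To quickly verify the download, you can count the images with a few lines of Python (the example dataset contains 128 images):

from pathlib import Path

# Recursively collect all JPEG images in the dataset directory.
image_files = sorted(Path("coco128_unlabeled").rglob("*.jpg"))
print(f"Found {len(image_files)} images")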

Pretrain with Distillation

Once the data is ready, you can pretrain the model like this:

import lightly_train

# Pretrain the model
lightly_train.pretrain(
    out="out/my_experiment",  # Output directory
    data="coco128_unlabeled",  # Directory with images
    model="dinov3/vitt16",  # Model to train
    method="distillation",  # Pretraining method
    method_args={
        "teacher": "dinov3/vits16"  # Teacher model for distillation
    },
    epochs=5,  # Small number of epochs for demonstration
    batch_size=32,  # Small batch size for demonstration
)

Note

This is a minimal example for illustration purposes. In practice, you would want to use a larger dataset (>=10,000 images), more epochs (>=100, ideally ~1000), and a larger batch size (>=128). For pretraining models larger than dinov3/vitt16, we also recommend using a larger teacher model and setting method="distillationv1".
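
For reference, a more realistic call might look like the sketch below. The teacher name dinov3/vitb16 is an assumption for illustration; check the list of supported models for the exact names available in your installation.

import lightly_train

lightly_train.pretrain(
    out="out/full_pretrain",
    data="my_large_dataset",  # >=10,000 images recommended
    model="dinov3/vitt16",
    method="distillation",
    method_args={
        "teacher": "dinov3/vitb16",  # Assumed name of a larger DINOv3 teacher
    },
    epochs=1000,
    batch_size=128,
)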

Tip

LightlyTrain supports many popular models out of the box.

This pretrains a tiny DINOv3 ViT model using images from coco128_unlabeled. All training logs, model exports, and checkpoints are saved to the output directory at out/my_experiment.

Once the training is complete, the out/my_experiment directory contains the following files:

out/my_experiment
├── checkpoints
│   ├── epoch=03-step=123.ckpt          # Intermediate checkpoint
│   └── last.ckpt                       # Last checkpoint
├── events.out.tfevents.123.0           # TensorBoard logs
├── exported_models
│   └── exported_last.pt                # Exported final model
├── metrics.jsonl                       # Training metrics
└── train.log                           # Training logs
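
You can monitor the training metrics live with TensorBoard by pointing it at the output directory:

tensorboard --logdir out/my_experiment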

The final model is exported to out/my_experiment/exported_models/exported_last.pt in the native format of the model’s underlying library. It can be used directly for fine-tuning. See export format for more information on how to export models to other formats or how to export intermediate checkpoints.
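
If you want to inspect the export, you can load it with plain PyTorch. This is a minimal sketch, assuming the export is a standard PyTorch state dict:

import torch

# Load the exported weights on CPU and peek at the first parameter names.
state_dict = torch.load(
    "out/my_experiment/exported_models/exported_last.pt", map_location="cpu"
)
print(list(state_dict.keys())[:5])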

While the trained model has already learned good representations of the images, it cannot yet make any predictions for tasks such as classification, detection, or segmentation. To solve these tasks, the model needs to be fine-tuned on a labeled dataset.

Fine-Tune

Now the model is ready for fine-tuning! You can use your favorite library for this step. We’ll use LightlyTrain’s built-in fine-tuning for object detection as an example.

Prepare Labeled Data

A labeled dataset is required for fine-tuning. You can download an example dataset from here:

wget https://github.com/lightly-ai/coco128_yolo/releases/download/v0.0.1/coco128_yolo.zip && unzip -q coco128_yolo.zip

The dataset looks like this after the download completes:

coco128_yolo
├── images
│   ├── train2017
│   │   ├── 000000000009.jpg
│   │   ├── 000000000025.jpg
│   │   ├── ...
│   │   └── 000000000650.jpg
│   └── val2017
│       ├── 000000000139.jpg
│       ├── 000000000285.jpg
│       ├── ...
│       └── 000000013201.jpg
└── labels
    ├── train2017
    │   ├── 000000000009.txt
    │   ├── 000000000025.txt
    │   ├── ...
    │   └── 000000000650.txt
    └── val2017
        ├── 000000000139.txt
        ├── 000000000285.txt
        ├── ...
        └── 000000013201.txt
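
The label files follow the standard YOLO format: one line per object, consisting of the class index followed by the bounding box center x, center y, width, and height, all normalized to the image size. An illustrative (made-up) line:

45 0.48 0.63 0.69 0.71

Here 45 is the class index (bowl in the class list below) and the four floats describe the box relative to the image dimensions.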

Fine-Tune the Pretrained Model

Once the dataset is ready, you can fine-tune the pretrained model like this:

import lightly_train

lightly_train.train_object_detection(
    out="out/my_finetune_experiment",
    model="dinov3/vitt16-ltdetr",
    model_args={
        # Load the pretrained weights.
        "backbone_weights": "out/my_experiment/exported_models/exported_last.pt",
    },
    steps=100,  # Small number of steps for demonstration, default is 90_000.
    batch_size=4,  # Small batch size for demonstration, default is 16.
    data={
        "path": "coco128_yolo",
        "train": "images/train2017",
        "val": "images/val2017",
        "names": {
            0: "person",
            1: "bicycle",
            2: "car",
            3: "motorcycle",
            4: "airplane",
            5: "bus",
            6: "train",
            7: "truck",
            8: "boat",
            9: "traffic light",
            10: "fire hydrant",
            11: "stop sign",
            12: "parking meter",
            13: "bench",
            14: "bird",
            15: "cat",
            16: "dog",
            17: "horse",
            18: "sheep",
            19: "cow",
            20: "elephant",
            21: "bear",
            22: "zebra",
            23: "giraffe",
            24: "backpack",
            25: "umbrella",
            26: "handbag",
            27: "tie",
            28: "suitcase",
            29: "frisbee",
            30: "skis",
            31: "snowboard",
            32: "sports ball",
            33: "kite",
            34: "baseball bat",
            35: "baseball glove",
            36: "skateboard",
            37: "surfboard",
            38: "tennis racket",
            39: "bottle",
            40: "wine glass",
            41: "cup",
            42: "fork",
            43: "knife",
            44: "spoon",
            45: "bowl",
            46: "banana",
            47: "apple",
            48: "sandwich",
            49: "orange",
            50: "broccoli",
            51: "carrot",
            52: "hot dog",
            53: "pizza",
            54: "donut",
            55: "cake",
            56: "chair",
            57: "couch",
            58: "potted plant",
            59: "bed",
            60: "dining table",
            61: "toilet",
            62: "tv",
            63: "laptop",
            64: "mouse",
            65: "remote",
            66: "keyboard",
            67: "cell phone",
            68: "microwave",
            69: "oven",
            70: "toaster",
            71: "sink",
            72: "refrigerator",
            73: "book",
            74: "clock",
            75: "vase",
            76: "scissors",
            77: "teddy bear",
            78: "hair drier",
            79: "toothbrush",
        },
    },
)

This will load the pretrained model from out/my_experiment/exported_models/exported_last.pt and fine-tune it on a subset of the labeled COCO dataset for 100 steps.

Congratulations! You’ve just trained and fine-tuned a model using LightlyTrain!

Generate Embeddings

Instead of fine-tuning the model, you can also use it to generate image embeddings. This is useful for clustering, retrieval, or visualization tasks. The embed command generates embeddings for all images in a directory:

import lightly_train

lightly_train.embed(
    out="my_embeddings.pth",  # Exported embeddings
    checkpoint="out/my_experiment/checkpoints/last.ckpt",  # LightlyTrain checkpoint
    data="coco128_unlabeled",  # Directory with images
)

The embeddings are saved to my_embeddings.pth. Let’s load them and take a look:

import torch

embeddings = torch.load("my_embeddings.pth")
print("First five filenames:")
print(embeddings["filenames"][:5])  # Print first five filenames
print("\nEmbedding tensor shape:")
print(embeddings["embeddings"].shape)  # Tensor of shape (num_images, embedding_dim)
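
As a small example of what these embeddings enable, the following sketch retrieves the images most similar to the first one via cosine similarity (the top match is the image itself, with similarity 1.0):

import torch
import torch.nn.functional as F

embeddings = torch.load("my_embeddings.pth")
features = F.normalize(embeddings["embeddings"], dim=-1)

# Cosine similarity between the first image and all images.
similarities = features @ features[0]
top = similarities.topk(5)
for idx, score in zip(top.indices.tolist(), top.values.tolist()):
    print(f"{embeddings['filenames'][idx]}: {score:.3f}")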

Next Steps