Quick Start - Distillation¶
This guide demonstrates how to pretrain a model on unlabeled data with distillation. Distillation is a special form of pretraining in which a large, pretrained teacher model, such as DINOv2 or DINOv3, guides the training of a smaller student model. It is the ideal starting point if you want to improve the performance of any model that is not already a large vision foundation model, such as a YOLO model, a ConvNet, or a custom transformer architecture.
The quick start covers the following steps:
Install LightlyTrain
Prepare your unlabeled dataset
Pretrain a model with distillation
Fine-tune the pretrained model on a downstream task
Installation¶
pip install lightly-train
Important
LightlyTrain is officially supported on:
Linux: CPU or CUDA
macOS: CPU only
Windows (experimental): CPU or CUDA
MPS support for macOS is planned.
Check the installation instructions for more details.
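After installation, you can verify that the package imports correctly. Printing the version assumes the package exposes a __version__ attribute, which is a common Python packaging convention but not guaranteed:

import lightly_train

# Check that the package imports; __version__ is assumed to exist (common convention).
print(lightly_train.__version__)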
Prepare Data¶
You can use any image dataset for training. No labels are required, and the dataset can be structured in any way, including subdirectories. If you don’t have a dataset at hand, you can download an example dataset:
wget https://github.com/lightly-ai/coco128_unlabeled/releases/download/v0.0.1/coco128_unlabeled.zip && unzip -q coco128_unlabeled.zip
See the data guide for more information on supported data formats.
In this example, the dataset looks like this:
coco128_unlabeled
└── images
    ├── 000000000009.jpg
    ├── 000000000025.jpg
    ├── ...
    └── 000000000650.jpg
Pretrain with Distillation¶
Once the data is ready, you can pretrain the model like this:
import lightly_train

# Pretrain the model.
lightly_train.pretrain(
    out="out/my_experiment",  # Output directory
    data="coco128_unlabeled",  # Directory with images
    model="dinov3/vitt16",  # Model to train
    method="distillation",  # Pretraining method
    method_args={
        "teacher": "dinov3/vits16",  # Teacher model for distillation
    },
    epochs=5,  # Small number of epochs for demonstration
    batch_size=32,  # Small batch size for demonstration
)
Note
This is a minimal example for illustration purposes. In practice, you would want to use a
larger dataset (>=10,000 images), more epochs (>=100, ideally ~1000), and a larger
batch size (>=128). For pretraining models larger than dinov3/vitt16, we also recommend
using a larger teacher model and setting method="distillationv1".
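For reference, a more realistic configuration following these recommendations could look like the sketch below. The dataset path and the teacher identifier dinov3/vitb16 are illustrative assumptions; check the list of supported models for valid names:

import lightly_train

# A more realistic configuration based on the recommendations above.
lightly_train.pretrain(
    out="out/my_full_experiment",
    data="my_large_dataset",  # Assumed path; >=10,000 images recommended
    model="dinov3/vits16",  # A larger student than dinov3/vitt16
    method="distillationv1",  # Recommended for larger student models
    method_args={
        "teacher": "dinov3/vitb16",  # Assumed identifier for a larger teacher
    },
    epochs=1000,  # >=100, ideally ~1000
    batch_size=128,  # >=128 recommended
)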
Tip
LightlyTrain supports many popular models out of the box.
This pretrains a tiny DINOv3 ViT model using images from coco128_unlabeled. All training
logs, model exports, and checkpoints are saved to the output directory at
out/my_experiment.
Once the training is complete, the out/my_experiment directory contains the
following files:
out/my_experiment
├── checkpoints
│   ├── epoch=03-step=123.ckpt  # Intermediate checkpoint
│   └── last.ckpt               # Last checkpoint
├── events.out.tfevents.123.0   # TensorBoard logs
├── exported_models
│   └── exported_last.pt        # Final exported model
├── metrics.jsonl               # Training metrics
└── train.log                   # Training logs
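The metrics.jsonl file uses the JSON Lines format, so you can inspect it with a few lines of Python. This is a minimal sketch; the exact metric keys that are logged may vary between versions:

import json

# Print the training metrics logged by LightlyTrain (one JSON object per line).
with open("out/my_experiment/metrics.jsonl") as f:
    for line in f:
        print(json.loads(line))

The TensorBoard logs can be viewed with tensorboard --logdir out/my_experiment.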
The final model is exported to out/my_experiment/exported_models/exported_last.pt
in the default format of the model's underlying library and can be used directly for
fine-tuning. See export format for more information on exporting models to other
formats or exporting intermediate checkpoints.
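If you want to inspect the exported weights outside of LightlyTrain, you can load the file with plain PyTorch. This is a sketch that assumes the export is a standard PyTorch state dict; the exact contents depend on the model's underlying library:

import torch

# Load the exported weights for inspection (assumes a standard state dict).
state_dict = torch.load(
    "out/my_experiment/exported_models/exported_last.pt", map_location="cpu"
)
print(list(state_dict.keys())[:5])  # Show the first few parameter names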
While the trained model has already learned good representations of the images, it cannot yet make any predictions for tasks such as classification, detection, or segmentation. To solve these tasks, the model needs to be fine-tuned on a labeled dataset.
Fine-Tune¶
Now the model is ready for fine-tuning! You can use your favorite library for this step. We’ll use LightlyTrain’s built-in fine-tuning for object detection as an example.
Prepare Labeled Data¶
A labeled dataset is required for fine-tuning. You can download an example dataset from here:
wget https://github.com/lightly-ai/coco128_yolo/releases/download/v0.0.1/coco128_yolo.zip && unzip -q coco128_yolo.zip
The dataset looks like this after the download completes:
coco128_yolo
├── images
│   ├── train2017
│   │   ├── 000000000009.jpg
│   │   ├── 000000000025.jpg
│   │   ├── ...
│   │   └── 000000000650.jpg
│   └── val2017
│       ├── 000000000139.jpg
│       ├── 000000000285.jpg
│       ├── ...
│       └── 000000013201.jpg
└── labels
    ├── train2017
    │   ├── 000000000009.txt
    │   ├── 000000000025.txt
    │   ├── ...
    │   └── 000000000650.txt
    └── val2017
        ├── 000000000139.txt
        ├── 000000000285.txt
        ├── ...
        └── 000000013201.txt
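Each label file follows the YOLO format: one object per line, consisting of a class index followed by the normalized bounding box center coordinates, width, and height. The values below are illustrative, not the actual contents of any file in the dataset:

45 0.48 0.69 0.96 0.60
50 0.64 0.73 0.36 0.24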
Fine-Tune the Pretrained Model¶
Once the dataset is ready, you can fine-tune the pretrained model like this:
import lightly_train

lightly_train.train_object_detection(
    out="out/my_finetune_experiment",
    model="dinov3/vitt16-ltdetr",
    model_args={
        # Load the pretrained weights.
        "backbone_weights": "out/my_experiment/exported_models/exported_last.pt",
    },
    steps=100,  # Small number of steps for demonstration, default is 90_000.
    batch_size=4,  # Small batch size for demonstration, default is 16.
    data={
        "path": "coco128_yolo",
        "train": "images/train2017",
        "val": "images/val2017",
        "names": {
            0: "person",
            1: "bicycle",
            2: "car",
            3: "motorcycle",
            4: "airplane",
            5: "bus",
            6: "train",
            7: "truck",
            8: "boat",
            9: "traffic light",
            10: "fire hydrant",
            11: "stop sign",
            12: "parking meter",
            13: "bench",
            14: "bird",
            15: "cat",
            16: "dog",
            17: "horse",
            18: "sheep",
            19: "cow",
            20: "elephant",
            21: "bear",
            22: "zebra",
            23: "giraffe",
            24: "backpack",
            25: "umbrella",
            26: "handbag",
            27: "tie",
            28: "suitcase",
            29: "frisbee",
            30: "skis",
            31: "snowboard",
            32: "sports ball",
            33: "kite",
            34: "baseball bat",
            35: "baseball glove",
            36: "skateboard",
            37: "surfboard",
            38: "tennis racket",
            39: "bottle",
            40: "wine glass",
            41: "cup",
            42: "fork",
            43: "knife",
            44: "spoon",
            45: "bowl",
            46: "banana",
            47: "apple",
            48: "sandwich",
            49: "orange",
            50: "broccoli",
            51: "carrot",
            52: "hot dog",
            53: "pizza",
            54: "donut",
            55: "cake",
            56: "chair",
            57: "couch",
            58: "potted plant",
            59: "bed",
            60: "dining table",
            61: "toilet",
            62: "tv",
            63: "laptop",
            64: "mouse",
            65: "remote",
            66: "keyboard",
            67: "cell phone",
            68: "microwave",
            69: "oven",
            70: "toaster",
            71: "sink",
            72: "refrigerator",
            73: "book",
            74: "clock",
            75: "vase",
            76: "scissors",
            77: "teddy bear",
            78: "hair drier",
            79: "toothbrush",
        },
    },
)
This will load the pretrained model from
out/my_experiment/exported_models/exported_last.pt and fine-tune it on a subset
of the labeled COCO dataset for 100 steps.
Congratulations! You’ve just trained and fine-tuned a model using LightlyTrain!
Generate Embeddings¶
Instead of fine-tuning the model, you can also use it to generate image embeddings.
This is useful for clustering, retrieval, or visualization tasks. The embed
command generates embeddings for all images in a directory:
import lightly_train

lightly_train.embed(
    out="my_embeddings.pth",  # Exported embeddings
    checkpoint="out/my_experiment/checkpoints/last.ckpt",  # LightlyTrain checkpoint
    data="coco128_unlabeled",  # Directory with images
)
The embeddings are saved to my_embeddings.pth. Let’s load them and take a look:
import torch

embeddings = torch.load("my_embeddings.pth")
print("First five filenames:")
print(embeddings["filenames"][:5])  # Print the first five filenames
print("\nEmbedding tensor shape:")
print(embeddings["embeddings"].shape)  # Tensor of shape (num_images, embedding_dim)
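As a quick usage sketch, the embeddings can be used directly for nearest-neighbor retrieval with cosine similarity. This only assumes the filenames/embeddings structure shown above:

import torch
import torch.nn.functional as F

embeddings = torch.load("my_embeddings.pth")
features = F.normalize(embeddings["embeddings"], dim=1)  # L2-normalize for cosine similarity
filenames = embeddings["filenames"]

# Find the five images most similar to the first image (including itself).
similarities = features @ features[0]
top = similarities.topk(5)
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{filenames[idx]}: {score:.3f}")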
Next Steps¶
Object Detection Quick Start: Learn more about fine-tuning and how to run inference with the fine-tuned model.
Distillation Guide: Learn more about distillation and how to pretrain any model with it.
DINOv2 Pretraining: Learn how to pretrain foundation models.