DINOv2¶
This page describes how to use DINOv2 models with LightlyTrain.
DINOv2 models are Vision Transformers (ViTs) pretrained by Meta using the DINOv2 self-supervised learning method on large image datasets. They serve as high-quality feature extractors and strong backbones for downstream tasks such as object detection, segmentation, and image classification.
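All DINOv2 variants use a 14×14 patch size (the `14` in the model names), so input resolutions should be multiples of 14. The resulting token count can be sketched with plain arithmetic (the helper name below is ours for illustration, not part of LightlyTrain):

```python
def dinov2_token_count(height: int, width: int, patch_size: int = 14) -> int:
    """Number of tokens a DINOv2 ViT produces for an image:
    one token per 14x14 patch, plus a single CLS token."""
    if height % patch_size or width % patch_size:
        raise ValueError("Image sides must be multiples of the patch size.")
    num_patches = (height // patch_size) * (width // patch_size)
    return num_patches + 1  # +1 for the CLS token


# A 224x224 crop yields a 16x16 grid of patches -> 256 patch tokens + 1 CLS.
print(dinov2_token_count(224, 224))  # -> 257
```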
Note
DINOv2 models are released under the Apache 2.0 license.
Pretrain and Fine-tune a DINOv2 Model¶
Pretrain¶
DINOv2 models can be pretrained with the DINOv2 method, either from scratch or starting from Meta’s pretrained weights. Below are minimal Python and command-line examples using dinov2/vitb14:
import lightly_train

if __name__ == "__main__":
    lightly_train.pretrain(
        out="out/my_experiment",  # Output directory.
        data="my_data_dir",  # Directory with images.
        model="dinov2/vitb14",  # Pass the DINOv2 model.
        method="dinov2",  # Use the DINOv2 pretraining method.
    )
Or equivalently via the command line:

lightly-train pretrain out="out/my_experiment" data="my_data_dir" model="dinov2/vitb14" method="dinov2"
See DINOv2 method for more details on the pretraining method and its configuration options.
Fine-tune¶
After pretraining, the exported DINOv2 backbone can be loaded into downstream task
models via the backbone_weights argument. Refer to the following pages for fine-tuning
instructions and example code:
Object Detection — fine-tune a DINOv2-based LTDETR model; supports loading custom pretrained backbone weights via backbone_weights.
Semantic Segmentation — fine-tune a DINOv2-based EoMT model; supports loading custom pretrained backbone weights via backbone_weights.
Instance Segmentation — fine-tune a DINOv2-based EoMT model.
Panoptic Segmentation — fine-tune a DINOv2-based EoMT model.
Image Classification — fine-tune a DINOv2 backbone for classification.
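Only the backbone portion of a checkpoint is relevant when initializing a downstream model; the backbone_weights argument of the tasks above handles this for you. Conceptually, the step amounts to prefix-filtering a state dict, which can be sketched as plain dictionary manipulation (the `backbone.` key prefix and the checkpoint contents here are illustrative assumptions, not LightlyTrain's actual checkpoint layout):

```python
def extract_backbone_weights(state_dict: dict, prefix: str = "backbone.") -> dict:
    """Keep only entries under the given prefix and strip the prefix,
    so the result matches the backbone's own parameter names."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Hypothetical checkpoint with backbone and task-head weights mixed together.
checkpoint = {
    "backbone.patch_embed.weight": "...",
    "backbone.blocks.0.attn.qkv.weight": "...",
    "head.classifier.weight": "...",
}
print(sorted(extract_backbone_weights(checkpoint)))
```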
Supported Models¶
Pretrained Models¶
The following models are pretrained by Meta and loaded automatically when used.
dinov2/vits14
dinov2/vitb14
dinov2/vitl14
dinov2/vitg14
Not Pretrained Models¶
The following models start from random initialization. They are useful for pretraining from scratch with the DINOv2 method on a custom dataset, without starting from Meta’s weights.
dinov2/vits14-notpretrained
dinov2/vitb14-notpretrained
dinov2/vitl14-notpretrained
dinov2/vitg14-notpretrained
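The variants differ in capacity and feature (embedding) dimension, which matters when wiring downstream heads. A small lookup can capture this; the dimensions below are from the DINOv2 ViT architecture specs, and the helper name is ours for illustration:

```python
# Embedding dimensions of the DINOv2 ViT variants (per the DINOv2 release).
DINOV2_EMBED_DIMS = {
    "dinov2/vits14": 384,
    "dinov2/vitb14": 768,
    "dinov2/vitl14": 1024,
    "dinov2/vitg14": 1536,
}


def embed_dim(model_name: str) -> int:
    """Look up the feature dimension; '-notpretrained' variants share
    the architecture (and thus the dimension) of their pretrained twins."""
    base = model_name.removesuffix("-notpretrained")
    return DINOV2_EMBED_DIMS[base]


print(embed_dim("dinov2/vitb14"))  # -> 768
print(embed_dim("dinov2/vitg14-notpretrained"))  # -> 1536
```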