DINOv2

This page describes how to use DINOv2 models with LightlyTrain.

DINOv2 models are Vision Transformers (ViTs) pretrained by Meta using the DINOv2 self-supervised learning method on large image datasets. They serve as high-quality feature extractors and strong backbones for downstream tasks such as object detection, segmentation, and image classification.
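The "14" in model names like dinov2/vitb14 is the ViT patch size: an input image is split into 14×14 pixel patches, and each patch becomes one token. As a quick illustrative sketch (plain Python, independent of the LightlyTrain API), the number of patch tokens for a given input size is:

```python
# DINOv2 model names encode the ViT patch size
# (e.g. "dinov2/vitb14" = ViT-Base with 14x14 pixel patches).
# A ViT produces (H / P) * (W / P) patch tokens for a HxW input.

def num_patch_tokens(height: int, width: int, patch_size: int = 14) -> int:
    """Number of patch tokens for an image of the given size.

    Both image sides must be divisible by the patch size.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)

print(num_patch_tokens(224, 224))  # 16 * 16 = 256 patch tokens
print(num_patch_tokens(518, 518))  # 37 * 37 = 1369 patch tokens
```

This is also why input resolutions for these models are typically chosen as multiples of 14.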

Note

DINOv2 models are released under the Apache 2.0 license.

Pretrain and Fine-tune a DINOv2 Model

Pretrain

DINOv2 models can be pretrained from scratch or starting from Meta's pretrained weights using the DINOv2 method. Below we provide minimal Python and command-line examples using dinov2/vitb14:

import lightly_train

if __name__ == "__main__":
    lightly_train.pretrain(
        out="out/my_experiment",                # Output directory.
        data="my_data_dir",                     # Directory with images.
        model="dinov2/vitb14",                  # Pass the DINOv2 model.
        method="dinov2",                        # Use the DINOv2 pretraining method.
    )
Or equivalently via the command line:

lightly-train pretrain out="out/my_experiment" data="my_data_dir" model="dinov2/vitb14" method="dinov2"

See DINOv2 method for more details on the pretraining method and its configuration options.
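As a sketch of how such options are typically passed, the pretraining call accepts additional keyword arguments. The specific argument names below (epochs, batch_size) are assumptions to be checked against the method documentation:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.pretrain(
        out="out/my_experiment",   # Output directory.
        data="my_data_dir",        # Directory with images.
        model="dinov2/vitb14",     # Pass the DINOv2 model.
        method="dinov2",           # Use the DINOv2 pretraining method.
        epochs=100,                # Assumed option: number of pretraining epochs.
        batch_size=128,            # Assumed option: global batch size.
    )
```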

Fine-tune

After pretraining, the exported DINOv2 backbone can be loaded into downstream task models via the backbone_weights argument. Refer to the task-specific pages for fine-tuning instructions and example code.
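A minimal sketch of this hand-off is shown below. Both the task-training function name and the exported checkpoint path are assumptions for illustration; take the real function name and filename from the task-specific pages:

```python
import lightly_train

if __name__ == "__main__":
    # Hypothetical downstream entry point -- the real function name and
    # its data format depend on the task (detection, segmentation, ...).
    lightly_train.train_semantic_segmentation(
        out="out/my_finetune_experiment",
        data="my_finetune_data_dir",
        model="dinov2/vitb14",
        # Load the backbone exported by the pretraining run above; the
        # exact exported filename is an assumption here.
        backbone_weights="out/my_experiment/exported_models/exported_last.pt",
    )
```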

Supported Models

Pretrained Models

The following models are pretrained by Meta and loaded automatically when used.

  • dinov2/vits14

  • dinov2/vitb14

  • dinov2/vitl14

  • dinov2/vitg14

Not Pretrained Models

The following models start from random initialization and are useful when pretraining from scratch with the DINOv2 method on a custom dataset without starting from Meta’s weights.

  • dinov2/vits14-notpretrained

  • dinov2/vitb14-notpretrained

  • dinov2/vitl14-notpretrained

  • dinov2/vitg14-notpretrained
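To pretrain fully from scratch, the pretraining call shown earlier stays the same except for the model name, e.g.:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.pretrain(
        out="out/my_experiment",
        data="my_data_dir",
        model="dinov2/vitb14-notpretrained",  # Random initialization, no Meta weights.
        method="dinov2",
    )
```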