(models-dinov2)=

# DINOv2

This page describes how to use DINOv2 models with LightlyTrain. [DINOv2](https://github.com/facebookresearch/dinov2) models are Vision Transformers (ViTs) pretrained by Meta using the DINOv2 self-supervised learning method on large image datasets. They serve as high-quality feature extractors and strong backbones for downstream tasks such as object detection, segmentation, and image classification.

```{note}
DINOv2 models are released under the [Apache 2.0 license](https://github.com/facebookresearch/dinov2/blob/main/LICENSE).
```

## Pretrain and Fine-tune a DINOv2 Model

### Pretrain

DINOv2 models can be pretrained from scratch or starting from Meta's pretrained weights using the [DINOv2 method](#methods-dinov2). Below are minimal scripts using `dinov2/vitb14` as an example:

````{tab} Python
```python
import lightly_train

if __name__ == "__main__":
    lightly_train.pretrain(
        out="out/my_experiment",  # Output directory.
        data="my_data_dir",  # Directory with images.
        model="dinov2/vitb14",  # Pass the DINOv2 model.
        method="dinov2",  # Use the DINOv2 pretraining method.
    )
```
````

````{tab} Command Line
```bash
lightly-train pretrain out="out/my_experiment" data="my_data_dir" model="dinov2/vitb14" method="dinov2"
```
````

See [DINOv2 method](#methods-dinov2) for more details on the pretraining method and its configuration options.

### Fine-tune

After pretraining, the exported DINOv2 backbone can be loaded into downstream task models via the `backbone_weights` argument. Refer to the following pages for fine-tuning instructions and example code:

- [Object Detection](#object-detection-pretrain-finetune): fine-tune a DINOv2-based LTDETR model; supports loading custom pretrained backbone weights via `backbone_weights`.
- [Semantic Segmentation](#semantic-segmentation-pretrain-finetune): fine-tune a DINOv2-based EoMT model; supports loading custom pretrained backbone weights via `backbone_weights`.
- [Instance Segmentation](#instance-segmentation): fine-tune a DINOv2-based EoMT model.
- [Panoptic Segmentation](#panoptic-segmentation): fine-tune a DINOv2-based EoMT model.
- [Image Classification](#image-classification): fine-tune a DINOv2 backbone for classification.

## Supported Models

### Pretrained Models

The following models are [pretrained by Meta](https://github.com/facebookresearch/dinov2?tab=readme-ov-file#pretrained-models) and loaded automatically when used.

- `dinov2/vits14`
- `dinov2/vitb14`
- `dinov2/vitl14`
- `dinov2/vitg14`

### Not Pretrained Models

The following models start from random initialization and are useful when pretraining from scratch with the [DINOv2 method](#methods-dinov2) on a custom dataset without starting from Meta's weights.

- `dinov2/vits14-notpretrained`
- `dinov2/vitb14-notpretrained`
- `dinov2/vitl14-notpretrained`
- `dinov2/vitg14-notpretrained`
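
The model names listed above follow a regular pattern: a size letter (`s`, `b`, `l`, or `g`), the patch size `14`, and an optional `-notpretrained` suffix. The sketch below shows one way to build such a name in a script. Note that `dinov2_model_name` is a hypothetical helper written for illustration and is not part of the LightlyTrain API; it only reproduces the eight names listed on this page.

```python
# Hypothetical helper (not part of LightlyTrain): builds one of the
# DINOv2 model names listed on this page from a size letter and a flag.

_SIZES = ("s", "b", "l", "g")  # ViT-S, ViT-B, ViT-L, ViT-g


def dinov2_model_name(size: str, pretrained: bool = True) -> str:
    """Return the model name for the given ViT size.

    pretrained=True selects the variant that starts from Meta's weights,
    pretrained=False selects the randomly initialized variant.
    """
    if size not in _SIZES:
        raise ValueError(f"size must be one of {_SIZES}, got {size!r}")
    suffix = "" if pretrained else "-notpretrained"
    return f"dinov2/vit{size}14{suffix}"


if __name__ == "__main__":
    print(dinov2_model_name("b"))
    print(dinov2_model_name("g", pretrained=False))
```

The returned string can be passed as the `model` argument in the pretraining scripts from the Pretrain section, for example to switch between starting from Meta's weights and pretraining from scratch without editing the name by hand.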