(methods-dinov2)=

# DINOv2 (beta 🔬)

DINOv2 is a state-of-the-art self-supervised learning method for training vision foundation models. It is optimized for large-scale models and datasets. DINOv2 pretrained models are effective across a wide range of tasks, including image classification, object detection, and segmentation. They are also known to generate high-quality features that can be used without fine-tuning the model.

## Use DINOv2 in LightlyTrain

````{tab} Python
```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="dinov2_vit/vitb14_pretrain",
        method="dinov2",
        method_args={
            # Only set these arguments when starting from a pretrained model
            "student_freeze_backbone_epochs": 1,  # Freeze the student backbone for 1 epoch
            "student_freeze_last_layer_epochs": 0,  # Unfreeze the student last layer
        },
    )
```
````

````{tab} Command Line
```bash
lightly-train train out=out/my_experiment data=my_data_dir model="dinov2_vit/vitb14_pretrain" method="dinov2"
```
````

The following models are available for DINOv2 pretraining:

- `dinov2_vit/vits14`
- `dinov2_vit/vits14_pretrain`
- `dinov2_vit/vitb14`
- `dinov2_vit/vitb14_pretrain`
- `dinov2_vit/vitl14`
- `dinov2_vit/vitl14_pretrain`
- `dinov2_vit/vitg14`
- `dinov2_vit/vitg14_pretrain`

Models with a `_pretrain` suffix are [pretrained by Meta](https://github.com/facebookresearch/dinov2?tab=readme-ov-file#pretrained-models).

````{note}
When starting from a pretrained model, we highly recommend setting the `student_freeze_backbone_epochs` and `student_freeze_last_layer_epochs` arguments:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="dinov2_vit/vitb14_pretrain",
        method="dinov2",
        method_args={
            "student_freeze_backbone_epochs": 1,  # Freeze the student backbone for 1 epoch
            "student_freeze_last_layer_epochs": 0,  # Unfreeze the student last layer
        },
    )
```

The reason for this is that the pretrained models only contain weights for the backbone, not for the head. Freezing the backbone for the first epoch allows the model to initialize the head weights based on the pretrained backbone. If you start training from scratch, do **not** set these arguments.
````

## What's under the Hood

DINOv2 combines the strengths of DINO and iBOT, two previous self-supervised learning methods. Following DINO, it trains a student network to match the output of a momentum-averaged teacher network without labels. It also incorporates the masked image modeling loss from iBOT, which helps the model learn strong local semantic features.
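To make this more concrete, the sketch below illustrates the DINO-style part of the objective: an exponential-moving-average (EMA) teacher and a cross-entropy loss between the sharpened, centered teacher distribution and the student distribution over two augmented views. This is simplified, illustrative code, not LightlyTrain's internal implementation; the iBOT masked-image-modeling loss applies the same kind of matching to masked patch tokens and is omitted here.

```python
import torch
import torch.nn.functional as F


def update_teacher(student: torch.nn.Module, teacher: torch.nn.Module, momentum: float = 0.996) -> None:
    # The teacher is an exponential moving average (EMA) of the student weights.
    with torch.no_grad():
        for s, t in zip(student.parameters(), teacher.parameters()):
            t.mul_(momentum).add_(s, alpha=1.0 - momentum)


def dino_loss(student_logits, teacher_logits, center, student_temp=0.1, teacher_temp=0.04):
    # The student is trained to match the centered and sharpened teacher distribution.
    teacher_probs = F.softmax((teacher_logits - center) / teacher_temp, dim=-1).detach()
    student_log_probs = F.log_softmax(student_logits / student_temp, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()


# Toy usage with linear stand-ins for the ViT backbone plus projection head.
student = torch.nn.Linear(384, 1024)
teacher = torch.nn.Linear(384, 1024)
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is never updated by gradients

center = torch.zeros(1024)
view1, view2 = torch.randn(8, 384), torch.randn(8, 384)  # two augmented views of a batch
loss = dino_loss(student(view1), teacher(view2), center)  # in practice the loss is symmetrized over views
loss.backward()
update_teacher(student, teacher)
center = 0.9 * center + 0.1 * teacher(view2).detach().mean(dim=0)  # running center of teacher outputs
```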
## Lightly Recommendations

- **Models**: DINOv2 can only be used with ViTs. If you want to use a different model, we recommend first pretraining a ViT with DINOv2 and then distilling the knowledge of the ViT into your model of choice with the [distillation method](methods-distillation).
- **Batch Size**: We recommend a batch size of around 3072 for DINOv2, as suggested in the original paper (a combined configuration sketch is shown at the end of this page).
- **Number of Epochs**: We recommend between 100 and 300 epochs. However, DINOv2 benefits from longer schedules and may still improve after training for more than 300 epochs.
- **Large Datasets**: DINOv2 is optimized for large datasets. We recommend at least 1 million images for training from scratch.

## Default Method Arguments

The following are the default method arguments for DINOv2. To learn how you can override these settings, see {ref}`method-args`.

````{dropdown} Default Method Arguments
```{include} _auto/dinov2_method_args.md
```
````

## Default Image Transform Arguments

The following are the default transform arguments for DINOv2. To learn how you can override these settings, see {ref}`method-transform-args`.

````{dropdown} Default Image Transforms
```{include} _auto/dinov2_transform_args.md
```
````
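As a rough starting point, the sketch below combines the recommendations above with overrides of the default arguments. Treat it as an illustration rather than a prescribed configuration: `epochs` and `batch_size` are passed directly to `lightly_train.train`, the keys inside `method_args` must match the names listed in the method arguments dropdown above, and passing transform overrides via `transform_args` (with `image_size` used here only as a hypothetical key) assumes the mechanism described in {ref}`method-transform-args`.

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="dinov2_vit/vitb14_pretrain",
        method="dinov2",
        epochs=100,       # recommended range: 100-300 epochs
        batch_size=3072,  # recommended batch size; reduce if it does not fit your hardware
        method_args={
            # Override defaults listed in the method arguments dropdown above.
            "student_freeze_backbone_epochs": 1,
            "student_freeze_last_layer_epochs": 0,
        },
        transform_args={
            # Hypothetical example key; use the names from the transform dropdown above.
            "image_size": (224, 224),
        },
    )
```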