(methods-dinov2)=

# DINOv2 (beta 🔬)

DINOv2 is a state-of-the-art self-supervised learning method for training vision foundation models. It is optimized for large-scale models and datasets. DINOv2 pretrained models are effective across a wide range of tasks, including image classification, object detection, and segmentation. They are also known to produce high-quality features that can be used without fine-tuning the model.

## Use DINOv2 in LightlyTrain

````{tab} Python
```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="dinov2_vit/vitb14_pretrain",
        method="dinov2",
    )
```
````

````{tab} Command Line
```bash
lightly-train train out=out/my_experiment data=my_data_dir model="dinov2_vit/vitb14_pretrain" method="dinov2"
```
````

The following models are available for DINOv2 pretraining:

- `dinov2_vit/vits14`
- `dinov2_vit/vits14_pretrain`
- `dinov2_vit/vitb14`
- `dinov2_vit/vitb14_pretrain`
- `dinov2_vit/vitl14`
- `dinov2_vit/vitl14_pretrain`
- `dinov2_vit/vitg14`
- `dinov2_vit/vitg14_pretrain`

Models with a `_pretrain` suffix are [pretrained by Meta](https://github.com/facebookresearch/dinov2?tab=readme-ov-file#pretrained-models).

## What's under the Hood

DINOv2 combines the strengths of DINO and iBOT, two previous self-supervised learning methods. Following DINO, it trains a student network to match the output of a momentum-averaged teacher network without labels. It also incorporates the masked image modeling loss from iBOT, which helps the model learn strong local semantic features.

## Lightly Recommendations

- **Models**: DINOv2 can only be used with ViTs. If you want to use a different model, we recommend first pretraining a ViT with DINOv2 and then distilling the knowledge of the ViT into your model of choice with the [distillation method](methods-distillation).
- **Batch Size**: We recommend a batch size of around 3072 for DINOv2, as suggested in the original paper (see the example at the end of this page for how to set it).
- **Number of Epochs**: We recommend between 100 and 300 epochs. However, DINOv2 benefits from longer schedules and may still improve after training for more than 300 epochs.
- **Large Datasets**: DINOv2 is optimized for large datasets. We recommend at least 1 million images when training from scratch.

## Default Method Arguments

The following are the default method arguments for DINOv2. To learn how you can override these settings, see {ref}`method-args`.

````{dropdown} Default Method Arguments
```{include} _auto/dinov2_method_args.md
```
````

## Default Image Transform Arguments

The following are the default transform arguments for DINOv2. To learn how you can override these settings, see {ref}`method-transform-args`.

````{dropdown} Default Image Transforms
```{include} _auto/dinov2_transform_args.md
```
````
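
Putting the pieces together, the sketch below combines the batch-size and epoch recommendations from above with the override hooks described in the two preceding sections. Treat it as an illustrative starting point rather than a reference: the `epochs` and `batch_size` keyword arguments follow the usual `lightly_train.train` signature, and the commented-out `method_args` / `transform_args` entries use hypothetical keys that you should replace with the actual names from the dropdowns above (see {ref}`method-args` and {ref}`method-transform-args`).

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="dinov2_vit/vitb14_pretrain",
        method="dinov2",
        epochs=100,       # recommended range: 100-300 epochs
        batch_size=3072,  # batch size recommended by the DINOv2 paper
        # Hypothetical keys, shown only to illustrate where overrides go.
        # Use the real argument names from the dropdowns above.
        # method_args={"some_method_arg": 0.1},
        # transform_args={"some_transform_arg": 0.5},
    )
```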
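
For readers who want to see the idea from the "What's under the Hood" section in code, here is a minimal, self-contained PyTorch sketch of a DINO-style objective with a momentum-averaged teacher. It is purely illustrative and not LightlyTrain's or Meta's implementation: it uses toy linear encoders instead of ViTs and omits multi-crop augmentation, centering of the teacher outputs, and the iBOT masked image modeling term.

```python
# Illustrative sketch only; not the LightlyTrain or Meta implementation.
import copy

import torch
import torch.nn.functional as F

# Tiny stand-ins for the student and teacher ViT backbones.
student = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
teacher = copy.deepcopy(student)
for param in teacher.parameters():
    param.requires_grad = False  # the teacher is never updated by gradients


def dino_loss(student_out, teacher_out, student_temp=0.1, teacher_temp=0.04):
    # The student is trained to match the teacher's sharpened soft targets
    # via cross-entropy between the two output distributions.
    targets = F.softmax(teacher_out / teacher_temp, dim=-1)
    log_probs = F.log_softmax(student_out / student_temp, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()


@torch.no_grad()
def ema_update(student, teacher, momentum=0.996):
    # The teacher is an exponential moving average of the student weights.
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)


views = torch.randn(8, 3, 32, 32)  # in practice: differently augmented views per image
loss = dino_loss(student(views), teacher(views))
loss.backward()
ema_update(student, teacher)
```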