DINOv3¶
This page describes how to use DINOv3 models with LightlyTrain.
DINOv3 models are Vision Transformer (ViT) and ConvNeXt models pretrained by Meta using the DINOv3 self-supervised learning method on the large-scale LVD-1689M dataset. They are state-of-the-art vision foundation models and serve as strong backbones for downstream tasks such as object detection, segmentation, and image classification.
Note
DINOv3 models are released under the DINOv3 license. Use DINOv2 models instead for a more permissive Apache 2.0 license.
Pretrain and Fine-tune a DINOv3 Model¶
Pretrain¶
DINOv3 ViT-T/16 models (dinov3/vitt16 and dinov3/vitt16plus) are efficient tiny
models trained by Lightly using the distillation method with
DINOv3 ViT-L/16 as the teacher on ImageNet-1K. They are not part of Meta’s official
DINOv3 release but follow the same architecture. The ViT-T architecture is based on the
design proposed in Touvron et al., 2022.
You can distill your own DINOv3 ViT-T/16 model from DINOv3 ViT-L/16 on your custom dataset as follows:
```python
import lightly_train

if __name__ == "__main__":
    lightly_train.pretrain(
        out="out/my_experiment",        # Output directory.
        data="my_data_dir",             # Directory with images.
        model="dinov3/vitt16",          # Student: DINOv3 ViT-T/16.
        method="distillation",
        method_args={
            "teacher": "dinov3/vitl16", # Teacher: DINOv3 ViT-L/16.
        },
    )
```
Or from the command line:

```shell
lightly-train pretrain out="out/my_experiment" data="my_data_dir" model="dinov3/vitt16" method="distillation" method_args.teacher="dinov3/vitl16"
```
See Distillation method for more details on the pretraining method and its configuration options.
Fine-tune¶
DINOv3 models come with high-quality pretrained weights from Meta and can be used
directly as fine-tuning backbones without additional pretraining. After pretraining on a
custom dataset, the exported backbone can also be loaded via the backbone_weights
argument. Refer to the following pages for fine-tuning instructions and example code:
- Object Detection — fine-tune a DINOv3-based LTDETR model; supports loading custom pretrained backbone weights via backbone_weights (see Pretrain and Fine-tune).
- Semantic Segmentation — fine-tune a DINOv3-based EoMT model; supports loading custom pretrained backbone weights via backbone_weights (see Pretrain and Fine-tune).
- Instance Segmentation — fine-tune a DINOv3-based EoMT model.
- Panoptic Segmentation — fine-tune a DINOv3-based EoMT model.
- Image Classification — fine-tune a DINOv3 backbone for classification.
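As an illustration, a semantic segmentation fine-tune that loads a previously exported backbone might look like the sketch below. The exported checkpoint path, the model name, and the data layout are assumptions for illustration; refer to the Semantic Segmentation page for the authoritative argument names.

```python
import lightly_train

if __name__ == "__main__":
    # Illustrative sketch: fine-tune a DINOv3-based EoMT segmentation model,
    # loading the backbone exported by a previous pretrain run. All paths,
    # the model name, and the class mapping below are placeholder assumptions.
    lightly_train.train_semantic_segmentation(
        out="out/my_segmentation_experiment",
        model="dinov3/vits16-eomt",
        backbone_weights="out/my_experiment/exported_models/exported_last.pt",
        data={
            "train": {"images": "my_data_dir/train/images", "masks": "my_data_dir/train/masks"},
            "val": {"images": "my_data_dir/val/images", "masks": "my_data_dir/val/masks"},
            "classes": {0: "background", 1: "foreground"},
        },
    )
```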
Supported Models¶
ViT Models¶
The following ViT models are supported. The LVD-1689M and SAT-493M models are pretrained by Meta and are under the DINOv3 license. The EUPE models are pretrained by Meta using the EUPE method and are under the FAIR Noncommercial Research License. The ViT-T/16 models, except the EUPE one, are trained by Lightly using knowledge distillation from DINOv3 ViT-L/16.
ViT-T (Lightly, distilled from DINOv3 ViT-L/16 on ImageNet-1K)
- dinov3/vitt16 — distillation v2 weights; recommended for dense tasks (object detection, segmentation)
- dinov3/vitt16plus — distillation v2 weights; recommended for dense tasks
- dinov3/vitt16-distillationv1 — distillation v1 weights; recommended for global tasks (image classification)
- dinov3/vitt16plus-distillationv1 — distillation v1 weights; recommended for global tasks
- dinov3/vitt16-notpretrained — random initialization; for training from scratch
- dinov3/vitt16plus-notpretrained — random initialization; for training from scratch
ViT-T (Meta, LVD-1689M)
- dinov3/vitt16-eupe — EUPE weights
ViT-S (Meta, LVD-1689M)
- dinov3/vits16
- dinov3/vits16-eupe — EUPE weights
- dinov3/vits16plus
ViT-B (Meta, LVD-1689M)
- dinov3/vitb16
- dinov3/vitb16-eupe — EUPE weights
ViT-L (Meta)
- dinov3/vitl16 (LVD-1689M)
- dinov3/vitl16-sat493m (SAT-493M)
ViT-H (Meta, LVD-1689M)
- dinov3/vith16plus
ViT-7B (Meta)
- dinov3/vit7b16 (LVD-1689M)
- dinov3/vit7b16-sat493m (SAT-493M)
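The 16 in every ViT model name above is the patch size: the backbone splits the input image into 16x16 patches, so its feature map has H/16 by W/16 patch tokens. A quick plain-Python sketch of this relationship (no LightlyTrain dependency):

```python
def patch_grid(height: int, width: int, patch_size: int = 16) -> tuple[int, int]:
    """Feature-map size produced by a ViT-*/16 backbone for a given image size."""
    if height % patch_size or width % patch_size:
        raise ValueError("image side lengths must be divisible by the patch size")
    return height // patch_size, width // patch_size

# A 224x224 crop yields a 14x14 grid, i.e. 196 patch tokens.
print(patch_grid(224, 224))  # (14, 14)
```

This is why dense tasks such as detection and segmentation benefit from higher input resolutions: the patch grid, and therefore the spatial resolution of the features, grows with the image size.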
ConvNeXt Models¶
The following ConvNeXt models are supported. All are pretrained by Meta on the LVD-1689M dataset. The DINOv3 models are under the DINOv3 license. The EUPE models are pretrained by Meta using the EUPE method and are under the FAIR Noncommercial Research License.
- dinov3/convnext-tiny
- dinov3/convnext-tiny-eupe — EUPE weights
- dinov3/convnext-small
- dinov3/convnext-small-eupe — EUPE weights
- dinov3/convnext-base
- dinov3/convnext-base-eupe — EUPE weights
- dinov3/convnext-large