.. _dino:

DINO
====

DINO (Distillation with No Labels) [0]_ is a self-supervised learning framework
for visual representation learning using knowledge distillation. It trains a
student network to match the output of a momentum-averaged teacher network
without requiring labeled data. DINO uses a self-distillation objective with a
cross-entropy loss between the student and teacher outputs, avoiding the need
for contrastive pairs. Key elements include centering and sharpening mechanisms
to stabilize training, multi-crop augmentation for efficient learning, and the
ability to learn semantically meaningful features without supervision. DINO
achieves strong performance on image clustering, segmentation, and zero-shot
transfer tasks, demonstrating the emergence of object-centric representations.

Key Components
--------------

- **Data Augmentations**: DINO [0]_ uses random cropping, resizing, color
  jittering, and Gaussian blur to create diverse views of the same image. In
  particular, DINO generates two global views and multiple local views that are
  smaller crops of the original image.
- **Backbone**: Vision transformers (e.g. ViT) or convolutional neural networks
  (e.g. ResNet) encode the augmented images into feature representations.
- **Projection Head**: A multilayer perceptron (MLP) maps the features to a
  distribution over a set of learnable clusters.
- **Distillation Loss**: The self-distillation loss encourages similar output
  distributions between the student and teacher when they process independent
  views of the same image. A minimal sketch of this objective is shown below,
  before the full examples.

Good to Know
------------

- **Backbone Networks**: DINO [0]_ works well with both transformer and
  convolutional neural network architectures.
- **Feature Quality**: DINO [0]_ learns particularly strong features without
  fine-tuning on downstream tasks. This is especially useful for clustering or
  k-Nearest Neighbors (kNN) classification.

Reference:

    .. [0] `Emerging Properties in Self-Supervised Vision Transformers, 2021 <https://arxiv.org/abs/2104.14294>`_
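The heart of DINO is the interplay between the sharpened, centered teacher
output and the student prediction, together with the momentum updates of the
teacher weights and of the center. The snippet below is a minimal sketch of
these pieces in plain PyTorch; the function names, temperatures, and momentum
values are illustrative and not part of Lightly's API, and the multi-crop
pairing of views is omitted for brevity. The full examples below use the
library's own loss and model modules.

.. code-block:: python

    import torch
    import torch.nn.functional as F


    def dino_loss(student_out, teacher_out, center, student_temp=0.1, teacher_temp=0.04):
        """Cross-entropy between the sharpened teacher and the student outputs."""
        # Sharpening: a low teacher temperature makes the teacher distribution peaky.
        # Centering: subtracting a running mean of teacher outputs prevents collapse.
        teacher_probs = F.softmax((teacher_out - center) / teacher_temp, dim=-1).detach()
        student_log_probs = F.log_softmax(student_out / student_temp, dim=-1)
        return -(teacher_probs * student_log_probs).sum(dim=-1).mean()


    @torch.no_grad()
    def update_center(center, teacher_out, momentum=0.9):
        # The center is an exponential moving average of the teacher outputs.
        return momentum * center + (1.0 - momentum) * teacher_out.mean(dim=0, keepdim=True)


    @torch.no_grad()
    def update_teacher(teacher, student, momentum=0.996):
        # The teacher weights are an exponential moving average of the student
        # weights; the teacher is never updated by gradient descent.
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.data = momentum * t.data + (1.0 - momentum) * s.data

In the full training loop, the loss is accumulated over pairs of one teacher
(global) view and one different student view, which is exactly what the
multi-crop augmentation provides.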
.. tabs::

    .. tab:: PyTorch

        .. image:: https://img.shields.io/badge/Open%20in%20Colab-blue?logo=googlecolab&label=%20&labelColor=5c5c5c
            :target: https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/dino.ipynb

        This example can be run from the command line with::

            python lightly/examples/pytorch/dino.py

        .. literalinclude:: ../../../examples/pytorch/dino.py

    .. tab:: Lightning

        .. image:: https://img.shields.io/badge/Open%20in%20Colab-blue?logo=googlecolab&label=%20&labelColor=5c5c5c
            :target: https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch_lightning/dino.ipynb

        This example can be run from the command line with::

            python lightly/examples/pytorch_lightning/dino.py

        .. literalinclude:: ../../../examples/pytorch_lightning/dino.py

    .. tab:: Lightning Distributed

        .. image:: https://img.shields.io/badge/Open%20in%20Colab-blue?logo=googlecolab&label=%20&labelColor=5c5c5c
            :target: https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch_lightning_distributed/dino.ipynb

        This example runs on multiple GPUs using Distributed Data Parallel (DDP)
        training with PyTorch Lightning. At least one GPU must be available on
        the system.

        The example can be run from the command line with::

            python lightly/examples/pytorch_lightning_distributed/dino.py

        The model differs in the following ways from the non-distributed
        implementation:

        - Distributed Data Parallel is enabled
        - Synchronized Batch Norm is used in place of standard Batch Norm
        - Distributed Sampling is used in the dataloader

        Note that Synchronized Batch Norm is optional and the model can also be
        trained without it. Without Synchronized Batch Norm the batch norm
        statistics for each GPU are calculated only from the features on that
        GPU. Distributed Sampling makes sure that each distributed process sees
        only a subset of the data. A minimal sketch of these settings follows
        the full example below.

        .. literalinclude:: ../../../examples/pytorch_lightning_distributed/dino.py
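        The distributed-specific settings amount to a few PyTorch Lightning
        ``Trainer`` arguments. The following is a minimal sketch with
        illustrative values; the exact configuration used by the example is in
        the file included above.

        .. code-block:: python

            import pytorch_lightning as pl

            trainer = pl.Trainer(
                max_epochs=10,        # illustrative value
                devices=2,            # number of GPUs; at least one must be available
                accelerator="gpu",
                strategy="ddp",       # Distributed Data Parallel
                sync_batchnorm=True,  # replace Batch Norm with Synchronized Batch Norm
            )
            # With the "ddp" strategy, Lightning wraps the dataloader sampler in a
            # DistributedSampler so that each process only sees its own subset of
            # the data.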