Main concepts

Self-supervised Learning

The figure below shows an overview of the different concepts used by the lightly pip package and a schema of how they interact. Each concept in the figure is explained further below.

Figure: Overview of the different concepts used by the lightly pip package and how they interact.

  • Dataset

    In lightly, datasets are accessed through LightlyDataset. You can create a LightlyDataset from a folder of images, from videos, or from a torchvision dataset. You can learn more about this here:

  • Collate Function

    The collate function is where lightly applies augmentations, which are crucial for self-supervised learning. You can use our pre-defined augmentations or write your own. For more information, check out Advanced Concepts in Self-Supervised Learning. You can add your own augmentations very easily, as we show in this tutorial:

  • Dataloader

    For the dataloader you can simply use the PyTorch dataloader. Be sure to pass it a LightlyDataset though!

  • Backbone Neural Network

    One of the cool things about self-supervised learning is that you can pre-train your neural networks without the need for annotated data. You can plug in whatever backbone you want! If you don’t know where to start, our tutorials show how you can get a backbone neural network from a lightly.models.resnet.ResNet.

  • Model

    The model combines your backbone neural network with one or multiple heads and, if required, a momentum encoder to provide an easy-to-use interface to the most popular self-supervised learning frameworks. Learn more in our tutorials:

  • Loss

    The loss function plays a crucial role in self-supervised learning. Currently, lightly supports contrastive and similarity-based loss functions.

  • Optimizer

    With lightly, you can use any PyTorch optimizer to train your model.

  • Training

    The model can either be trained using a plain PyTorch training loop or with a dedicated framework such as PyTorch Lightning. Lightly lets you choose what is best for you. Check out our tutorials section for examples.

  • Image Embeddings

    During the training process, the model learns to create compact embeddings from images. The embeddings, also often called representations, can then be used for tasks such as identifying similar images or creating a diverse subset from your data:

  • Pre-trained Backbone

    The backbone can be reused after self-supervised training. We can transfer it to any other task that requires a similar network architecture, including image classification, object detection and segmentation tasks. You can learn more in our object detection tutorial:
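To make the Collate Function item above more concrete, here is a minimal, dependency-free sketch of the idea: the collate step takes a batch of samples and returns two independently augmented views of every image. This is not lightly's implementation. The names `random_flip` and `two_view_collate`, the `(image, label, filename)` sample layout, and the use of nested lists as "images" are all assumptions made for illustration.

```python
import random


def random_flip(image, rng):
    """Hypothetical augmentation: mirror every row with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


def two_view_collate(batch, rng=None):
    """Build two augmented views per sample.

    `batch` is a list of (image, label, filename) tuples (assumed layout).
    Returns ((views_a, views_b), labels, filenames).
    """
    rng = rng or random.Random()
    # Each view is augmented independently, so the two views of the same
    # image can differ even though they share the same content.
    views_a = [random_flip(image, rng) for image, _, _ in batch]
    views_b = [random_flip(image, rng) for image, _, _ in batch]
    labels = [label for _, label, _ in batch]
    filenames = [name for _, _, name in batch]
    return (views_a, views_b), labels, filenames


# Toy batch: 2x2 "images" stored as nested lists.
batch = [
    ([[1, 2], [3, 4]], 0, "a.png"),
    ([[5, 6], [7, 8]], 1, "b.png"),
]
(views_a, views_b), labels, filenames = two_view_collate(batch, random.Random(0))
```

In a real pipeline, a function of this shape would be passed to the PyTorch dataloader through its `collate_fn` argument and would return tensors rather than nested lists.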
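As an illustration of the contrastive losses mentioned under Loss, the following sketch computes an NT-Xent style loss (the normalized temperature-scaled cross entropy known from SimCLR) on plain Python lists. It is a sketch under stated assumptions, not the loss shipped with lightly: the function names, the default temperature of 0.5, and the list-based embeddings are chosen for illustration only.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nt_xent(views_a, views_b, temperature=0.5):
    """Mean NT-Xent loss over all 2N embeddings.

    views_a[i] and views_b[i] are embeddings of two views of the same image.
    Every other embedding in the batch acts as a negative.
    """
    n = len(views_a)
    z = [l2_normalize(v) for v in views_a + views_b]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the matching view
        # Cosine similarities to every embedding except the anchor itself.
        logits = [dot(z[i], z[j]) / temperature for j in range(2 * n) if j != i]
        log_denominator = math.log(sum(math.exp(value) for value in logits))
        total += log_denominator - dot(z[i], z[positive]) / temperature
    return total / (2 * n)


embeddings_a = [[1.0, 0.0], [0.0, 1.0]]
# Matching views agree with embeddings_a; mismatched views are swapped.
loss_matching = nt_xent(embeddings_a, [[1.0, 0.0], [0.0, 1.0]])
loss_mismatched = nt_xent(embeddings_a, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when the two views of an image map to nearby embeddings and grows when a view is closer to a different image, which is the signal that drives training without labels.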