.. _lightly-at-a-glance:

Self-supervised learning
========================

Lightly\ **SSL** is a computer vision framework for training deep learning models using self-supervised learning.
The framework can be used for a wide range of useful applications such as finding the nearest 
neighbors, similarity search, transfer learning, or data analytics.


How LightlySSL Works
---------------------
The flexible design of Lightly\ **SSL** makes it easy to integrate in your Python code. Lightly\ **SSL** is built
completely around PyTorch and the different pieces can be put together to fit *your* requirements.

Data and Transformations
^^^^^^^^^^^^^^^^^^^^^^^^
The basic building block of contrastive self-supervised methods
such as `SimCLR <https://arxiv.org/abs/2002.05709>`_ are image transformations. Each image is transformed into
several new images by randomly applied augmentations. The task of the self-supervised model is then to identify the
images which come from the same original among a set of negative examples.

For example, the transforms below will apply the SimCLR image transform to the input images.

.. code-block:: python

    from lightly.transforms.simclr_transform import SimCLRTransform

    # The following transform will return two augmented images per input image.
    transform = SimCLRTransform()

Let's now load an image dataset and create a PyTorch dataloader.

.. code-block:: python

    import torch
    import lightly.data as data

    # Create a dataset from your image folder.
    dataset = data.LightlyDataset(
        input_dir='./my/cute/cats/dataset/',
        transform=transform,
    )

    # Build a PyTorch dataloader.
    dataloader = torch.utils.data.DataLoader(
        dataset,                # Pass the dataset to the dataloader.
        batch_size=128,         # A large batch size helps with learning.
        shuffle=True,           # Shuffling is important!
    )

.. note:: You can also use a custom PyTorch `Dataset` instead of the 
          `LightlyDataset`. Just make sure your `Dataset` implementation returns
          a tuple of **(sample, target, filename)** to support the basic functions
          for training models. See :py:class:`lightly.data.dataset`
          for more information.


Head to the next section to see how you can train a ResNet on the data you just prepared.

Model, Loss and Training
^^^^^^^^^^^^^^^^^^^^^^^^

Now, we need an embedding model, an optimizer and a loss function. We use a ResNet together
with the normalized temperature-scaled cross entropy loss and simple stochastic gradient descent.

.. code-block:: python

    import torchvision

    from lightly.loss import NTXentLoss
    from lightly.models.modules.heads import SimCLRProjectionHead

    # use a resnet backbone
    resnet = torchvision.models.resnet18()
    resnet = torch.nn.Sequential(*list(resnet.children())[:-1])

    # build a SimCLR model
    class SimCLR(torch.nn.Module):
        def __init__(self, backbone, hidden_dim, out_dim):
            super().__init__()
            self.backbone = backbone
            self.projection_head = SimCLRProjectionHead(hidden_dim, hidden_dim, out_dim)

        def forward(self, x):
            h = self.backbone(x).flatten(start_dim=1)
            z = self.projection_head(h)
            return z

    model = SimCLR(resnet, hidden_dim=512, out_dim=128)

    # use a criterion for self-supervised learning
    # (normalized temperature-scaled cross entropy loss)
    criterion = NTXentLoss(temperature=0.5)

    # get a PyTorch optimizer
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-0, weight_decay=1e-5)


.. note:: You can also use custom backbones and use lightly to train them using
          self-supervised learning. Learn more about how to use custom backbones
          in our 
          `colab playground <https://colab.research.google.com/drive/1ubepXnpANiWOSmq80e-mqAxjLx53m-zu?usp=sharing>`_.


Train the model for 10 epochs.

.. code-block:: python

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    max_epochs = 10
    for epoch in range(max_epochs):
        for (x0, x1), _, _ in dataloader:

            x0 = x0.to(device)
            x1 = x1.to(device)

            z0 = model(x0)
            z1 = model(x1)

            loss = criterion(z0, z1)
            loss.backward()

            optimizer.step()
            optimizer.zero_grad()


Congrats, you just trained your first model using self-supervised learning!

You can of course also use `PyTorch Lightning <https://www.pytorchlightning.ai/>`_ to implement and train your model.

.. code-block:: python

    import pytorch_lightning as pl

    class SimCLR(pl.LightningModule):
        def __init__(self, backbone, hidden_dim, out_dim):
            super().__init__()
            self.backbone = backbone
            self.projection_head = SimCLRProjectionHead(hidden_dim, hidden_dim, out_dim)
            self.criterion = NTXentLoss(temperature=0.5)

        def forward(self, x):
            h = self.backbone(x).flatten(start_dim=1)
            z = self.projection_head(h)
            return z

        def training_step(self, batch, batch_idx):
            (x0, x1), _, _ = batch
            z0 = self.forward(x0)
            z1 = self.forward(x1)
            loss = self.criterion(z0, z1)
            return loss

        def configure_optimizers(self):
            optimizer = torch.optim.SGD(self.parameters(), lr=1e-0)
            return optimizer

    model = SimCLR(resnet, hidden_dim=512, out_dim=128)
    trainer = pl.Trainer(max_epochs=max_epochs, devices=1, accelerator="gpu")
    trainer.fit(
        model,
        dataloader
    )

To train on a machine with multiple GPUs we recommend using the 
`distributed data parallel` strategy.

.. code-block:: python

    # If we have a machine with 4 GPUs we set devices=4 and accelerator="gpu".
    trainer = pl.Trainer(
        max_epochs=max_epochs, 
        devices=4,
        accelerator="gpu",
        strategy='ddp'
    )
    trainer.fit(
        model,
        dataloader
    )

Embeddings
^^^^^^^^^^
You can use the trained model to embed your images or even access the embedding
model directly.

.. code-block:: python 

    # make a new dataloader without the transformations
    # The only transformation needed is to make a torch tensor out of the PIL image
    dataset.transform = torchvision.transforms.ToTensor()
    dataloader = torch.utils.data.DataLoader(
        dataset,        # use the same dataset as before
        batch_size=1,   # we can use batch size 1 for inference
        shuffle=False,  # don't shuffle your data during inference
    )

    # embed your image dataset
    embeddings = []
    model.eval()
    with torch.no_grad():
        for img, label, fnames in dataloader:
            img = img.to(model.device)
            emb = model.backbone(img).flatten(start_dim=1)
            embeddings.append(emb)

        embeddings = torch.cat(embeddings, 0)

Done! You can continue to use the embeddings to find nearest neighbors or do similarity search.
Furthermore, the ResNet backbone can be used for transfer and few-shot learning.

.. code-block:: python

    # access the ResNet backbone
    resnet = model.backbone

.. note::

    Self-supervised learning does not require labels for a model to be trained on. Lightly,
    however, supports the use of additional labels. For example, if you train a model
    on a folder 'cats' with subfolders 'Maine Coon', 'Bengal' and 'British Shorthair'
    Lightly\ **SSL** automatically returns the enumerated labels as a list.


What's Next?
------------
Get started by :ref:`rst-installing` and follow through the tutorials to 
learn how to get the most out of using Lightly:

Tutorials:

- :ref:`input-structure-label`
- :ref:`lightly-moco-tutorial-2`
- :ref:`lightly-simclr-tutorial-3`  
- :ref:`lightly-simsiam-tutorial-4`  
- :ref:`lightly-custom-augmentation-5`