lightly.models

The lightly.models package provides model implementations.

The package contains an implementation of the commonly used ResNet and adaptations of the architecture that make self-supervised learning simpler.

The package also hosts the Lightly model zoo, a list of downloadable ResNet checkpoints.

.resnet

Custom ResNet Implementation

Note that the architecture we present here differs from the one used in torchvision. We replace the first 7x7 convolution with a 3x3 convolution, which makes the model faster and better suited to small input image resolutions.

Furthermore, we introduce a ResNet-9 variant for extra-small models. These can run, for example, on a microcontroller with 100 kBytes of storage.
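Such a tiny model can be built with the ResNetGenerator documented below; a minimal sketch (the width value is illustrative, not prescriptive):

>>> # sketch: a tiny ResNet-9 built via ResNetGenerator
>>> # (width=0.125 is an illustrative value)
>>> from lightly.models import ResNetGenerator
>>> tiny_resnet = ResNetGenerator('resnet-9', width=0.125)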

class lightly.models.resnet.BasicBlock(in_planes: int, planes: int, stride: int = 1, num_splits: int = 0)

Implementation of the ResNet Basic Block.

Attributes:
in_planes:

Number of input channels.

planes:

Number of channels.

stride:

Stride of the first convolutional layer.

forward(x: torch.Tensor)

Forward pass through basic ResNet block.

Args:
x:

Tensor of shape bsz x channels x W x H

Returns:

Tensor of shape bsz x channels x W x H

class lightly.models.resnet.Bottleneck(in_planes: int, planes: int, stride: int = 1, num_splits: int = 0)

Implementation of the ResNet Bottleneck Block.

Attributes:
in_planes:

Number of input channels.

planes:

Number of channels.

stride:

Stride of the first convolutional layer.

forward(x)

Forward pass through bottleneck ResNet block.

Args:
x:

Tensor of shape bsz x channels x W x H

Returns:

Tensor of shape bsz x channels x W x H

class lightly.models.resnet.ResNet(block: torch.nn.modules.module.Module = <class 'lightly.models.resnet.BasicBlock'>, layers: List[int] = [2, 2, 2, 2], num_classes: int = 10, width: float = 1.0, num_splits: int = 0)

ResNet implementation.

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. arXiv:1512.03385

Attributes:
block:

ResNet building block type.

layers:

List of blocks per layer.

num_classes:

Number of classes in final softmax layer.

width:

Multiplier for ResNet width.

forward(x: torch.Tensor)

Forward pass through ResNet.

Args:
x:

Tensor of shape bsz x channels x W x H

Returns:

Output tensor of shape bsz x num_classes
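A quick sketch of a direct forward pass (the batch and image size are illustrative):

>>> # sketch: forward a batch through a default ResNet
>>> # (the 4 x 3 x 32 x 32 input is an illustrative size)
>>> import torch
>>> from lightly.models.resnet import ResNet
>>> resnet = ResNet()
>>> x = torch.randn(4, 3, 32, 32)
>>> logits = resnet(x)  # shape: 4 x num_classes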

lightly.models.resnet.ResNetGenerator(name: str = 'resnet-18', width: float = 1, num_classes: int = 10, num_splits: int = 0)

Builds and returns the specified ResNet.

Args:
name:

ResNet version from resnet-{9, 18, 34, 50, 101, 152}.

width:

ResNet width.

num_classes:

Output dim of the last layer.

num_splits:

Number of splits to use for SplitBatchNorm (for the MoCo model). Increase this number to simulate multi-GPU behavior; e.g., num_splits=8 simulates an 8-GPU cluster. num_splits=0 uses normal PyTorch BatchNorm. See the sketch after the examples below.

Returns:

ResNet as nn.Module.

Examples:
>>> # binary classifier with ResNet-34
>>> from lightly.models import ResNetGenerator
>>> resnet = ResNetGenerator('resnet-34', num_classes=2)
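>>> # sketch: simulate 8-GPU BatchNorm statistics for MoCo
>>> # (num_splits=8 and the input size are illustrative; the
>>> # batch size should be divisible by num_splits)
>>> import torch
>>> resnet = ResNetGenerator('resnet-18', num_splits=8)
>>> x = torch.randn(16, 3, 32, 32)
>>> logits = resnet(x)  # shape: 16 x num_classes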

.barlowtwins

Barlow Twins ResNet-based model [0].

[0] Zbontar, J. et al., 2021. Barlow Twins: Self-Supervised Learning via Redundancy Reduction. https://arxiv.org/abs/2103.03230

class lightly.models.barlowtwins.BarlowTwins(backbone: torch.nn.modules.module.Module, num_ftrs: int = 2048, proj_hidden_dim: int = 8192, out_dim: int = 8192, num_mlp_layers: int = 3)

Implementation of the BarlowTwins[0] network.

Recommended loss: lightly.loss.barlow_twins_loss.BarlowTwinsLoss

Default params are the ones explained in the original paper [0].

[0] Zbontar, J. et al., 2021. Barlow Twins: Self-Supervised Learning via Redundancy Reduction. https://arxiv.org/abs/2103.03230

Attributes:
backbone:

Backbone model to extract features from images. ResNet-50 in original paper [0].

num_ftrs:

Dimension of the embedding (before the projection head).

proj_hidden_dim:

Dimension of the hidden layer of the projection head. This should be the same size as num_ftrs.

out_dim:

Dimension of the output (after the projection head).

forward(x0: torch.Tensor, x1: torch.Tensor = None, return_features: bool = False)

Forward pass through BarlowTwins.

Extracts features with the backbone and applies the projection head to the output space. If both x0 and x1 are not None, both will be passed through the backbone and the projection head. If x1 is None, only x0 will be forwarded. Unlike SimSiam, Barlow Twins uses only a projection head and no prediction head.

Args:
x0:

Tensor of shape bsz x channels x W x H.

x1:

Tensor of shape bsz x channels x W x H.

return_features:

Whether or not to return the intermediate features backbone(x).

Returns:

The output projection of x0 and (if x1 is not None) the output projection of x1. If return_features is True, the output for each x is a tuple (out, f) where f are the features before the projection head.

Examples:
>>> # single input, single output
>>> out = model(x)
>>>
>>> # single input with return_features=True
>>> out, f = model(x, return_features=True)
>>>
>>> # two inputs, two outputs
>>> out0, out1 = model(x0, x1)
>>>
>>> # two inputs, two outputs with return_features=True
>>> (out0, f0), (out1, f1) = model(x0, x1, return_features=True)
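A minimal construction sketch, assuming a lightly ResNet backbone whose classification layer is stripped (the slicing and pooling shown here follow a common pattern and are illustrative, not the only way to build a backbone):

>>> # sketch: build a backbone from a lightly ResNet-18
>>> # (stripping the last layer this way is illustrative)
>>> import torch.nn as nn
>>> from lightly.models import ResNetGenerator
>>> from lightly.models.barlowtwins import BarlowTwins
>>> resnet = ResNetGenerator('resnet-18')
>>> backbone = nn.Sequential(*list(resnet.children())[:-1], nn.AdaptiveAvgPool2d(1))
>>> model = BarlowTwins(backbone, num_ftrs=512)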

.simclr

SimCLR Model

class lightly.models.simclr.SimCLR(backbone: torch.nn.modules.module.Module, num_ftrs: int = 32, out_dim: int = 128)

Implementation of the SimCLR[0] architecture.

Recommended loss: lightly.loss.ntx_ent_loss.NTXentLoss

[0] SimCLR, 2020, https://arxiv.org/abs/2002.05709

Attributes:
backbone:

Backbone model to extract features from images.

num_ftrs:

Dimension of the embedding (before the projection head).

out_dim:

Dimension of the output (after the projection head).

forward(x0: torch.Tensor, x1: torch.Tensor = None, return_features: bool = False)

Embeds and projects the input images.

Extracts features with the backbone and applies the projection head to the output space. If both x0 and x1 are not None, both will be passed through the backbone and projection head. If x1 is None, only x0 will be forwarded.

Args:
x0:

Tensor of shape bsz x channels x W x H.

x1:

Tensor of shape bsz x channels x W x H.

return_features:

Whether or not to return the intermediate features backbone(x).

Returns:

The output projection of x0 and (if x1 is not None) the output projection of x1. If return_features is True, the output for each x is a tuple (out, f) where f are the features before the projection head.

Examples:
>>> # single input, single output
>>> out = model(x) 
>>> 
>>> # single input with return_features=True
>>> out, f = model(x, return_features=True)
>>>
>>> # two inputs, two outputs
>>> out0, out1 = model(x0, x1)
>>>
>>> # two inputs, two outputs with return_features=True
>>> (out0, f0), (out1, f1) = model(x0, x1, return_features=True)
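A hedged sketch of one training step with the recommended loss (backbone, x0, and x1 are assumed to come from elsewhere):

>>> # sketch: one SimCLR step with the recommended NTXentLoss
>>> # (backbone, x0, x1 and an optimizer are assumed to exist)
>>> from lightly.loss.ntx_ent_loss import NTXentLoss
>>> from lightly.models.simclr import SimCLR
>>> model = SimCLR(backbone, num_ftrs=512)
>>> criterion = NTXentLoss()
>>> out0, out1 = model(x0, x1)
>>> loss = criterion(out0, out1)
>>> loss.backward()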

.moco

MoCo Model

class lightly.models.moco.MoCo(backbone: torch.nn.modules.module.Module, num_ftrs: int = 32, out_dim: int = 128, m: float = 0.999, batch_shuffle: bool = False)

Implementation of the MoCo (Momentum Contrast)[0] architecture.

Recommended loss: lightly.loss.ntx_ent_loss.NTXentLoss with a memory bank.

[0] MoCo, 2020, https://arxiv.org/abs/1911.05722

Attributes:
backbone:

Backbone model to extract features from images.

num_ftrs:

Dimension of the embedding (before the projection head).

out_dim:

Dimension of the output (after the projection head).

m:

Momentum for momentum update of the key-encoder.

forward(x0: torch.Tensor, x1: torch.Tensor = None, return_features: bool = False)

Embeds and projects the input images.

Performs the momentum update, extracts features with the backbone and applies the projection head to the output space. If both x0 and x1 are not None, both will be passed through the backbone and projection head. If x1 is None, only x0 will be forwarded.

Args:
x0:

Tensor of shape bsz x channels x W x H.

x1:

Tensor of shape bsz x channels x W x H.

return_features:

Whether or not to return the intermediate features backbone(x).

Returns:

The output projection of x0 and (if x1 is not None) the output projection of x1. If return_features is True, the output for each x is a tuple (out, f) where f are the features before the projection head.

Examples:
>>> # single input, single output
>>> out = model(x) 
>>> 
>>> # single input with return_features=True
>>> out, f = model(x, return_features=True)
>>>
>>> # two inputs, two outputs
>>> out0, out1 = model(x0, x1)
>>>
>>> # two inputs, two outputs with return_features=True
>>> (out0, f0), (out1, f1) = model(x0, x1, return_features=True)
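A hedged sketch pairing MoCo with the recommended NTXentLoss and a memory bank (backbone, x0, and x1 are assumed to exist; the memory bank size is illustrative):

>>> # sketch: MoCo with NTXentLoss backed by a memory bank
>>> # (backbone, x0, x1 are assumed; 4096 is an illustrative size)
>>> from lightly.loss.ntx_ent_loss import NTXentLoss
>>> from lightly.models.moco import MoCo
>>> model = MoCo(backbone, num_ftrs=512, m=0.999, batch_shuffle=True)
>>> criterion = NTXentLoss(memory_bank_size=4096)
>>> out0, out1 = model(x0, x1)
>>> loss = criterion(out0, out1)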

.simsiam

SimSiam Model

class lightly.models.simsiam.SimSiam(backbone: torch.nn.modules.module.Module, num_ftrs: int = 2048, proj_hidden_dim: int = 2048, pred_hidden_dim: int = 512, out_dim: int = 2048, num_mlp_layers: int = 3)

Implementation of the SimSiam[0] network.

Recommended loss: lightly.loss.sym_neg_cos_sim_loss.SymNegCosineSimilarityLoss

[0] SimSiam, 2020, https://arxiv.org/abs/2011.10566

Attributes:
backbone:

Backbone model to extract features from images.

num_ftrs:

Dimension of the embedding (before the projection head).

proj_hidden_dim:

Dimension of the hidden layer of the projection head. This should be the same size as num_ftrs.

pred_hidden_dim:

Dimension of the hidden layer of the prediction head. This should be num_ftrs / 4.

out_dim:

Dimension of the output (after the projection head).

forward(x0: torch.Tensor, x1: torch.Tensor = None, return_features: bool = False)

Forward pass through SimSiam.

Extracts features with the backbone and applies the projection head and prediction head to the output space. If both x0 and x1 are not None, both will be passed through the backbone, projection, and prediction head. If x1 is None, only x0 will be forwarded.

Args:
x0:

Tensor of shape bsz x channels x W x H.

x1:

Tensor of shape bsz x channels x W x H.

return_features:

Whether or not to return the intermediate features backbone(x).

Returns:

The output prediction and projection of x0 and (if x1 is not None) the output prediction and projection of x1. If return_features is True, the output for each x is a tuple (out, f) where f are the features before the projection head.

Examples:
>>> # single input, single output
>>> out = model(x) 
>>> 
>>> # single input with return_features=True
>>> out, f = model(x, return_features=True)
>>>
>>> # two inputs, two outputs
>>> out0, out1 = model(x0, x1)
>>>
>>> # two inputs, two outputs with return_features=True
>>> (out0, f0), (out1, f1) = model(x0, x1, return_features=True)
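A hedged sketch with the recommended symmetric loss, keeping the hidden dimensions consistent with the guidance above (backbone, x0, and x1 are assumed to exist):

>>> # sketch: SimSiam with SymNegCosineSimilarityLoss
>>> # (backbone, x0, x1 are assumed; the dims follow the
>>> # guidance above: proj_hidden_dim == num_ftrs and
>>> # pred_hidden_dim == num_ftrs / 4)
>>> from lightly.loss.sym_neg_cos_sim_loss import SymNegCosineSimilarityLoss
>>> from lightly.models.simsiam import SimSiam
>>> model = SimSiam(backbone, num_ftrs=512, proj_hidden_dim=512, pred_hidden_dim=128)
>>> criterion = SymNegCosineSimilarityLoss()
>>> out0, out1 = model(x0, x1)
>>> loss = criterion(out0, out1)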

.byol

BYOL Model

class lightly.models.byol.BYOL(backbone: torch.nn.modules.module.Module, num_ftrs: int = 2048, hidden_dim: int = 4096, out_dim: int = 256, m: float = 0.9)

Implementation of the BYOL[0] architecture.

[0] BYOL, 2020, https://arxiv.org/abs/2006.07733

Attributes:
backbone:

Backbone model to extract features from images.

num_ftrs:

Dimension of the embedding (before the projection mlp).

hidden_dim:

Dimension of the hidden layer in the projection and prediction mlp.

out_dim:

Dimension of the output (after the projection/prediction mlp).

m:

Momentum for the momentum update of encoder.

forward(x0: torch.Tensor, x1: torch.Tensor, return_features: bool = False)

Symmetrizes the forward pass (see _forward).

Performs two forward passes, one where x0 is passed through the encoder and x1 through the momentum encoder, and one the other way around.

Note that this model currently requires two inputs for the forward pass (x0 and x1) which correspond to the two augmentations. Furthermore, the return_features argument does not work yet.

Args:
x0:

Tensor of shape bsz x channels x W x H.

x1:

Tensor of shape bsz x channels x W x H.

Returns:

A tuple out0, out1, where out0 and out1 are tuples containing the projections and predictions of x0 and x1: out0 = (z0, p0) and out1 = (z1, p1).

Examples:
>>> # initialize the model and the loss function
>>> model = BYOL(backbone)
>>> criterion = SymNegCosineSimilarityLoss()
>>>
>>> # forward pass for two batches of transformed images x0 and x1
>>> out0, out1 = model(x0, x1)
>>> loss = criterion(out0, out1)
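Since each output is a tuple of projection and prediction, the result can also be unpacked explicitly:

>>> # out0 = (z0, p0) and out1 = (z1, p1) can be unpacked directly
>>> (z0, p0), (z1, p1) = model(x0, x1)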

.nnclr

NNCLR Model

class lightly.models.nnclr.NNCLR(backbone: torch.nn.modules.module.Module, num_ftrs: int = 512, proj_hidden_dim: int = 2048, pred_hidden_dim: int = 4096, out_dim: int = 256, num_mlp_layers: int = 3)

Implementation of the NNCLR[0] architecture

Recommended loss: lightly.loss.ntx_ent_loss.NTXentLoss

Recommended module: lightly.models.modules.nn_memory_bank.NNMemoryBankModule

[0] NNCLR, 2021, https://arxiv.org/abs/2104.14548

Attributes:
backbone:

Backbone model to extract features from images.

num_ftrs:

Dimension of the embedding (before the projection head).

proj_hidden_dim:

Dimension of the hidden layer of the projection head.

pred_hidden_dim:

Dimension of the hidden layer of the prediction head.

out_dim:

Dimension of the output (after the projection head).

num_mlp_layers:

Number of linear layers for MLP.

Examples:
>>> model = NNCLR(backbone)
>>> criterion = NTXentLoss(temperature=0.1)
>>> 
>>> nn_replacer = NNMemoryBankModule(size=2 ** 16)
>>>
>>> # forward pass
>>> (z0, p0), (z1, p1) = model(x0, x1)
>>> z0 = nn_replacer(z0.detach(), update=False)
>>> z1 = nn_replacer(z1.detach(), update=True)
>>>
>>> loss = 0.5 * (criterion(z0, p1) + criterion(z1, p0))

forward(x0: torch.Tensor, x1: torch.Tensor = None, return_features: bool = False)

Embeds and projects the input images.

Extracts features with the backbone and applies the projection head to the output space. If both x0 and x1 are not None, both will be passed through the backbone and projection head. If x1 is None, only x0 will be forwarded.

Args:
x0:

Tensor of shape bsz x channels x W x H.

x1:

Tensor of shape bsz x channels x W x H.

return_features:

Whether or not to return the intermediate features backbone(x).

Returns:

The output projection of x0 and (if x1 is not None) the output projection of x1. If return_features is True, the output for each x is a tuple (out, f) where f are the features before the projection head.

Examples:
>>> # single input, single output
>>> out = model(x) 
>>> 
>>> # single input with return_features=True
>>> out, f = model(x, return_features=True)
>>>
>>> # two inputs, two outputs
>>> out0, out1 = model(x0, x1)
>>>
>>> # two inputs, two outputs with return_features=True
>>> (out0, f0), (out1, f1) = model(x0, x1, return_features=True)

.zoo

Lightly Model Zoo

lightly.models.zoo.checkpoints()

Returns the Lightly model zoo as a list of checkpoints.

Checkpoints:
ResNet-9:

SimCLR with width = 0.0625 and num_ftrs = 16

ResNet-9:

SimCLR with width = 0.125 and num_ftrs = 16

ResNet-18:

SimCLR with width = 1.0 and num_ftrs = 16

ResNet-18:

SimCLR with width = 1.0 and num_ftrs = 32

ResNet-34:

SimCLR with width = 1.0 and num_ftrs = 16

ResNet-34:

SimCLR with width = 1.0 and num_ftrs = 32

Returns:

A list of available checkpoints as URLs.
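A short sketch of inspecting the returned checkpoint URLs:

>>> # sketch: list the available checkpoint URLs
>>> from lightly.models.zoo import checkpoints
>>> urls = checkpoints()
>>> print(urls[0])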

lightly.models.modules

The lightly.models.modules package provides reusable modules.

This package contains reusable modules such as the NNMemoryBankModule which can be combined with any lightly model.

.nn_memory_bank

Nearest Neighbour Memory Bank Module

class lightly.models.modules.nn_memory_bank.NNMemoryBankModule(size: int = 65536)

Nearest Neighbour Memory Bank implementation.

This class implements a nearest neighbour memory bank as described in the NNCLR paper[0]. During the forward pass we return the nearest neighbour from the memory bank.

[0] NNCLR, 2021, https://arxiv.org/abs/2104.14548

Attributes:
size:

Number of keys the memory bank can store. If set to 0, the memory bank is not used.

Examples:
>>> model = NNCLR(backbone)
>>> criterion = NTXentLoss(temperature=0.1)
>>> 
>>> nn_replacer = NNMemoryBankModule(size=2 ** 16)
>>>
>>> # forward pass
>>> (z0, p0), (z1, p1) = model(x0, x1)
>>> z0 = nn_replacer(z0.detach(), update=False)
>>> z1 = nn_replacer(z1.detach(), update=True)
>>>
>>> loss = 0.5 * (criterion(z0, p1) + criterion(z1, p0))

forward(output: torch.Tensor, update: bool = False)

Returns the nearest neighbour of the output tensor from the memory bank.

Args:
output:

The torch tensor for which you want the nearest neighbour.

update:

If True, updates the memory bank by adding output to it.
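As a rough illustration of what the lookup does, here is a conceptual sketch (not the library's exact implementation); bank and output are assumed to be 2D embedding tensors:

>>> # conceptual sketch only, not the library's exact code:
>>> # cosine-similarity argmax against the stored bank
>>> # (bank: num_keys x dim, output: bsz x dim, both assumed)
>>> import torch
>>> bank_n = torch.nn.functional.normalize(bank, dim=1)
>>> out_n = torch.nn.functional.normalize(output, dim=1)
>>> idx = torch.argmax(out_n @ bank_n.t(), dim=1)
>>> nearest = bank[idx]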