lightly.loss

The lightly.loss package provides loss functions for self-supervised learning.

.ntx_ent_loss

class lightly.loss.ntx_ent_loss.NTXentLoss(temperature: float = 0.5, memory_bank_size: int = 0)

Implementation of the Contrastive Cross Entropy Loss.

This implementation follows the SimCLR[0] paper. If you enable the memory bank by setting memory_bank_size > 0, the loss behaves like the one described in the MoCo[1] paper.

[0] SimCLR, 2020, https://arxiv.org/abs/2002.05709
[1] MoCo, 2020, https://arxiv.org/abs/1911.05722

Attributes:
temperature:

Scale logits by the inverse of the temperature.

memory_bank_size:

Number of negative samples to store in the memory bank. Use 0 for SimCLR. For MoCo we typically use numbers like 4096 or 65536.

Raises:

ValueError if abs(temperature) < 1e-8, to prevent division by zero.

Examples:

>>> # initialize loss function without memory bank
>>> loss_fn = NTXentLoss(memory_bank_size=0)
>>>
>>> # generate two random transforms of images
>>> t0 = transforms(images)
>>> t1 = transforms(images)
>>>
>>> # feed through SimCLR or MoCo model
>>> out0 = model(t0)
>>> out1 = model(t1)
>>>
>>> # calculate loss
>>> loss = loss_fn(out0, out1)
forward(out0: torch.Tensor, out1: torch.Tensor)

Forward pass through Contrastive Cross-Entropy Loss.

If used with a memory bank, the samples from the memory bank are used as negative examples. Otherwise, within-batch samples are used as negative samples.

Args:
out0:

Output projections of the first set of transformed images. Shape: (batch_size, embedding_size)

out1:

Output projections of the second set of transformed images. Shape: (batch_size, embedding_size)

Returns:

Contrastive Cross Entropy Loss value.
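
As a complement to the example above, the following is a minimal sketch of using the loss with the memory bank enabled (MoCo-style). The random tensors stand in for the projections out0 and out1; temperature=0.1 and memory_bank_size=4096 are illustrative choices, not prescribed values.

>>> # minimal sketch: NTXentLoss with a memory bank (MoCo-style)
>>> import torch
>>> from lightly.loss.ntx_ent_loss import NTXentLoss
>>>
>>> # illustrative hyperparameters, not prescribed values
>>> loss_fn = NTXentLoss(temperature=0.1, memory_bank_size=4096)
>>>
>>> # random stand-ins for the projections of two views,
>>> # shape (batch_size, embedding_size)
>>> out0 = torch.randn(8, 128)
>>> out1 = torch.randn(8, 128)
>>>
>>> # negatives are drawn from the memory bank instead of the batch
>>> loss = loss_fn(out0, out1)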

.sym_neg_cos_sim_loss

class lightly.loss.sym_neg_cos_sim_loss.SymNegCosineSimilarityLoss

Implementation of the Symmetrized Loss used in the SimSiam[0] paper.

[0] SimSiam, 2020, https://arxiv.org/abs/2011.10566

Examples:

>>> # initialize loss function
>>> loss_fn = SymNegCosineSimilarityLoss()
>>>
>>> # generate two random transforms of images
>>> t0 = transforms(images)
>>> t1 = transforms(images)
>>>
>>> # feed through SimSiam model
>>> out0, out1 = model(t0, t1)
>>>
>>> # calculate loss
>>> loss = loss_fn(out0, out1)
forward(out0: torch.Tensor, out1: torch.Tensor)

Forward pass through Symmetric Loss.

Args:
out0:

Output for the first set of transformed images, expected to be a tuple of the form (z0, p0), where z0 is the output of the backbone and projection MLP, and p0 is the output of the prediction head.

out1:

Output for the second set of transformed images, expected to be a tuple of the form (z1, p1), where z1 is the output of the backbone and projection MLP, and p1 is the output of the prediction head.

Returns:

Symmetrized negative cosine similarity loss value.

Raises:

ValueError if the shape of the output is not a multiple of batch_size.
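
Since forward expects each argument to be a (z, p) tuple, a minimal sketch of calling the loss without a full SimSiam model could look as follows; the random tensors stand in for the projection (z) and prediction (p) outputs.

>>> # minimal sketch: calling the loss with explicit (z, p) tuples
>>> import torch
>>> from lightly.loss.sym_neg_cos_sim_loss import SymNegCosineSimilarityLoss
>>>
>>> loss_fn = SymNegCosineSimilarityLoss()
>>>
>>> # random stand-ins for projection (z) and prediction (p) outputs,
>>> # shape (batch_size, embedding_size)
>>> z0, p0 = torch.randn(8, 128), torch.randn(8, 128)
>>> z1, p1 = torch.randn(8, 128), torch.randn(8, 128)
>>>
>>> loss = loss_fn((z0, p0), (z1, p1))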

.memory_bank

class lightly.loss.memory_bank.MemoryBankModule(size: int = 65536)

Memory bank implementation.

This is a parent class to all loss functions implemented by the lightly Python package. This way, any loss can be used with a memory bank if desired.

Attributes:
size:

Number of keys the memory bank can store. If set to 0, memory bank is not used.

Examples:
>>> class MyLossFunction(MemoryBankModule):
>>>
>>>     def __init__(self, memory_bank_size: int = 2 ** 16):
>>>         super(MyLossFunction, self).__init__(memory_bank_size)
>>>
>>>     def forward(self, output: torch.Tensor,
>>>                 labels: torch.Tensor = None):
>>>
>>>         output, negatives = super(
>>>             MyLossFunction, self).forward(output)
>>>
>>>         if negatives is not None:
>>>             # evaluate loss with negative samples
>>>             pass
>>>         else:
>>>             # evaluate loss without negative samples
>>>             pass
forward(output: torch.Tensor, labels: torch.Tensor = None, update: bool = False)

Query the memory bank for additional negative samples.

Args:
output:

The output of the model.

labels:

Should always be None; this argument is ignored.

update:

If True, the memory bank is updated with the given output.

Returns:

The output if the memory bank is of size 0, otherwise the output and the entries from the memory bank.
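
For illustration, the bank can also be queried directly as a module. The following is a minimal sketch in which a random tensor stands in for the model output; passing update=True is assumed here to add the current output to the bank.

>>> # minimal sketch: querying the memory bank directly
>>> import torch
>>> from lightly.loss.memory_bank import MemoryBankModule
>>>
>>> bank = MemoryBankModule(size=4096)
>>>
>>> # random stand-in for the model output, shape (batch_size, embedding_size)
>>> output = torch.randn(8, 128)
>>>
>>> # returns the output together with the stored negatives
>>> # (update=True is assumed to enqueue the current output into the bank)
>>> output, negatives = bank(output, update=True)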

.barlow_twins_loss

class lightly.loss.barlow_twins_loss.BarlowTwinsLoss(lambda_param=0.005)

Implementation of the Barlow Twins Loss from the Barlow Twins[0] paper. This code specifically implements Algorithm 1 from [0].

[0] Zbontar, J. et al., 2021, Barlow Twins, https://arxiv.org/abs/2103.03230

Examples:

>>> # initialize loss function
>>> loss_fn = BarlowTwinsLoss()
>>>
>>> # generate two random transforms of images
>>> t0 = transforms(images)
>>> t1 = transforms(images)
>>>
>>> # feed through Barlow Twins model
>>> out0, out1 = model(t0, t1)
>>>
>>> # calculate loss
>>> loss = loss_fn(out0, out1)
forward(z_a: torch.Tensor, z_b: torch.Tensor)

Computes the Barlow Twins loss for the two sets of projections.

Args:
z_a:

Output projections of the first set of transformed images. Shape: (batch_size, embedding_size)

z_b:

Output projections of the second set of transformed images. Shape: (batch_size, embedding_size)

Returns:

Barlow Twins loss value.
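
For reference, Algorithm 1 of [0] computes a cross-correlation matrix between the batch-normalized projections and penalizes its deviation from the identity: an invariance term on the diagonal and a redundancy-reduction term, weighted by lambda_param, off the diagonal. The following is a hedged, self-contained sketch of that computation; it illustrates the algorithm and is not claimed to match this class's exact implementation.

>>> # sketch of Algorithm 1 from [0] (illustration, not the package's exact code)
>>> import torch
>>>
>>> def barlow_twins_loss_sketch(z_a, z_b, lambda_param=0.005):
>>>     N = z_a.shape[0]
>>>     # normalize each embedding dimension over the batch
>>>     z_a_norm = (z_a - z_a.mean(dim=0)) / z_a.std(dim=0)
>>>     z_b_norm = (z_b - z_b.mean(dim=0)) / z_b.std(dim=0)
>>>     # cross-correlation matrix, shape (embedding_size, embedding_size)
>>>     c = z_a_norm.T @ z_b_norm / N
>>>     # invariance term: push diagonal entries towards 1
>>>     on_diag = (torch.diagonal(c) - 1).pow(2).sum()
>>>     # redundancy reduction term: push off-diagonal entries towards 0
>>>     off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
>>>     return on_diag + lambda_param * off_diag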

.hypersphere_loss

class lightly.loss.hypersphere_loss.HypersphereLoss(t=1.0, lam=1.0, alpha=2.0)

Implementation of the loss described in ‘Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere.’ [0]

[0] Wang, T. et al., 2020, Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere, https://arxiv.org/abs/2005.10242

In order for this loss to function as advertised, an L2-normalization of the embeddings onto the unit hypersphere is required. This loss function applies this normalization internally in the loss layer. However, it is recommended that the same normalization is also applied in your architecture, considering that the normalization is also intended to be applied during inference. There may be merit in leaving it out of the inference pathway, but this use has not been tested.

Moreover, it is recommended that the layers preceding this loss function are either a linear layer without activation, a batch-normalization layer, or both. The architecture directly upstream has a large influence on how well this loss can achieve its stated aim of promoting uniformity on the hypersphere. If, by contrast, the last layer feeding into the embedding is a ReLU or a similar nonlinearity, the embeddings are confined to the subspace of positive activations and uniformity on the hypersphere can never be closely approached. Similar architectural considerations apply to most contrastive loss functions, but we call them out here explicitly.

Examples:

>>> # initialize loss function
>>> loss_fn = HypersphereLoss()
>>>
>>> # generate two random transforms of images
>>> t0 = transforms(images)
>>> t1 = transforms(images)
>>>
>>> # feed through SimSiam model
>>> out0, out1 = model(t0, t1)
>>>
>>> # calculate loss
>>> loss = loss_fn(out0, out1)
forward(z_a: torch.Tensor, z_b: torch.Tensor) → torch.Tensor
Args:
z_a:

torch.Tensor of shape [b, d], float. Embeddings of the first set of transformed images.

z_b:

torch.Tensor of shape [b, d], float. Embeddings of the second set of transformed images.

Returns:

torch.Tensor of shape [], float. The scalar loss value.
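
For intuition, the objective of [0] combines an alignment term (the mean distance between positive pairs, raised to the power alpha) with a uniformity term (the log of the mean Gaussian potential with temperature t) for each view, weighted by lam. The sketch below follows the formulation in [0] on already-normalized embeddings; it is an illustration and not necessarily identical to this class's implementation.

>>> # sketch of the alignment + uniformity objective from [0]
>>> # (assumes x and y are already l2-normalized, shape [b, d])
>>> import torch
>>>
>>> def align_loss(x, y, alpha=2.0):
>>>     # mean distance between positive pairs, raised to the power alpha
>>>     return (x - y).norm(p=2, dim=1).pow(alpha).mean()
>>>
>>> def uniform_loss(x, t=1.0):
>>>     # log of the mean Gaussian potential over all pairs
>>>     return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
>>>
>>> def hypersphere_loss_sketch(x, y, t=1.0, lam=1.0, alpha=2.0):
>>>     return align_loss(x, y, alpha) + lam * (uniform_loss(x, t) + uniform_loss(y, t)) / 2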

.regularizer.co2

class lightly.loss.regularizer.co2.CO2Regularizer(alpha: float = 1, t_consistency: float = 0.05, memory_bank_size: int = 0)

Implementation of the CO2 regularizer [0] for self-supervised learning.

[0] CO2, 2021, https://arxiv.org/abs/2010.02217

Attributes:
alpha:

Weight of the regularization term.

t_consistency:

Temperature used during softmax calculations.

memory_bank_size:

Number of negative samples to store in the memory bank. Use 0 to use the second batch of projections as negative samples instead.

Examples:
>>> # initialize loss function for MoCo
>>> loss_fn = NTXentLoss(memory_bank_size=4096)
>>>
>>> # initialize CO2 regularizer
>>> co2 = CO2Regularizer(alpha=1.0, memory_bank_size=4096)
>>>
>>> # generate two random transforms of images
>>> t0 = transforms(images)
>>> t1 = transforms(images)
>>>
>>> # feed through the MoCo model
>>> out0, out1 = model(t0, t1)
>>> 
>>> # calculate loss and apply regularizer
>>> loss = loss_fn(out0, out1) + co2(out0, out1)
forward(out0: torch.Tensor, out1: torch.Tensor)

Computes the CO2 regularization term for two model outputs.

Args:
out0:

Output projections of the first set of transformed images.

out1:

Output projections of the second set of transformed images.

Returns:

The regularization term multiplied by the weight factor alpha.
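
As a complement to the example above, a minimal standalone sketch of evaluating the regularizer is shown below. The random tensors stand in for the projections of the two views, and memory_bank_size=0 means the second batch provides the negatives.

>>> # minimal sketch: evaluating the CO2 regularization term on its own
>>> import torch
>>> from lightly.loss.regularizer.co2 import CO2Regularizer
>>>
>>> co2 = CO2Regularizer(alpha=1.0, t_consistency=0.05, memory_bank_size=0)
>>>
>>> # random stand-ins for the projections of two views,
>>> # shape (batch_size, embedding_size)
>>> out0 = torch.randn(8, 128)
>>> out1 = torch.randn(8, 128)
>>>
>>> # regularization term, already weighted by alpha
>>> reg = co2(out0, out1)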