lightly.models.utils

Utilities for working with self-supervised learning (SSL) models.

lightly.models.utils.activate_requires_grad(model: Module)

Activates the requires_grad flag for all parameters of a model.

Use this method to activate gradients for a model (e.g. after deactivating them using deactivate_requires_grad(…)).

Examples

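>>> from torchvision.models import resnet18
>>> from lightly.models.utils import activate_requires_grad
>>> backbone = resnet18()
>>> activate_requires_grad(backbone)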

lightly.models.utils.add_stochastic_depth_to_blocks(vit: Module, prob: float = 0.0, mode='row') → None

Adds stochastic depth dropout to all transformer blocks in a Vision Transformer model.

Parameters

vit – Vision Transformer Model to which stochastic depth dropout will be added.
prob – Probability of dropping a layer.
mode – Mode for stochastic depth. Default is “row”.

Raises

RuntimeError – If torchvision version is less than 0.12.

lightly.models.utils.apply_masks(x: Tensor, masks: torch.Tensor | list[torch.Tensor]) → Tensor

Apply masks to the input tensor.

From https://github.com/facebookresearch/ijepa/blob/main/src/masks/utils.py

Parameters

x – Tensor of shape (B, N, D) where N is the number of patches.
masks – Tensor or list of tensors containing indices of patches in [0, N-1] to keep. Each tensor must have shape (B, K) where K is the number of patches to keep. All masks must have the same K.

Returns

Tensor of shape (B * num_masks, K, D) where K is the number of patches to keep.
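The shapes above can be illustrated with a small gather-based sketch in plain PyTorch. The name apply_masks_sketch is hypothetical and the code only mirrors the documented behavior; it is not presented as the library's implementation.

```python
import torch


def apply_masks_sketch(x, masks):
    """Keep the patches selected by each mask and stack the results along the batch axis."""
    if isinstance(masks, torch.Tensor):
        masks = [masks]
    kept = []
    for mask in masks:
        # Expand (B, K) indices to (B, K, D) so gather picks whole patch vectors.
        index = mask.unsqueeze(-1).expand(-1, -1, x.size(-1))
        kept.append(torch.gather(x, dim=1, index=index))
    # One (B, K, D) tensor per mask, concatenated to (B * num_masks, K, D).
    return torch.cat(kept, dim=0)


x = torch.arange(2 * 4 * 3, dtype=torch.float32).reshape(2, 4, 3)  # B=2, N=4, D=3
masks = [torch.tensor([[0, 1], [2, 3]]), torch.tensor([[1, 3], [0, 2]])]
out = apply_masks_sketch(x, masks)
```

With two masks and K=2, the result has shape (4, 2, 3): the first two entries come from the first mask, the last two from the second.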

lightly.models.utils.batch_shuffle(batch: Tensor, distributed: bool = False) → Tuple[Tensor, Tensor]

Randomly shuffles all tensors in the batch.

Parameters

batch – The batch to shuffle.
distributed – If True then batches are shuffled across multiple gpus.

Returns

A (batch, shuffle) tuple where batch is the shuffled version of the input batch and shuffle is an index to restore the original order.

Examples

>>> # forward pass through the momentum model with batch shuffling
>>> x1_shuffled, shuffle = batch_shuffle(x1)
>>> f1 = moco_momentum(x1_shuffled)
>>> out1 = projection_head_momentum(f1)
>>> out1 = batch_unshuffle(out1, shuffle)

lightly.models.utils.batch_shuffle_distributed(batch: Tensor) → Tuple[Tensor, Tensor]

Shuffles batch over multiple devices.

This code was taken and adapted from here: https://github.com/facebookresearch/moco.

Parameters

batch – The tensor to shuffle.

Returns

A (batch, shuffle) tuple where batch is the shuffled version of the input batch and shuffle is an index to restore the original order.

lightly.models.utils.batch_unshuffle(batch: Tensor, shuffle: Tensor, distributed: bool = False) → Tensor

Unshuffles a batch.

Parameters

batch – The batch to unshuffle.
shuffle – Index to unshuffle the batch.
distributed – If True then the batch is unshuffled across multiple gpus.

Returns

The unshuffled batch.

Examples

>>> # forward pass through the momentum model with batch shuffling
>>> x1_shuffled, shuffle = batch_shuffle(x1)
>>> f1 = moco_momentum(x1_shuffled)
>>> out1 = projection_head_momentum(f1)
>>> out1 = batch_unshuffle(out1, shuffle)
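For the single-device case, the shuffle and unshuffle pair can be sketched with a random permutation and its inverse. The *_sketch names are hypothetical and the distributed path is deliberately left out.

```python
import torch


def batch_shuffle_sketch(batch):
    """Shuffle along the batch dimension and return the permutation that was used."""
    shuffle = torch.randperm(batch.shape[0], device=batch.device)
    return batch[shuffle], shuffle


def batch_unshuffle_sketch(batch, shuffle):
    """Restore the original order; argsort of a permutation is its inverse."""
    unshuffle = torch.argsort(shuffle)
    return batch[unshuffle]


x1 = torch.arange(8, dtype=torch.float32).reshape(8, 1)
x1_shuffled, shuffle = batch_shuffle_sketch(x1)
restored = batch_unshuffle_sketch(x1_shuffled, shuffle)
```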

lightly.models.utils.batch_unshuffle_distributed(batch: Tensor, shuffle: Tensor) → Tensor

Undo batch shuffle over multiple devices.

This code was taken and adapted from here: https://github.com/facebookresearch/moco.

Parameters

batch – The tensor to unshuffle.
shuffle – Index to restore the original tensor.

Returns

The unshuffled tensor.

lightly.models.utils.concat_all_gather(x: Tensor) → Tensor

Returns concatenated instances of x gathered from all gpus.

This code was taken and adapted from here: https://github.com/facebookresearch/moco.

lightly.models.utils.deactivate_requires_grad(model: Module)

Deactivates the requires_grad flag for all parameters of a model.

This has the same effect as permanently executing the model within a torch.no_grad() context. Use this method to disable gradient computation and therefore training for a model.

Examples

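>>> from torchvision.models import resnet18
>>> from lightly.models.utils import deactivate_requires_grad
>>> backbone = resnet18()
>>> deactivate_requires_grad(backbone)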

lightly.models.utils.expand_index_like(index: Tensor, tokens: Tensor) → Tensor

Expands the index along the last dimension of the input tokens.

Parameters

index – Index tensor with shape (batch_size, idx_length) where each entry is an index in [0, sequence_length).
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).

Returns

Index tensor with shape (batch_size, idx_length, dim) where the original indices are repeated dim times along the last dimension.
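An index expanded to this shape is what torch.gather needs in order to select whole token vectors. The following is a minimal sketch under that assumption; the *_sketch names are hypothetical and not taken from the library.

```python
import torch


def expand_index_like_sketch(index, tokens):
    """Repeat a (batch_size, idx_length) index dim times along a new last dimension."""
    dim = tokens.shape[-1]
    return index.unsqueeze(-1).expand(-1, -1, dim)


def get_at_index_sketch(tokens, index):
    """Select tokens at the given sequence positions using gather."""
    return torch.gather(tokens, 1, expand_index_like_sketch(index, tokens))


tokens = torch.arange(2 * 5 * 4, dtype=torch.float32).reshape(2, 5, 4)
index = torch.tensor([[0, 3], [4, 1]])
expanded = expand_index_like_sketch(index, tokens)
selected = get_at_index_sketch(tokens, index)
```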

lightly.models.utils.get_1d_sine_cosine_positional_embedding_from_positions(embed_dim: int, pos: NDArray[np.float32]) → NDArray[np.float32]

Generates 1D sine-cosine positional embedding from positions.

Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py

Parameters

embed_dim – Embedding dimension.
pos – Positions to be encoded with shape (N, M).

Returns

Positional embedding with shape (N * M, embed_dim).

lightly.models.utils.get_2d_sincos_pos_embed(embed_dim: int, grid_size: int, cls_token: bool) → NDArray[np.float32]

Generates 2D sine-cosine positional embedding.

Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py

Parameters

embed_dim – Embedding dimension.
grid_size – Height and width of the grid.
cls_token – If True, a positional embedding for the class token is generated.

Returns

Positional embedding with shape (grid_size * grid_size, embed_dim) or (1 + grid_size * grid_size, embed_dim) if cls_token is True.
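As a rough illustration of the construction in the referenced MAE code, the NumPy sketch below builds a 1D sine-cosine embedding per grid axis and concatenates the two halves. The function names are hypothetical, and using an all-zero row for the class token is an assumption carried over from the MAE reference.

```python
import numpy as np


def sincos_1d_sketch(embed_dim, pos):
    """Encode positions of any shape into an (num_positions, embed_dim) array."""
    assert embed_dim % 2 == 0
    omega = np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2.0)
    omega = 1.0 / 10000**omega  # (embed_dim / 2,) frequencies
    out = np.einsum("m,d->md", pos.reshape(-1), omega)  # outer product
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)


def sincos_2d_sketch(embed_dim, grid_size, cls_token=False):
    """Concatenate 1D embeddings of the two grid coordinates."""
    assert embed_dim % 4 == 0
    coords = np.arange(grid_size, dtype=np.float32)
    grid = np.stack(np.meshgrid(coords, coords), axis=0)  # (2, G, G)
    emb = np.concatenate(
        [sincos_1d_sketch(embed_dim // 2, grid[0]),
         sincos_1d_sketch(embed_dim // 2, grid[1])],
        axis=1,
    )  # (G * G, embed_dim)
    if cls_token:
        # Assumption: the class token gets an all-zero embedding row.
        emb = np.concatenate([np.zeros((1, embed_dim)), emb], axis=0)
    return emb.astype(np.float32)


emb = sincos_2d_sketch(embed_dim=8, grid_size=3)
emb_cls = sincos_2d_sketch(embed_dim=8, grid_size=3, cls_token=True)
```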

lightly.models.utils.get_2d_sine_cosine_positional_embedding(embed_dim: int, grid_size: int, cls_token: bool) → NDArray[np.float32]

Generates 2D sine-cosine positional embedding.

Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py

Parameters

embed_dim – Embedding dimension.
grid_size – Height and width of the grid.
cls_token – If True, a positional embedding for the class token is generated.

Returns

Positional embedding with shape (grid_size * grid_size, embed_dim) or (1 + grid_size * grid_size, embed_dim) if cls_token is True.

lightly.models.utils.get_2d_sine_cosine_positional_embedding_from_grid(embed_dim: int, grid: NDArray[np.float32]) → NDArray[np.float32]

Generates 2D sine-cosine positional embedding from a grid.

Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py

Parameters

embed_dim – Embedding dimension.
grid – Grid of shape (2, grid_size, grid_size) with x and y coordinates.

Returns

Positional embedding with shape (grid_size * grid_size, embed_dim).

lightly.models.utils.get_at_index(tokens: Tensor, index: Tensor) → Tensor

Selects tokens at index.

Parameters

tokens – Token tensor with shape (batch_size, sequence_length, dim).
index – Index tensor with shape (batch_size, index_length) where each entry is an index in [0, sequence_length).

Returns

Token tensor with shape (batch_size, index_length, dim) containing the selected tokens.

lightly.models.utils.get_named_leaf_modules(module: Module) → Dict[str, Module]

Returns all leaf modules of the model with their names.

lightly.models.utils.get_weight_decay_parameters(modules: Iterable[Module], decay_norm: bool = False, decay_bias: bool = False, norm_layers: Tuple[Type[Module], ...] = (_NormBase, LayerNorm, CrossMapLRN2d, LocalResponseNorm, GroupNorm)) → Tuple[List[Parameter], List[Parameter]]

Returns all parameters of the modules that should be decayed and not decayed.

Parameters

modules – List of modules to get the parameters from.
decay_norm – If True, normalization parameters are decayed.
decay_bias – If True, bias parameters are decayed.
norm_layers – Tuple of normalization classes to decay if decay_norm is True.

Returns

(params, params_no_weight_decay) tuple.
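The two lists are typically passed to an optimizer as separate parameter groups. The sketch below reproduces the described split in plain PyTorch with a hypothetical helper, split_weight_decay_sketch, and a reduced set of normalization classes; it is an illustration of the idea, not the library's code.

```python
import torch
from torch import nn


def split_weight_decay_sketch(modules, decay_norm=False, decay_bias=False):
    """Split parameters into (decayed, not decayed) lists."""
    norm_layers = (nn.modules.batchnorm._NormBase, nn.LayerNorm, nn.GroupNorm)
    params, params_no_weight_decay = [], []
    for module in modules:
        for mod in module.modules():
            for name, param in mod.named_parameters(recurse=False):
                is_norm = isinstance(mod, norm_layers)
                is_bias = name.endswith("bias")
                if is_norm and not decay_norm:
                    params_no_weight_decay.append(param)
                elif not is_norm and is_bias and not decay_bias:
                    params_no_weight_decay.append(param)
                else:
                    params.append(param)
    return params, params_no_weight_decay


model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 2))
params, params_no_weight_decay = split_weight_decay_sketch([model])
optimizer = torch.optim.SGD(
    [
        {"params": params, "weight_decay": 1e-4},
        {"params": params_no_weight_decay, "weight_decay": 0.0},
    ],
    lr=0.1,
)
```

Here only the two Linear weights are decayed; the two Linear biases and both BatchNorm parameters land in the second group.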

lightly.models.utils.initialize_learnable_positional_embedding(pos_embedding: Parameter) → None

Initializes a learnable positional embedding.

Uses standard initialization for ViT models, see [0].

[0]: https://github.com/huggingface/pytorch-image-models/blob/cec70b6779ea81cec0ca08ee4a257b52affd235a/timm/models/vision_transformer.py#L590

Parameters

pos_embedding – Positional embedding parameter.

lightly.models.utils.initialize_positional_embedding(pos_embedding: Parameter, strategy: str, num_prefix_tokens: int) → None

Initializes the positional embedding with the given strategy.

Parameters

pos_embedding – Positional embedding parameter.
strategy – Positional embedding initialization strategy. Valid options are: [‘learn’, ‘sincos’, ‘skip’]. ‘learn’ makes the embedding learnable, ‘sincos’ creates a fixed 2D sine-cosine positional embedding, and ‘skip’ does not initialize the positional embedding.
num_prefix_tokens – Number of prefix tokens in the positional embedding. This includes the class token.

Raises

ValueError – If an invalid strategy is provided.

lightly.models.utils.mask_at_index(tokens: Tensor, index: Tensor, mask_token: Tensor) → Tensor

Returns a tensor where the tokens at the given indices are replaced by the mask token.

Parameters

tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
index – Index tensor with shape (batch_size, index_length).
mask_token – Value tensor with shape (1, 1, dim).

Returns

Tokens tensor with shape (batch_size, sequence_length, dim) containing the new values.
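Index-based replacement of this kind maps naturally onto torch.scatter. The sketch below assumes that approach; the *_sketch names are hypothetical.

```python
import torch


def set_at_index_sketch(tokens, index, value):
    """Return a copy of tokens with value written at the given sequence positions."""
    index = index.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    return torch.scatter(tokens, 1, index, value)


def mask_at_index_sketch(tokens, index, mask_token):
    """Replace the tokens at the given positions by the (1, 1, dim) mask token."""
    batch_size, index_length = index.shape
    value = mask_token.to(tokens.dtype).expand(batch_size, index_length, -1)
    return set_at_index_sketch(tokens, index, value)


tokens = torch.zeros(2, 4, 3)
index = torch.tensor([[0, 2], [1, 3]])
mask_token = torch.ones(1, 1, 3)
masked = mask_at_index_sketch(tokens, index, mask_token)
```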

lightly.models.utils.mask_bool(tokens: Tensor, mask: Tensor, mask_token: Tensor) → Tensor

Returns a tensor with tokens replaced by the mask tokens in all positions where the mask is True.

Parameters

tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
mask – Boolean mask tensor with shape (batch_size, sequence_length).
mask_token – Mask token with shape (1, 1, dim).

Returns

Tokens tensor with shape (batch_size, sequence_length, dim) where tokens[i, j] is replaced by the mask token if mask[i, j] is True.
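The boolean variant can be sketched with torch.where and broadcasting. The name mask_bool_sketch is hypothetical and the code is only meant to show the shape handling.

```python
import torch


def mask_bool_sketch(tokens, mask, mask_token):
    """Replace tokens[i, j] by the mask token wherever mask[i, j] is True."""
    # (B, S) -> (B, S, 1) so the mask broadcasts against (B, S, D) and (1, 1, D).
    return torch.where(mask.unsqueeze(-1), mask_token.to(tokens.dtype), tokens)


tokens = torch.ones(2, 3, 4)
mask = torch.tensor([[True, False, False], [False, False, True]])
mask_token = torch.zeros(1, 1, 4)
out = mask_bool_sketch(tokens, mask, mask_token)
```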

lightly.models.utils.most_similar_index(x: Tensor, y: Tensor) → Tensor

For each feature in x, searches the most similar feature in y and returns the corresponding index.

Parameters

x – Tensor with shape (B, N, C) containing the features to compare.
y – Tensor with shape (B, N, C) containing the features to search for similarity.

Returns

Index with shape (B, N) such that y[i, index[i, j]] is most similar to x[i, j] over all y[i, …].
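The text does not state which similarity measure is used. The sketch below assumes cosine similarity (dot product of L2-normalized features); the function name is hypothetical.

```python
import torch
import torch.nn.functional as F


def most_similar_index_sketch(x, y):
    """For each x[i, j], return the index k of the most similar y[i, k]."""
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    # similarity[b, n, m] is the cosine similarity between x[b, n] and y[b, m].
    similarity = torch.einsum("bnc,bmc->bnm", x, y)
    return similarity.argmax(dim=-1)


x = torch.tensor([[[1.0, 0.0], [0.0, 1.0]]])
y = torch.tensor([[[0.1, 2.0], [3.0, 0.2]]])
index = most_similar_index_sketch(x, y)
```

In this toy input, x[0, 0] points along the first axis and matches y[0, 1], while x[0, 1] matches y[0, 0].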

lightly.models.utils.nearest_neighbors(input_maps: Tensor, candidate_maps: Tensor, distances: Tensor, num_matches: int) → Tuple[Tensor, Tensor]

Finds the nearest neighbors of the maps in input_maps in candidate_maps.

Parameters

input_maps – A tensor of maps for which to find nearest neighbors. It has shape: [batch_size, input_map_size, feature_dimension]
candidate_maps – A tensor of maps to search for nearest neighbors. It has shape: [batch_size, candidate_map_size, feature_dimension]
distances – A tensor of distances between the maps in input_maps and candidate_maps. It has shape: [batch_size, input_map_size, candidate_map_size]
num_matches – Number of nearest neighbors to return. If num_matches is None or -1, all the maps in candidate_maps are considered.

Returns

A tuple of tensors, containing the nearest neighbors in input_maps and candidate_maps. They both have shape: [batch_size, input_map_size, feature_dimension]

lightly.models.utils.normalize_mean_var(x: Tensor, dim: int = -1, eps: float = 1e-06) → Tensor

Normalizes the input tensor to zero mean and unit variance.

Parameters

x – Input tensor.
dim – Dimension along which to compute mean and standard deviation. Takes last dimension by default.
eps – Epsilon value to avoid division by zero.

Returns

Normalized tensor.
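A minimal sketch of this normalization, assuming eps is added to the variance before taking the square root and that torch.var's default (unbiased) estimator is used; normalize_mean_var_sketch is a hypothetical name.

```python
import torch


def normalize_mean_var_sketch(x, dim=-1, eps=1e-6):
    """Shift to zero mean and scale to unit variance along dim."""
    mean = x.mean(dim=dim, keepdim=True)
    var = x.var(dim=dim, keepdim=True)
    return (x - mean) / (var + eps).sqrt()


torch.manual_seed(0)
x = torch.randn(4, 16) * 3 + 5
out = normalize_mean_var_sketch(x)
```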

lightly.models.utils.normalize_weight(weight: Parameter, dim: int = 1, keepdim: bool = True)

Normalizes the weight to unit length along the specified dimension.

lightly.models.utils.patchify(images: Tensor, patch_size: int) → Tensor

Converts a batch of input images into patches.

Parameters

images – Images tensor with shape (batch_size, channels, height, width)
patch_size – Patch size in pixels. Image width and height must be multiples of the patch size.

Returns

Patches tensor with shape (batch_size, num_patches, channels * patch_size ** 2) where num_patches = image_width / patch_size * image_height / patch_size.
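The shape arithmetic can be checked with a small reshape-and-permute sketch. The name patchify_sketch is hypothetical, and the ordering of values inside each patch (pixel by pixel, channels last) is an assumption borrowed from the MAE convention rather than something stated here.

```python
import torch


def patchify_sketch(images, patch_size):
    """Cut (B, C, H, W) images into flattened, non-overlapping square patches."""
    B, C, H, W = images.shape
    assert H % patch_size == 0 and W % patch_size == 0
    h, w = H // patch_size, W // patch_size
    # Split height and width into (number of patches, pixels per patch).
    patches = images.reshape(B, C, h, patch_size, w, patch_size)
    # Reorder to (B, h, w, patch_row, patch_col, C).
    patches = patches.permute(0, 2, 4, 3, 5, 1)
    return patches.reshape(B, h * w, patch_size * patch_size * C)


images = torch.arange(2 * 3 * 4 * 6, dtype=torch.float32).reshape(2, 3, 4, 6)
patches = patchify_sketch(images, patch_size=2)
```

A 4 x 6 image with patch_size=2 yields 2 * 3 = 6 patches, each holding 3 * 2 ** 2 = 12 values.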

lightly.models.utils.pool_masked(source: Tensor, mask: Tensor, num_cls: int, reduce: str = 'mean') → Tensor

Reduce image feature maps (B, C, H, W) or (C, H, W) according to an integer index given by mask (B, H, W) or (H, W).

Parameters

source – Float tensor of shape (B, C, H, W) or (C, H, W) to be reduced.
mask – Integer tensor of shape (B, H, W) or (H, W) containing the integer indices.
num_cls – The number of classes in the possible masks.

Returns

A tensor of shape (B, C, num_cls) or (C, num_cls).
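For the batched case with mean reduction, the pooling can be sketched with a one-hot encoding of the mask. The function name is hypothetical, only reduce='mean' is covered, and returning 0 for classes that do not occur in the mask is an assumption.

```python
import torch
import torch.nn.functional as F


def pool_masked_mean_sketch(source, mask, num_cls):
    """Average source (B, C, H, W) over the pixels of each class in mask (B, H, W)."""
    one_hot = F.one_hot(mask, num_cls).to(source.dtype)  # (B, H, W, num_cls)
    sums = torch.einsum("bchw,bhwk->bck", source, one_hot)  # (B, C, num_cls)
    # Pixel count per class; clamp avoids division by zero for absent classes.
    counts = one_hot.sum(dim=(1, 2)).clamp(min=1)  # (B, num_cls)
    return sums / counts.unsqueeze(1)


source = torch.tensor([[[[1.0, 2.0], [3.0, 4.0]]]])  # B=1, C=1, H=2, W=2
mask = torch.tensor([[[0, 0], [1, 1]]])
pooled = pool_masked_mean_sketch(source, mask, num_cls=3)
```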

lightly.models.utils.prepend_class_token(tokens: Tensor, class_token: Tensor) → Tensor

Prepends class token to tokens.

Parameters

tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
class_token – Class token with shape (1, 1, dim).

Returns

Tokens tensor with the class token prepended at index 0 in every sequence. The tensor has shape (batch_size, sequence_length + 1, dim).

lightly.models.utils.random_block_mask(size: Tuple[int, int, int], batch_mask_ratio: float = 0.5, min_image_mask_ratio: float = 0.1, max_image_mask_ratio: float = 0.5, min_num_masks_per_block: int = 4, max_num_masks_per_block: Optional[int] = None, min_block_aspect_ratio: float = 0.3, max_block_aspect_ratio: Optional[float] = None, max_attempts_per_block: int = 10, device: Optional[Union[device, str]] = None) → Tensor

Creates a random block mask for a batch of images.

In this context, a block is a rectangle of patches in an image that are masked together. The function generates block masks until the desired number of patches per image is masked. DINOv2 uses a more complex masking strategy that only generates masks for a fraction batch_mask_ratio of the images. On top of that, it also masks a different number of patches for every image. This is controlled by the min_image_mask_ratio and max_image_mask_ratio arguments.

Based on the implementation of the block mask in DINOv2 [0]. For details see [1] and [2].

[0]: DINOv2, 2023, https://arxiv.org/abs/2304.07193
[1]: https://github.com/facebookresearch/dinov2/blob/main/dinov2/data/masking.py
[2]: https://github.com/facebookresearch/dinov2/blob/main/dinov2/data/collate.py

Parameters

size – Size of the image batch for which to generate masks. Should be (batch_size, height, width).
batch_mask_ratio – Percentage of images per batch for which to generate block masks. The remaining images are not masked.
min_image_mask_ratio – Minimum percentage of the image to mask. In practice, fewer than min_image_mask_ratio patches of the image can be masked due to additional constraints.
max_image_mask_ratio – Maximum percentage of the image to mask.
min_num_masks_per_block – Minimum number of patches to mask per block.
max_num_masks_per_block – Maximum number of patches to mask per block.
min_block_aspect_ratio – Minimum aspect ratio (height/width) of a masked block.
max_block_aspect_ratio – Maximum aspect ratio (height/width) of a masked block.
max_attempts_per_block – Maximum number of attempts to find a valid block mask for an image.
device – Device on which to create the mask.

Returns

A boolean tensor with shape (batch_size, height, width) where each entry is True if the patch should be masked and False otherwise.

Raises

ValueError – If ‘max_image_mask_ratio’ is less than ‘min_image_mask_ratio’.

lightly.models.utils.random_block_mask_image(size: Tuple[int, int], num_masks: int, min_num_masks_per_block: int = 4, max_num_masks_per_block: Optional[int] = None, min_block_aspect_ratio: float = 0.3, max_block_aspect_ratio: Optional[float] = None, max_attempts_per_block: int = 10, device: Optional[Union[device, str]] = None) → Tensor

Creates a random block mask for a single image.

Parameters

size – Size of the image for which to generate a mask. Should be (height, width).
num_masks – Number of patches to mask.
min_num_masks_per_block – Minimum number of patches to mask per block.
max_num_masks_per_block – Maximum number of patches to mask per block.
min_block_aspect_ratio – Minimum aspect ratio (height/width) of a masked block.
max_block_aspect_ratio – Maximum aspect ratio (height/width) of a masked block.
max_attempts_per_block – Maximum number of attempts to find a valid block mask.
device – Device on which to create the mask.

Returns

A boolean tensor with shape (height, width) where each entry is True if the patch should be masked and False otherwise.

Raises

ValueError – If ‘max_num_masks_per_block’ is less than ‘min_num_masks_per_block’ or if ‘max_block_aspect_ratio’ is less than ‘min_block_aspect_ratio’.

lightly.models.utils.random_prefix_mask(size: Tuple[int, int], max_prefix_length: int, device: Optional[Union[device, str]] = None) → Tensor

Creates a random prefix mask.

The mask is created by uniformly sampling a prefix length in [0, max_prefix_length] for each sequence in the batch. All tokens with an index greater than or equal to the prefix length are masked.

Parameters

size – Size of the token batch for which to generate masks. Should be (batch_size, sequence_length).
max_prefix_length – Maximum length of the prefix to mask.
device – Device on which to create the mask.

Returns

A mask tensor with shape (batch_size, sequence_length) where each entry is True if the token should be masked and False otherwise.

lightly.models.utils.random_token_mask(size: Tuple[int, int], mask_ratio: float = 0.6, mask_class_token: bool = False, device: Optional[Union[device, str]] = None) → Tuple[Tensor, Tensor]

Creates random token masks.

Parameters

size – Size of the token batch for which to generate masks. Should be (batch_size, sequence_length).
mask_ratio – Proportion of tokens to mask.
mask_class_token – If False the class token is never masked. If True the class token might be masked.
device – Device on which to create the index masks.

Returns

A (index_keep, index_mask) tuple where each index is a tensor. index_keep contains the indices of the unmasked tokens and has shape (batch_size, num_keep). index_mask contains the indices of the masked tokens and has shape (batch_size, sequence_length - num_keep). num_keep is equal to sequence_length * (1 - mask_ratio).
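One common way to obtain such index pairs is to sort per-token random noise. The sketch below assumes that approach and assumes the class token sits at index 0; random_token_mask_sketch is a hypothetical name.

```python
import torch


def random_token_mask_sketch(size, mask_ratio=0.6, mask_class_token=False):
    """Return (index_keep, index_mask) by sorting per-token random noise."""
    batch_size, sequence_length = size
    num_keep = int(sequence_length * (1 - mask_ratio))
    noise = torch.rand(batch_size, sequence_length)
    if not mask_class_token and sequence_length > 0:
        # Lowest possible noise at index 0 so the class token is always kept.
        noise[:, 0] = -1
        num_keep = max(1, num_keep)
    indices = torch.argsort(noise, dim=1)
    return indices[:, :num_keep], indices[:, num_keep:]


index_keep, index_mask = random_token_mask_sketch((2, 8), mask_ratio=0.75)
```

With sequence_length=8 and mask_ratio=0.75, two tokens per sequence are kept and six are masked.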

lightly.models.utils.repeat_interleave_batch(x: Tensor, B: int, repeat: int) → Tensor

Repeat and interleave the input tensor.

Parameters

x – Tensor with shape (B * N, …) where B is the batch size and N the number of batches.
B – Batch size.
repeat – Number of times to repeat each batch.

Returns

Tensor with shape (B * repeat * N, …) where each batch is repeated repeat times.
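The ordering of the output is easiest to see on a tiny input. This sketch assumes each chunk of B rows is repeated as a whole before moving on to the next chunk, as in the I-JEPA code the masking utilities reference; the function name is hypothetical.

```python
import torch


def repeat_interleave_batch_sketch(x, B, repeat):
    """Repeat every chunk of B rows repeat times, keeping chunks in order."""
    N = len(x) // B
    chunks = []
    for i in range(N):
        chunk = x[i * B:(i + 1) * B]
        chunks.extend([chunk] * repeat)
    return torch.cat(chunks, dim=0)


x = torch.arange(4)  # B=2, N=2: chunks [0, 1] and [2, 3]
out = repeat_interleave_batch_sketch(x, B=2, repeat=2)
```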

lightly.models.utils.repeat_token(token: Tensor, size: Tuple[int, int]) → Tensor

Repeats a token size times.

Parameters

token – Token tensor with shape (1, 1, dim).
size – (batch_size, sequence_length) tuple.

Returns

Tensor with shape (batch_size, sequence_length, dim) containing copies of the input token.

lightly.models.utils.select_most_similar(x: Tensor, y: Tensor, y_values: Tensor) → Tensor

For each feature in x, searches the most similar feature in y and returns the corresponding value from y_values.

Parameters

x – Tensor with shape (B, N, C).
y – Tensor with shape (B, N, C).
y_values – Tensor with shape (B, N, D).

Returns

Values with shape (B, N, D) where values[i, j] is the entry y_values[i, k] and y[i, k] is the feature in y[i, …] that is most similar to x[i, j].

lightly.models.utils.set_at_index(tokens: Tensor, index: Tensor, value: Tensor) → Tensor

Copies all values into the input tensor at the given indices.

Parameters

tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
index – Index tensor with shape (batch_size, index_length).
value – Value tensor with shape (batch_size, index_length, dim).

Returns

Tokens tensor with shape (batch_size, sequence_length, dim) containing the new values.

lightly.models.utils.unpatchify(patches: Tensor, patch_size: int, channels: int = 3) → Tensor

Reconstructs images from their patches.

Parameters

patches – Patches tensor with shape (batch_size, num_patches, channels * patch_size ** 2).
patch_size – The patch size in pixels used to create the patches.
channels – The number of channels the image must have.

Returns

Reconstructed images tensor with shape (batch_size, channels, height, width).

lightly.models.utils.update_drop_path_rate(model: VisionTransformer, drop_path_rate: float, mode: str = 'linear') → None

Updates the drop path rate in a TIMM VisionTransformer model.

Parameters

model – TIMM VisionTransformer model.
drop_path_rate – Maximum drop path rate.
mode – Drop path rate update mode. Can be “linear” or “uniform”. Linear increases the drop path rate from 0 to drop_path_rate over the depth of the model. Uniform sets the drop path rate to drop_path_rate for all blocks.

Raises

ValueError – If an unknown mode is provided.
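The two modes amount to different per-block schedules. The pure-Python sketch below only computes the rates for a given depth and does not touch a TIMM model; drop_path_rates_sketch is a hypothetical helper.

```python
def drop_path_rates_sketch(depth, drop_path_rate, mode="linear"):
    """Return one drop path rate per transformer block."""
    if mode == "linear":
        if depth == 1:
            return [0.0]
        # Evenly spaced from 0 at the first block to drop_path_rate at the last.
        return [drop_path_rate * i / (depth - 1) for i in range(depth)]
    if mode == "uniform":
        return [drop_path_rate] * depth
    raise ValueError(f"Unknown mode: '{mode}', supported modes are 'linear' and 'uniform'.")


linear_rates = drop_path_rates_sketch(depth=5, drop_path_rate=0.2, mode="linear")
uniform_rates = drop_path_rates_sketch(depth=3, drop_path_rate=0.1, mode="uniform")
```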

lightly.models.utils.update_momentum(model: Module, model_ema: Module, m: float)

Updates the parameters of model_ema with an exponential moving average (EMA) of the parameters of model.

Momentum encoders are a crucial component for models such as MoCo or BYOL.

Parameters

model – The current model.
model_ema – The model with exponential moving average (EMA) parameters.
m – The momentum factor, between 0 and 1.

Examples

>>> backbone = resnet18()
>>> projection_head = MoCoProjectionHead()
>>> backbone_momentum = copy.deepcopy(backbone)
>>> projection_head_momentum = copy.deepcopy(projection_head)
>>>
>>> # update momentum
>>> update_momentum(backbone, backbone_momentum, m=0.999)
>>> update_momentum(projection_head, projection_head_momentum, m=0.999)
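The update rule behind an EMA of this kind is ema = m * ema + (1 - m) * current for every parameter. A minimal sketch in plain PyTorch, with the hypothetical name update_momentum_sketch and a toy linear layer standing in for the backbone:

```python
import copy

import torch
from torch import nn


@torch.no_grad()
def update_momentum_sketch(model, model_ema, m):
    """In-place EMA update: ema <- m * ema + (1 - m) * current."""
    for param_ema, param in zip(model_ema.parameters(), model.parameters()):
        param_ema.mul_(m).add_(param, alpha=1.0 - m)


model = nn.Linear(2, 2)
model_ema = copy.deepcopy(model)
with torch.no_grad():
    for p in model.parameters():
        p.fill_(1.0)
    for p in model_ema.parameters():
        p.fill_(0.0)

update_momentum_sketch(model, model_ema, m=0.9)
```

Starting from 0 with the current weights at 1, one step with m=0.9 moves every EMA weight to 0.1; values of m close to 1 make the EMA model change slowly.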