lightly.models.utils
Utilities for working with SSL models.
- lightly.models.utils.activate_requires_grad(model: Module)
Activates the requires_grad flag for all parameters of a model.
Use this method to activate gradients for a model (e.g. after deactivating them using deactivate_requires_grad(…)).
Examples
>>> backbone = resnet18()
>>> activate_requires_grad(backbone)
- lightly.models.utils.add_stochastic_depth_to_blocks(vit: Module, prob: float = 0.0, mode='row') None
Adds stochastic depth dropout to all transformer blocks in a Vision Transformer model.
- Parameters
vit – Vision Transformer Model to which stochastic depth dropout will be added.
prob – Probability of dropping a layer.
mode – Mode for stochastic depth. Default is “row”.
- Raises
RuntimeError – If the torchvision version is older than 0.12.
- lightly.models.utils.apply_masks(x: Tensor, masks: torch.Tensor | list[torch.Tensor]) Tensor
Apply masks to the input tensor.
From https://github.com/facebookresearch/ijepa/blob/main/src/masks/utils.py
- Parameters
x – Tensor of shape (B, N, D) where N is the number of patches.
masks – Tensor or list of tensors containing indices of patches in [0, N-1] to keep. Each tensor must have shape (B, K), where K is the number of patches to keep. All masks must have the same K.
- Returns
Tensor of shape (B * num_masks, K, D) where K is the number of patches to keep.
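Examples
A minimal illustrative sketch; the shapes below are arbitrary demonstration values.
>>> import torch
>>> from lightly.models.utils import apply_masks
>>> x = torch.randn(2, 16, 8)            # (B, N, D)
>>> mask = torch.randint(0, 16, (2, 4))  # keep K=4 patches per image
>>> apply_masks(x, mask).shape
torch.Size([2, 4, 8])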
- lightly.models.utils.batch_shuffle(batch: Tensor, distributed: bool = False) Tuple[Tensor, Tensor]
Randomly shuffles all tensors in the batch.
- Parameters
batch – The batch to shuffle.
distributed – If True, batches are shuffled across multiple GPUs.
- Returns
A (batch, shuffle) tuple where batch is the shuffled version of the input batch and shuffle is an index to restore the original order.
Examples
>>> # forward pass through the momentum model with batch shuffling
>>> x1_shuffled, shuffle = batch_shuffle(x1)
>>> f1 = moco_momentum(x1_shuffled)
>>> out1 = projection_head_momentum(f1)
>>> out1 = batch_unshuffle(out1, shuffle)
- lightly.models.utils.batch_shuffle_distributed(batch: Tensor) Tuple[Tensor, Tensor]
Shuffles batch over multiple devices.
This code was taken and adapted from here: https://github.com/facebookresearch/moco.
- Parameters
batch – The tensor to shuffle.
- Returns
A (batch, shuffle) tuple where batch is the shuffled version of the input batch and shuffle is an index to restore the original order.
- lightly.models.utils.batch_unshuffle(batch: Tensor, shuffle: Tensor, distributed: bool = False) Tensor
Unshuffles a batch.
- Parameters
batch – The batch to unshuffle.
shuffle – Index to unshuffle the batch.
distributed – If True, the batch is unshuffled across multiple GPUs.
- Returns
The unshuffled batch.
Examples
>>> # forward pass through the momentum model with batch shuffling
>>> x1_shuffled, shuffle = batch_shuffle(x1)
>>> f1 = moco_momentum(x1_shuffled)
>>> out1 = projection_head_momentum(f1)
>>> out1 = batch_unshuffle(out1, shuffle)
- lightly.models.utils.batch_unshuffle_distributed(batch: Tensor, shuffle: Tensor) Tensor
Undo batch shuffle over multiple devices.
This code was taken and adapted from here: https://github.com/facebookresearch/moco.
- Parameters
batch – The tensor to unshuffle.
shuffle – Index to restore the original tensor.
- Returns
The unshuffled tensor.
- lightly.models.utils.concat_all_gather(x: Tensor) Tensor
Returns concatenated instances of x gathered from all GPUs.
This code was taken and adapted from here: https://github.com/facebookresearch/moco.
- lightly.models.utils.deactivate_requires_grad(model: Module)
Deactivates the requires_grad flag for all parameters of a model.
This has the same effect as permanently executing the model within a torch.no_grad() context. Use this method to disable gradient computation and therefore training for a model.
Examples
>>> backbone = resnet18()
>>> deactivate_requires_grad(backbone)
- lightly.models.utils.expand_index_like(index: Tensor, tokens: Tensor) Tensor
Expands the index along the last dimension of the input tokens.
- Parameters
index – Index tensor with shape (batch_size, idx_length) where each entry is an index in [0, sequence_length).
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
- Returns
Index tensor with shape (batch_size, idx_length, dim) where the original indices are repeated dim times along the last dimension.
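Examples
An illustrative sketch with arbitrary shapes:
>>> import torch
>>> from lightly.models.utils import expand_index_like
>>> tokens = torch.randn(2, 10, 32)       # (batch_size, sequence_length, dim)
>>> index = torch.randint(0, 10, (2, 3))  # (batch_size, idx_length)
>>> expand_index_like(index, tokens).shape
torch.Size([2, 3, 32])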
- lightly.models.utils.get_1d_sine_cosine_positional_embedding_from_positions(embed_dim: int, pos: NDArray[np.float32]) NDArray[np.float32]
Generates 1D sine-cosine positional embedding from positions.
Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py
- Parameters
embed_dim – Embedding dimension.
pos – Positions to be encoded with shape (N, M).
- Returns
Positional embedding with shape (N * M, embed_dim).
- lightly.models.utils.get_2d_sincos_pos_embed(embed_dim: int, grid_size: int, cls_token: bool) NDArray[np.float32]
Generates 2D sine-cosine positional embedding.
Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py
- Parameters
embed_dim – Embedding dimension.
grid_size – Height and width of the grid.
cls_token – If True, a positional embedding for the class token is generated.
- Returns
Positional embedding with shape (grid_size * grid_size, embed_dim) or (1 + grid_size * grid_size, embed_dim) if cls_token is True.
- lightly.models.utils.get_2d_sine_cosine_positional_embedding(embed_dim: int, grid_size: int, cls_token: bool) NDArray[np.float32]
Generates 2D sine-cosine positional embedding.
Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py
- Parameters
embed_dim – Embedding dimension.
grid_size – Height and width of the grid.
cls_token – If True, a positional embedding for the class token is generated.
- Returns
Positional embedding with shape (grid_size * grid_size, embed_dim) or (1 + grid_size * grid_size, embed_dim) if cls_token is True.
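Examples
A sketch for a ViT-style 14 x 14 patch grid; the values are illustrative.
>>> from lightly.models.utils import get_2d_sine_cosine_positional_embedding
>>> pos_embed = get_2d_sine_cosine_positional_embedding(
...     embed_dim=64, grid_size=14, cls_token=True
... )
>>> pos_embed.shape  # 1 class token + 14 * 14 patch positions
(197, 64)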
- lightly.models.utils.get_2d_sine_cosine_positional_embedding_from_grid(embed_dim: int, grid: NDArray[np.float32]) NDArray[np.float32]
Generates 2D sine-cosine positional embedding from a grid.
Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py
- Parameters
embed_dim – Embedding dimension.
grid – Grid of shape (2, grid_size, grid_size) with x and y coordinates.
- Returns
Positional embedding with shape (grid_size * grid_size, embed_dim).
- lightly.models.utils.get_at_index(tokens: Tensor, index: Tensor) Tensor
Selects tokens at index.
- Parameters
tokens – Token tensor with shape (batch_size, sequence_length, dim).
index – Index tensor with shape (batch_size, index_length) where each entry is an index in [0, sequence_length).
- Returns
Token tensor with shape (batch_size, index_length, dim) containing the selected tokens.
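Examples
An illustrative sketch with arbitrary shapes:
>>> import torch
>>> from lightly.models.utils import get_at_index
>>> tokens = torch.randn(2, 10, 32)
>>> index = torch.tensor([[0, 2], [1, 3]])
>>> get_at_index(tokens, index).shape
torch.Size([2, 2, 32])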
- lightly.models.utils.get_named_leaf_modules(module: Module) Dict[str, Module]
Returns all leaf modules of the model with their names.
- lightly.models.utils.get_weight_decay_parameters(modules: Iterable[Module], decay_norm: bool = False, decay_bias: bool = False, norm_layers: Tuple[Type[Module], ...] = (_NormBase, LayerNorm, CrossMapLRN2d, LocalResponseNorm, GroupNorm)) Tuple[List[Parameter], List[Parameter]]
Returns all parameters of the modules that should be decayed and not decayed.
- Parameters
modules – List of modules to get the parameters from.
decay_norm – If True, normalization parameters are decayed.
decay_bias – If True, bias parameters are decayed.
norm_layers – Tuple of normalization classes to decay if decay_norm is True.
- Returns
(params, params_no_weight_decay) tuple.
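Examples
A sketch of the typical use, building optimizer parameter groups; the model and hyperparameters are arbitrary demonstration choices.
>>> import torch
>>> from torch import nn
>>> from lightly.models.utils import get_weight_decay_parameters
>>> model = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8))
>>> params, params_no_wd = get_weight_decay_parameters([model])
>>> optimizer = torch.optim.SGD(
...     [
...         {"params": params, "weight_decay": 1e-4},
...         {"params": params_no_wd, "weight_decay": 0.0},
...     ],
...     lr=0.1,
... )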
- lightly.models.utils.initialize_learnable_positional_embedding(pos_embedding: Parameter) None
Initializes a learnable positional embedding.
Uses standard initialization for ViT models, see [0].
- Parameters
pos_embedding – Positional embedding parameter.
- lightly.models.utils.initialize_positional_embedding(pos_embedding: Parameter, strategy: str, num_prefix_tokens: int) None
Initializes the positional embedding with the given strategy.
- Parameters
pos_embedding – Positional embedding parameter.
strategy – Positional embedding initialization strategy. Valid options are: [‘learn’, ‘sincos’, ‘skip’]. ‘learn’ makes the embedding learnable, ‘sincos’ creates a fixed 2D sine-cosine positional embedding, and ‘skip’ does not initialize the positional embedding.
num_prefix_tokens – Number of prefix tokens in the positional embedding. This includes the class token.
- Raises
ValueError – If an invalid strategy is provided.
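Examples
A sketch assuming the usual (1, sequence_length, dim) ViT embedding layout with one class token; the shapes are illustrative.
>>> import torch
>>> from torch import nn
>>> from lightly.models.utils import initialize_positional_embedding
>>> # 1 class token + 14 * 14 patch tokens, embedding dimension 768 (assumed layout)
>>> pos_embedding = nn.Parameter(torch.zeros(1, 197, 768))
>>> initialize_positional_embedding(pos_embedding, strategy="sincos", num_prefix_tokens=1)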
- lightly.models.utils.mask_at_index(tokens: Tensor, index: Tensor, mask_token: Tensor) Tensor
Returns a tensor where the tokens at the given indices are replaced by the mask token.
- Parameters
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
index – Index tensor with shape (batch_size, index_length).
mask_token – Value tensor with shape (1, 1, dim).
- Returns
Tokens tensor with shape (batch_size, sequence_length, dim) containing the new values.
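Examples
An illustrative sketch with arbitrary shapes:
>>> import torch
>>> from lightly.models.utils import mask_at_index
>>> tokens = torch.randn(2, 10, 32)
>>> index = torch.tensor([[1, 4], [0, 7]])
>>> mask_token = torch.zeros(1, 1, 32)
>>> mask_at_index(tokens, index, mask_token).shape
torch.Size([2, 10, 32])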
- lightly.models.utils.mask_bool(tokens: Tensor, mask: Tensor, mask_token: Tensor) Tensor
Returns a tensor with tokens replaced by the mask tokens in all positions where the mask is True.
- Parameters
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
mask – Boolean mask tensor with shape (batch_size, sequence_length).
mask_token – Mask token with shape (1, 1, dim).
- Returns
Tokens tensor with shape (batch_size, sequence_length, dim) where tokens[i, j] is replaced by the mask token if mask[i, j] is True.
- lightly.models.utils.most_similar_index(x: Tensor, y: Tensor) Tensor
For each feature in x, searches the most similar feature in y and returns the corresponding index.
- Parameters
x – Tensor with shape (B, N, C) containing the features to compare.
y – Tensor with shape (B, N, C) containing the features to search for similarity.
- Returns
Index with shape (B, N) such that y[i, index[i, j]] is most similar to x[i, j] over all y[i, …].
- lightly.models.utils.nearest_neighbors(input_maps: Tensor, candidate_maps: Tensor, distances: Tensor, num_matches: int) Tuple[Tensor, Tensor]
Finds the nearest neighbors of the maps in input_maps in candidate_maps.
- Parameters
input_maps – A tensor of maps for which to find nearest neighbors. It has shape: [batch_size, input_map_size, feature_dimension]
candidate_maps – A tensor of maps to search for nearest neighbors. It has shape: [batch_size, candidate_map_size, feature_dimension]
distances – A tensor of distances between the maps in input_maps and candidate_maps. It has shape: [batch_size, input_map_size, candidate_map_size]
num_matches – Number of nearest neighbors to return. If num_matches is None or -1, all the maps in candidate_maps are considered.
- Returns
A tuple of tensors, containing the nearest neighbors in input_maps and candidate_maps. They both have shape: [batch_size, input_map_size, feature_dimension]
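Examples
An illustrative sketch; computing the distances with torch.cdist is an arbitrary choice for demonstration.
>>> import torch
>>> from lightly.models.utils import nearest_neighbors
>>> input_maps = torch.randn(2, 7, 16)
>>> candidate_maps = torch.randn(2, 12, 16)
>>> distances = torch.cdist(input_maps, candidate_maps)
>>> nn_input, nn_candidate = nearest_neighbors(
...     input_maps, candidate_maps, distances, num_matches=5
... )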
- lightly.models.utils.normalize_mean_var(x: Tensor, dim: int = -1, eps: float = 1e-06) Tensor
Normalizes the input tensor to zero mean and unit variance.
- Parameters
x – Input tensor.
dim – Dimension along which to compute mean and standard deviation. Takes last dimension by default.
eps – Epsilon value to avoid division by zero.
- Returns
Normalized tensor.
- lightly.models.utils.normalize_weight(weight: Parameter, dim: int = 1, keepdim: bool = True)
Normalizes the weight to unit length along the specified dimension.
- lightly.models.utils.patchify(images: Tensor, patch_size: int) Tensor
Converts a batch of input images into patches.
- Parameters
images – Images tensor with shape (batch_size, channels, height, width)
patch_size – Patch size in pixels. Image width and height must be multiples of the patch size.
- Returns
Patches tensor with shape (batch_size, num_patches, channels * patch_size ** 2) where num_patches = (height / patch_size) * (width / patch_size).
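Examples
An illustrative sketch; a 32 x 32 image with patch_size=8 yields 16 patches of 3 * 8 ** 2 = 192 values each.
>>> import torch
>>> from lightly.models.utils import patchify
>>> images = torch.randn(2, 3, 32, 32)
>>> patchify(images, patch_size=8).shape
torch.Size([2, 16, 192])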
- lightly.models.utils.pool_masked(source: Tensor, mask: Tensor, num_cls: int, reduce: str = 'mean') Tensor
Reduces image feature maps (B, C, H, W) or (C, H, W) according to an integer index given by mask (B, H, W) or (H, W).
- Parameters
source – Float tensor of shape (B, C, H, W) or (C, H, W) to be reduced.
mask – Integer tensor of shape (B, H, W) or (H, W) containing the integer indices.
num_cls – The number of classes in the possible masks.
reduce – Reduction operation applied to the features of each mask index. Default is 'mean'.
- Returns
A tensor of shape (B, C, num_cls) or (C, num_cls).
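Examples
An illustrative sketch with arbitrary shapes:
>>> import torch
>>> from lightly.models.utils import pool_masked
>>> source = torch.randn(2, 8, 4, 4)       # (B, C, H, W)
>>> mask = torch.randint(0, 3, (2, 4, 4))  # (B, H, W) with indices in [0, 3)
>>> pool_masked(source, mask, num_cls=3).shape
torch.Size([2, 8, 3])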
- lightly.models.utils.prepend_class_token(tokens: Tensor, class_token: Tensor) Tensor
Prepends class token to tokens.
- Parameters
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
class_token – Class token with shape (1, 1, dim).
- Returns
Tokens tensor with the class token prepended at index 0 in every sequence. The tensor has shape (batch_size, sequence_length + 1, dim).
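Examples
An illustrative sketch with arbitrary shapes:
>>> import torch
>>> from lightly.models.utils import prepend_class_token
>>> tokens = torch.randn(2, 49, 32)
>>> class_token = torch.zeros(1, 1, 32)
>>> prepend_class_token(tokens, class_token).shape
torch.Size([2, 50, 32])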
- lightly.models.utils.random_block_mask(size: Tuple[int, int, int], batch_mask_ratio: float = 0.5, min_image_mask_ratio: float = 0.1, max_image_mask_ratio: float = 0.5, min_num_masks_per_block: int = 4, max_num_masks_per_block: Optional[int] = None, min_block_aspect_ratio: float = 0.3, max_block_aspect_ratio: Optional[float] = None, max_attempts_per_block: int = 10, device: Optional[Union[device, str]] = None) Tensor
Creates a random block mask for a batch of images.
A block is, in this context, a rectangle of patches in an image that are masked together. The function generates block masks until the desired number of patches per image is masked. DINOv2 uses a more complex masking strategy that generates masks for only a fraction (batch_mask_ratio) of the images. On top of that, it masks a different number of patches for every image; this is controlled by the min_image_mask_ratio and max_image_mask_ratio arguments.
Based on the implementation of the block mask in DINOv2 [0]. For details see [1] and [2].
[0]: DINOv2, 2023, https://arxiv.org/abs/2304.07193
[1]: https://github.com/facebookresearch/dinov2/blob/main/dinov2/data/masking.py
[2]: https://github.com/facebookresearch/dinov2/blob/main/dinov2/data/collate.py
- Parameters
size – Size of the image batch for which to generate masks. Should be (batch_size, height, width).
batch_mask_ratio – Percentage of images per batch for which to generate block masks. The remaining images are not masked.
min_image_mask_ratio – Minimum percentage of the image to mask. In practice, fewer than min_image_mask_ratio patches of the image can be masked due to additional constraints.
max_image_mask_ratio – Maximum percentage of the image to mask.
min_num_masks_per_block – Minimum number of patches to mask per block.
max_num_masks_per_block – Maximum number of patches to mask per block.
min_block_aspect_ratio – Minimum aspect ratio (height/width) of a masked block.
max_block_aspect_ratio – Maximum aspect ratio (height/width) of a masked block.
max_attempts_per_block – Maximum number of attempts to find a valid block mask for an image.
device – Device on which to create the mask.
- Returns
A boolean tensor with shape (batch_size, height, width) where each entry is True if the patch should be masked and False otherwise.
- Raises
ValueError – If ‘max_image_mask_ratio’ is less than ‘min_image_mask_ratio’.
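Examples
A sketch for a batch of 8 images on a 14 x 14 patch grid; the sizes are illustrative.
>>> from lightly.models.utils import random_block_mask
>>> mask = random_block_mask(size=(8, 14, 14))
>>> mask.shape
torch.Size([8, 14, 14])
>>> mask.dtype
torch.bool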
- lightly.models.utils.random_block_mask_image(size: Tuple[int, int], num_masks: int, min_num_masks_per_block: int = 4, max_num_masks_per_block: Optional[int] = None, min_block_aspect_ratio: float = 0.3, max_block_aspect_ratio: Optional[float] = None, max_attempts_per_block: int = 10, device: Optional[Union[device, str]] = None) Tensor
Creates a random block mask for a single image.
- Parameters
size – Size of the image for which to generate a mask. Should be (height, width).
num_masks – Number of patches to mask.
min_num_masks_per_block – Minimum number of patches to mask per block.
max_num_masks_per_block – Maximum number of patches to mask per block.
min_block_aspect_ratio – Minimum aspect ratio (height/width) of a masked block.
max_block_aspect_ratio – Maximum aspect ratio (height/width) of a masked block.
max_attempts_per_block – Maximum number of attempts to find a valid block mask.
device – Device on which to create the mask.
- Returns
A boolean tensor with shape (height, width) where each entry is True if the patch should be masked and False otherwise.
- Raises
ValueError – If ‘max_num_masks_per_block’ is less than ‘min_num_masks_per_block’ or if ‘max_block_aspect_ratio’ is less than ‘min_block_aspect_ratio’.
- lightly.models.utils.random_prefix_mask(size: Tuple[int, int], max_prefix_length: int, device: Optional[Union[device, str]] = None) Tensor
Creates a random prefix mask.
The mask is created by uniformly sampling a prefix length in [0, max_prefix_length] for each sequence in the batch. All tokens with an index greater than or equal to the prefix length are masked.
- Parameters
size – Size of the token batch for which to generate masks. Should be (batch_size, sequence_length).
max_prefix_length – Maximum length of the prefix to mask.
device – Device on which to create the mask.
- Returns
A mask tensor with shape (batch_size, sequence_length) where each entry is True if the token should be masked and False otherwise.
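Examples
An illustrative sketch with arbitrary sizes:
>>> from lightly.models.utils import random_prefix_mask
>>> mask = random_prefix_mask(size=(4, 16), max_prefix_length=8)
>>> mask.shape
torch.Size([4, 16])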
- lightly.models.utils.random_token_mask(size: Tuple[int, int], mask_ratio: float = 0.6, mask_class_token: bool = False, device: Optional[Union[device, str]] = None) Tuple[Tensor, Tensor]
Creates random token masks.
- Parameters
size – Size of the token batch for which to generate masks. Should be (batch_size, sequence_length).
mask_ratio – Proportion of tokens to mask.
mask_class_token – If False, the class token is never masked. If True, the class token may be masked.
device – Device on which to create the index masks.
- Returns
A (index_keep, index_mask) tuple where each index is a tensor. index_keep contains the indices of the unmasked tokens and has shape (batch_size, num_keep). index_mask contains the indices of the masked tokens and has shape (batch_size, sequence_length - num_keep). num_keep is equal to sequence_length * (1 - mask_ratio).
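Examples
A sketch of MAE-style masking where only the kept tokens are encoded; the shapes are illustrative.
>>> import torch
>>> from lightly.models.utils import get_at_index, random_token_mask
>>> tokens = torch.randn(2, 50, 32)
>>> idx_keep, idx_mask = random_token_mask(size=(2, 50), mask_ratio=0.6)
>>> visible = get_at_index(tokens, idx_keep)  # tokens passed to the encoder
>>> visible.shape
torch.Size([2, 20, 32])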
- lightly.models.utils.repeat_interleave_batch(x: Tensor, B: int, repeat: int) Tensor
Repeat and interleave the input tensor.
- Parameters
x – Tensor with shape (B * N, …) where B is the batch size and N the number of batches.
B – Batch size.
repeat – Number of times to repeat each batch.
- Returns
Tensor with shape (B * repeat * N, …) where each batch is repeated repeat times.
- lightly.models.utils.repeat_token(token: Tensor, size: Tuple[int, int]) Tensor
Repeats a token size times.
- Parameters
token – Token tensor with shape (1, 1, dim).
size – (batch_size, sequence_length) tuple.
- Returns
Tensor with shape (batch_size, sequence_length, dim) containing copies of the input token.
- lightly.models.utils.select_most_similar(x: Tensor, y: Tensor, y_values: Tensor) Tensor
For each feature in x, searches the most similar feature in y and returns the corresponding value from y_values.
- Parameters
x – Tensor with shape (B, N, C).
y – Tensor with shape (B, N, C).
y_values – Tensor with shape (B, N, D).
- Returns
Values with shape (B, N, D) where values[i, j] is the entry in y_values[i, …] such that x[i, j] is the most similar to y[i, …].
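Examples
An illustrative sketch with arbitrary shapes:
>>> import torch
>>> from lightly.models.utils import select_most_similar
>>> x = torch.randn(2, 5, 16)
>>> y = torch.randn(2, 5, 16)
>>> y_values = torch.randn(2, 5, 4)
>>> select_most_similar(x, y, y_values).shape
torch.Size([2, 5, 4])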
- lightly.models.utils.set_at_index(tokens: Tensor, index: Tensor, value: Tensor) Tensor
Copies all values into the input tensor at the given indices.
- Parameters
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
index – Index tensor with shape (batch_size, index_length).
value – Value tensor with shape (batch_size, index_length, dim).
- Returns
Tokens tensor with shape (batch_size, sequence_length, dim) containing the new values.
- lightly.models.utils.unpatchify(patches: Tensor, patch_size: int, channels: int = 3) Tensor
Reconstructs images from their patches.
- Parameters
patches – Patches tensor with shape (batch_size, num_patches, channels * patch_size ** 2).
patch_size – The patch size in pixels used to create the patches.
channels – The number of channels of the reconstructed images.
- Returns
Reconstructed images tensor with shape (batch_size, channels, height, width).
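Examples
A sketch of the patchify/unpatchify round trip; assuming both are pure reshapes, the original images should be recovered exactly.
>>> import torch
>>> from lightly.models.utils import patchify, unpatchify
>>> images = torch.randn(2, 3, 32, 32)
>>> patches = patchify(images, patch_size=8)
>>> restored = unpatchify(patches, patch_size=8, channels=3)
>>> torch.allclose(images, restored)
True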
- lightly.models.utils.update_drop_path_rate(model: VisionTransformer, drop_path_rate: float, mode: str = 'linear') None
Updates the drop path rate in a TIMM VisionTransformer model.
- Parameters
model – TIMM VisionTransformer model.
drop_path_rate – Maximum drop path rate.
mode – Drop path rate update mode. Can be “linear” or “uniform”. Linear increases the drop path rate from 0 to drop_path_rate over the depth of the model. Uniform sets the drop path rate to drop_path_rate for all blocks.
- Raises
ValueError – If an unknown mode is provided.
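Examples
A sketch assuming TIMM is installed; the model name is an arbitrary choice.
>>> import timm
>>> from lightly.models.utils import update_drop_path_rate
>>> model = timm.create_model("vit_small_patch16_224", drop_path_rate=0.0)
>>> update_drop_path_rate(model, drop_path_rate=0.1, mode="linear")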
- lightly.models.utils.update_momentum(model: Module, model_ema: Module, m: float)
Updates the parameters of model_ema with the exponential moving average of model.
Momentum encoders are a crucial component for models such as MoCo or BYOL.
- Parameters
model – The current model.
model_ema – The model with exponential moving average (EMA) parameters.
m – The momentum factor, between 0 and 1.
Examples
>>> backbone = resnet18()
>>> projection_head = MoCoProjectionHead()
>>> backbone_momentum = copy.deepcopy(backbone)
>>> projection_head_momentum = copy.deepcopy(projection_head)
>>>
>>> # update momentum
>>> update_momentum(backbone, backbone_momentum, m=0.999)
>>> update_momentum(projection_head, projection_head_momentum, m=0.999)