lightly.models.utils
Utils for working with SSL models
- lightly.models.utils.activate_requires_grad(model: Module)
Activates the requires_grad flag for all parameters of a model.
Use this method to activate gradients for a model (e.g. after deactivating them using deactivate_requires_grad(…)).
Examples
>>> backbone = resnet18()
>>> activate_requires_grad(backbone)
- lightly.models.utils.add_stochastic_depth_to_blocks(vit: Module, prob: float = 0.0, mode='row') None
Adds stochastic depth dropout to all transformer blocks in a Vision Transformer model.
- Parameters
vit – Vision Transformer Model to which stochastic depth dropout will be added.
prob – Probability of dropping a layer.
mode – Mode for stochastic depth. Default is “row”.
- Raises
RuntimeError – If torchvision version is less than 0.12.
- lightly.models.utils.apply_masks(x: Tensor, masks: torch.Tensor | list[torch.Tensor]) Tensor
Apply masks to the input tensor.
From https://github.com/facebookresearch/ijepa/blob/main/src/masks/utils.py
- Parameters
x – Tensor of shape (B, N, D) where N is the number of patches.
masks – Tensor or list of tensors containing indices of patches in [0, N-1] to keep. Each tensor must have shape (B, K) where K is the number of patches to keep. All masks must have the same K.
- Returns
Tensor of shape (B * num_masks, K, D) where K is the number of patches to keep.
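The shape contract above can be illustrated with a small sketch. This is not the library's code; apply_masks_sketch is a hypothetical name, and the gather-based approach is an assumption modelled on the I-JEPA utility linked above.

```python
import torch


def apply_masks_sketch(x, masks):
    """Keep only the patches selected by each mask (illustrative sketch)."""
    if isinstance(masks, torch.Tensor):
        masks = [masks]
    kept = []
    for mask in masks:
        # Expand (B, K) -> (B, K, D) so gather picks whole patch vectors.
        index = mask.unsqueeze(-1).expand(-1, -1, x.size(-1))
        kept.append(torch.gather(x, dim=1, index=index))
    # Results of the individual masks are stacked along the batch dimension.
    return torch.cat(kept, dim=0)


x = torch.arange(2 * 4 * 3, dtype=torch.float32).reshape(2, 4, 3)  # (B=2, N=4, D=3)
masks = [torch.tensor([[0, 2], [1, 3]]), torch.tensor([[3, 1], [0, 0]])]  # each (B, K=2)
out = apply_masks_sketch(x, masks)
print(tuple(out.shape))  # (4, 2, 3)
```

With two masks and B = 2, the first two entries of the output belong to the first mask and the last two to the second.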
- lightly.models.utils.batch_shuffle(batch: Tensor, distributed: bool = False) Tuple[Tensor, Tensor]
Randomly shuffles all tensors in the batch.
- Parameters
batch – The batch to shuffle.
distributed – If True then batches are shuffled across multiple gpus.
- Returns
A (batch, shuffle) tuple where batch is the shuffled version of the input batch and shuffle is an index to restore the original order.
Examples
>>> # forward pass through the momentum model with batch shuffling
>>> x1_shuffled, shuffle = batch_shuffle(x1)
>>> f1 = moco_momentum(x1_shuffled)
>>> out1 = projection_head_momentum(f1)
>>> out1 = batch_unshuffle(out1, shuffle)
- lightly.models.utils.batch_shuffle_distributed(batch: Tensor) Tuple[Tensor, Tensor]
Shuffles batch over multiple devices.
This code was taken and adapted from here: https://github.com/facebookresearch/moco.
- Parameters
batch – The tensor to shuffle.
- Returns
A (batch, shuffle) tuple where batch is the shuffled version of the input batch and shuffle is an index to restore the original order.
- lightly.models.utils.batch_unshuffle(batch: Tensor, shuffle: Tensor, distributed: bool = False) Tensor
Unshuffles a batch.
- Parameters
batch – The batch to unshuffle.
shuffle – Index to unshuffle the batch.
distributed – If True then the batch is unshuffled across multiple gpus.
- Returns
The unshuffled batch.
Examples
>>> # forward pass through the momentum model with batch shuffling
>>> x1_shuffled, shuffle = batch_shuffle(x1)
>>> f1 = moco_momentum(x1_shuffled)
>>> out1 = projection_head_momentum(f1)
>>> out1 = batch_unshuffle(out1, shuffle)
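To make the role of the shuffle index concrete, here is a single-device sketch of the round trip. The helper names are hypothetical, and the use of a random permutation plus its inverse is an assumption about how such an index can be built, not a copy of the library code.

```python
import torch


def shuffle_sketch(batch):
    """Shuffle along the batch dimension and return the index that undoes it."""
    perm = torch.randperm(batch.shape[0])
    # argsort of a permutation yields its inverse permutation.
    restore = torch.argsort(perm)
    return batch[perm], restore


def unshuffle_sketch(batch, restore):
    """Restore the original order using the index from shuffle_sketch."""
    return batch[restore]


torch.manual_seed(0)
x = torch.arange(8).reshape(4, 2)
x_shuffled, restore = shuffle_sketch(x)
x_restored = unshuffle_sketch(x_shuffled, restore)
print(torch.equal(x_restored, x))  # True
```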
- lightly.models.utils.batch_unshuffle_distributed(batch: Tensor, shuffle: Tensor) Tensor
Undo batch shuffle over multiple devices.
This code was taken and adapted from here: https://github.com/facebookresearch/moco.
- Parameters
batch – The tensor to unshuffle.
shuffle – Index to restore the original tensor.
- Returns
The unshuffled tensor.
- lightly.models.utils.concat_all_gather(x: Tensor) Tensor
Returns concatenated instances of x gathered from all gpus.
This code was taken and adapted from here: https://github.com/facebookresearch/moco.
- lightly.models.utils.deactivate_requires_grad(model: Module)
Deactivates the requires_grad flag for all parameters of a model.
This has the same effect as permanently executing the model within a torch.no_grad() context. Use this method to disable gradient computation and therefore training for a model.
Examples
>>> backbone = resnet18()
>>> deactivate_requires_grad(backbone)
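The effect of toggling the flag can be sketched with plain PyTorch. The small model below is a stand-in chosen for illustration, and the explicit loops are an assumption about what the helpers amount to rather than their actual source.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze: autograd no longer tracks these parameters.
for param in model.parameters():
    param.requires_grad = False
frozen_flags = [p.requires_grad for p in model.parameters()]
# With all parameters frozen and a plain input, the output carries no graph.
output_tracks_grad = model(torch.randn(3, 4)).requires_grad

# Unfreeze again so the parameters can be trained.
for param in model.parameters():
    param.requires_grad = True
unfrozen_flags = [p.requires_grad for p in model.parameters()]

print(frozen_flags, output_tracks_grad, unfrozen_flags)
```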
- lightly.models.utils.expand_index_like(index: Tensor, tokens: Tensor) Tensor
Expands the index along the last dimension of the input tokens.
- Parameters
index – Index tensor with shape (batch_size, idx_length) where each entry is an index in [0, sequence_length).
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
- Returns
Index tensor with shape (batch_size, idx_length, dim) where the original indices are repeated dim times along the last dimension.
- lightly.models.utils.get_1d_sine_cosine_positional_embedding_from_positions(embed_dim: int, pos: NDArray[np.float32]) NDArray[np.float32]
Generates 1D sine-cosine positional embedding from positions.
Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py
- Parameters
embed_dim – Embedding dimension.
pos – Positions to be encoded with shape (N, M).
- Returns
Positional embedding with shape (N * M, embed_dim).
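A minimal NumPy sketch of the encoding, assuming the MAE recipe referenced above (frequencies 1 / 10000^(2i / embed_dim), sines in the first half of each row and cosines in the second). The function name sincos_1d_sketch is hypothetical.

```python
import numpy as np


def sincos_1d_sketch(embed_dim, pos):
    """Sine-cosine embedding of arbitrary positions (illustrative sketch)."""
    assert embed_dim % 2 == 0, "embed_dim must be even"
    # Frequencies 1 / 10000^(2i / embed_dim) for i = 0 .. embed_dim / 2 - 1.
    omega = np.arange(embed_dim // 2, dtype=np.float32) / (embed_dim / 2.0)
    omega = 1.0 / 10000**omega
    # Flatten positions (N, M) -> (N * M,) and take the outer product.
    angles = np.einsum("m,d->md", pos.reshape(-1), omega)
    # First half of each row holds sines, second half cosines.
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


pos = np.arange(6, dtype=np.float32).reshape(2, 3)
emb = sincos_1d_sketch(8, pos)
print(emb.shape)  # (6, 8)
```

Position 0 maps to all zeros in the sine half and all ones in the cosine half, which is a quick sanity check for any implementation of this scheme.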
- lightly.models.utils.get_2d_sincos_pos_embed(embed_dim: int, grid_size: int, cls_token: bool) NDArray[np.float32]
Generates 2D sine-cosine positional embedding.
Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py
- Parameters
embed_dim – Embedding dimension.
grid_size – Height and width of the grid.
cls_token – If True, a positional embedding for the class token is generated.
- Returns
Positional embedding with shape (grid_size * grid_size, embed_dim) or (1 + grid_size * grid_size, embed_dim) if cls_token is True.
- lightly.models.utils.get_2d_sine_cosine_positional_embedding(embed_dim: int, grid_size: int, cls_token: bool) NDArray[np.float32]
Generates 2D sine-cosine positional embedding.
Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py
- Parameters
embed_dim – Embedding dimension.
grid_size – Height and width of the grid.
cls_token – If True, a positional embedding for the class token is generated.
- Returns
Positional embedding with shape (grid_size * grid_size, embed_dim) or (1 + grid_size * grid_size, embed_dim) if cls_token is True.
- lightly.models.utils.get_2d_sine_cosine_positional_embedding_from_grid(embed_dim: int, grid: NDArray[np.float32]) NDArray[np.float32]
Generates 2D sine-cosine positional embedding from a grid.
Code follows: https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py
- Parameters
embed_dim – Embedding dimension.
grid – Grid of shape (2, grid_size, grid_size) with x and y coordinates.
- Returns
Positional embedding with shape (grid_size * grid_size, embed_dim).
- lightly.models.utils.get_at_index(tokens: Tensor, index: Tensor) Tensor
Selects tokens at index.
- Parameters
tokens – Token tensor with shape (batch_size, sequence_length, dim).
index – Index tensor with shape (batch_size, index_length) where each entry is an index in [0, sequence_length).
- Returns
Token tensor with shape (batch_size, index_length, dim) containing the selected tokens.
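Index expansion and token selection fit together naturally. The sketch below uses hypothetical helper names and assumes the selection is a gather along the sequence dimension; it illustrates the documented shapes rather than reproducing the library code.

```python
import torch


def expand_index_like_sketch(index, tokens):
    # (batch_size, idx_length) -> (batch_size, idx_length, dim)
    dim = tokens.shape[-1]
    return index.unsqueeze(-1).expand(-1, -1, dim)


def get_at_index_sketch(tokens, index):
    # Gather whole token vectors along the sequence dimension.
    return torch.gather(tokens, 1, expand_index_like_sketch(index, tokens))


tokens = torch.arange(2 * 5 * 3).reshape(2, 5, 3)  # (batch_size=2, sequence_length=5, dim=3)
index = torch.tensor([[4, 0], [1, 1]])  # (batch_size=2, index_length=2)
selected = get_at_index_sketch(tokens, index)
print(tuple(selected.shape))  # (2, 2, 3)
```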
- lightly.models.utils.get_named_leaf_modules(module: Module) Dict[str, Module]
Returns all leaf modules of the model with their names.
- lightly.models.utils.get_weight_decay_parameters(modules: Iterable[Module], decay_norm: bool = False, decay_bias: bool = False, norm_layers: Tuple[Type[Module], ...] = (_NormBase, LayerNorm, CrossMapLRN2d, LocalResponseNorm, GroupNorm)) Tuple[List[Parameter], List[Parameter]]
Returns all parameters of the modules that should be decayed and not decayed.
- Parameters
modules – List of modules to get the parameters from.
decay_norm – If True, normalization parameters are decayed.
decay_bias – If True, bias parameters are decayed.
norm_layers – Tuple of normalization classes to decay if decay_norm is True.
- Returns
(params, params_no_weight_decay) tuple.
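The two returned lists are typically passed to an optimizer as separate parameter groups. The sketch below mimics the default behaviour (decay_norm=False, decay_bias=False) with a hypothetical helper, split_weight_decay_sketch, that checks only a subset of the default norm layers; it is an assumption about the splitting logic, not the library's implementation.

```python
import torch
from torch import nn


def split_weight_decay_sketch(modules):
    """Split parameters into decayed and non-decayed lists (illustrative)."""
    norm_types = (nn.modules.batchnorm._NormBase, nn.LayerNorm, nn.GroupNorm)
    decay, no_decay = [], []
    for module in modules:
        for submodule in module.modules():
            # recurse=False visits each parameter exactly once.
            for name, param in submodule.named_parameters(recurse=False):
                if isinstance(submodule, norm_types) or name.endswith("bias"):
                    no_decay.append(param)
                else:
                    decay.append(param)
    return decay, no_decay


model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 2))
params, params_no_weight_decay = split_weight_decay_sketch([model])
optimizer = torch.optim.SGD(
    [
        {"params": params, "weight_decay": 1e-4},
        {"params": params_no_weight_decay, "weight_decay": 0.0},
    ],
    lr=0.1,
)
print(len(params), len(params_no_weight_decay))  # 2 4
```

Here only the two linear weight matrices are decayed; the two linear biases and the batch norm weight and bias are not.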
- lightly.models.utils.initialize_learnable_positional_embedding(pos_embedding: Parameter) None
Initializes a learnable positional embedding.
Uses standard initialization for ViT models, see [0].
- Parameters
pos_embedding – Positional embedding parameter.
- lightly.models.utils.initialize_positional_embedding(pos_embedding: Parameter, strategy: str, num_prefix_tokens: int) None
Initializes the positional embedding with the given strategy.
- Parameters
pos_embedding – Positional embedding parameter.
strategy – Positional embedding initialization strategy. Valid options are: [‘learn’, ‘sincos’, ‘skip’]. ‘learn’ makes the embedding learnable, ‘sincos’ creates a fixed 2D sine-cosine positional embedding, and ‘skip’ does not initialize the positional embedding.
num_prefix_tokens – Number of prefix tokens in the positional embedding. This includes the class token.
- Raises
ValueError – If an invalid strategy is provided.
- lightly.models.utils.mask_at_index(tokens: Tensor, index: Tensor, mask_token: Tensor) Tensor
Returns a tensor where the tokens at the given indices are replaced by the mask token.
- Parameters
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
index – Index tensor with shape (batch_size, index_length).
mask_token – Value tensor with shape (1, 1, dim).
- Returns
Tokens tensor with shape (batch_size, sequence_length, dim) containing the new values.
- lightly.models.utils.mask_bool(tokens: Tensor, mask: Tensor, mask_token: Tensor) Tensor
Returns a tensor with tokens replaced by the mask tokens in all positions where the mask is True.
- Parameters
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
mask – Boolean mask tensor with shape (batch_size, sequence_length).
mask_token – Mask token with shape (1, 1, dim).
- Returns
Tokens tensor with shape (batch_size, sequence_length, dim) where tokens[i, j] is replaced by the mask token if mask[i, j] is True.
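The replacement rule can be sketched with broadcasting. Using torch.where here is an assumption made for illustration; the library may implement the same rule differently.

```python
import torch

tokens = torch.zeros(2, 4, 3)  # (batch_size, sequence_length, dim)
mask_token = torch.ones(1, 1, 3)
mask = torch.tensor([[True, False, False, True],
                     [False, False, True, False]])

# Broadcast the mask from (B, S) to (B, S, 1) and pick the mask token where it is True.
masked = torch.where(mask.unsqueeze(-1), mask_token, tokens)
print(masked[0, 0].tolist(), masked[0, 1].tolist())  # [1.0, 1.0, 1.0] [0.0, 0.0, 0.0]
```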
- lightly.models.utils.most_similar_index(x: Tensor, y: Tensor) Tensor
For each feature in x, searches the most similar feature in y and returns the corresponding index.
- Parameters
x – Tensor with shape (B, N, C) containing the features to compare.
y – Tensor with shape (B, N, C) containing the features to search for similarity.
- Returns
Index with shape (B, N) such that y[i, index[i, j]] is most similar to x[i, j] over all y[i, …].
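A small sketch of the lookup, assuming that "most similar" means highest cosine similarity (the text above does not name the measure). The function name most_similar_index_sketch is hypothetical.

```python
import torch
import torch.nn.functional as F


def most_similar_index_sketch(x, y):
    # Assumption: similarity is cosine similarity between feature vectors.
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    similarity = torch.einsum("bnc,bmc->bnm", x, y)  # (B, N, N)
    return similarity.argmax(dim=2)


y = torch.tensor([[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]])  # (B=1, N=3, C=2)
x = torch.tensor([[[0.1, 2.0], [-3.0, 0.2], [5.0, 0.1]]])
index = most_similar_index_sketch(x, y)
print(index.tolist())  # [[1, 2, 0]]
```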
- lightly.models.utils.nearest_neighbors(input_maps: Tensor, candidate_maps: Tensor, distances: Tensor, num_matches: int) Tuple[Tensor, Tensor]
Finds the nearest neighbors of the maps in input_maps in candidate_maps.
- Parameters
input_maps – A tensor of maps for which to find nearest neighbors. It has shape: [batch_size, input_map_size, feature_dimension]
candidate_maps – A tensor of maps to search for nearest neighbors. It has shape: [batch_size, candidate_map_size, feature_dimension]
distances – A tensor of distances between the maps in input_maps and candidate_maps. It has shape: [batch_size, input_map_size, candidate_map_size]
num_matches – Number of nearest neighbors to return. If num_matches is None or -1, all the maps in candidate_maps are considered.
- Returns
A tuple of tensors, containing the nearest neighbors in input_maps and candidate_maps. They both have shape: [batch_size, input_map_size, feature_dimension]
- lightly.models.utils.normalize_mean_var(x: Tensor, dim: int = -1, eps: float = 1e-06) Tensor
Normalizes the input tensor to zero mean and unit variance.
- Parameters
x – Input tensor.
dim – Dimension along which to compute mean and standard deviation. Takes last dimension by default.
eps – Epsilon value to avoid division by zero.
- Returns
Normalized tensor.
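As a sketch of the computation, the normalization can be written as (x - mean) / sqrt(var + eps). Whether the library uses the unbiased variance (PyTorch's default, used below) is an assumption, and the function name is hypothetical.

```python
import torch


def normalize_mean_var_sketch(x, dim=-1, eps=1e-6):
    mean = x.mean(dim=dim, keepdim=True)
    var = x.var(dim=dim, keepdim=True)
    # eps keeps the division finite when a slice is constant.
    return (x - mean) / (var + eps).sqrt()


torch.manual_seed(0)
x = torch.randn(4, 16) * 5 + 3
normalized = normalize_mean_var_sketch(x)
constant = normalize_mean_var_sketch(torch.ones(2, 3))
print(normalized.mean(dim=-1), normalized.var(dim=-1))
```

The constant input shows why eps is needed: its variance is zero, yet the result stays finite (all zeros).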
- lightly.models.utils.normalize_weight(weight: Parameter, dim: int = 1, keepdim: bool = True)
Normalizes the weight to unit length along the specified dimension.
- lightly.models.utils.patchify(images: Tensor, patch_size: int) Tensor
Converts a batch of input images into patches.
- Parameters
images – Images tensor with shape (batch_size, channels, height, width)
patch_size – Patch size in pixels. Image width and height must be multiples of the patch size.
- Returns
Patches tensor with shape (batch_size, num_patches, channels * patch_size ** 2) where num_patches = image_width / patch_size * image_height / patch_size.
- lightly.models.utils.pool_masked(source: Tensor, mask: Tensor, reduce: str = 'mean', num_cls: Optional[int] = None) Tensor
Reduce image feature maps (B, C, H, W) or (C, H, W) according to an integer index given by mask (B, H, W) or (H, W).
- Parameters
source – Float tensor of shape (B, C, H, W) or (C, H, W) to be reduced.
mask – Integer tensor of shape (B, H, W) or (H, W) containing the integer indices.
reduce – The reduction operation to be applied, one of ‘prod’, ‘mean’, ‘amax’ or ‘amin’. Defaults to ‘mean’.
num_cls – The number of classes in the possible masks. If None, the number of classes is inferred from the unique elements in mask. This is useful when not all classes are present in the mask.
- Returns
A tensor of shape (B, C, N) or (C, N) where N is the number of unique elements in mask or num_cls if specified.
- lightly.models.utils.prepend_class_token(tokens: Tensor, class_token: Tensor) Tensor
Prepends class token to tokens.
- Parameters
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
class_token – Class token with shape (1, 1, dim).
- Returns
Tokens tensor with the class token prepended at index 0 in every sequence. The tensor has shape (batch_size, sequence_length + 1, dim).
- lightly.models.utils.random_block_mask(size: Tuple[int, int, int], batch_mask_ratio: float = 0.5, min_image_mask_ratio: float = 0.1, max_image_mask_ratio: float = 0.5, min_num_masks_per_block: int = 4, max_num_masks_per_block: Optional[int] = None, min_block_aspect_ratio: float = 0.3, max_block_aspect_ratio: Optional[float] = None, max_attempts_per_block: int = 10, device: Optional[Union[device, str]] = None) Tensor
Creates a random block mask for a batch of images.
In this context, a block is a rectangle of patches in an image that are masked together. The function generates block masks until the desired number of patches per image is masked. DINOv2 uses a more complex masking strategy that only generates masks for a fraction of the images in the batch, controlled by batch_mask_ratio. It also masks a different number of patches for every image, which is controlled by the min_image_mask_ratio and max_image_mask_ratio arguments.
Based on the implementation of the block mask in DINOv2 [0]. For details see [1] and [2].
[0]: DINOv2, 2023, https://arxiv.org/abs/2304.07193
[1]: https://github.com/facebookresearch/dinov2/blob/main/dinov2/data/masking.py
[2]: https://github.com/facebookresearch/dinov2/blob/main/dinov2/data/collate.py
- Parameters
size – Size of the image batch for which to generate masks. Should be (batch_size, height, width).
batch_mask_ratio – Percentage of images per batch for which to generate block masks. The remaining images are not masked.
min_image_mask_ratio – Minimum percentage of the image to mask. In practice, fewer than min_image_mask_ratio patches of the image can be masked due to additional constraints.
max_image_mask_ratio – Maximum percentage of the image to mask.
min_num_masks_per_block – Minimum number of patches to mask per block.
max_num_masks_per_block – Maximum number of patches to mask per block.
min_block_aspect_ratio – Minimum aspect ratio (height/width) of a masked block.
max_block_aspect_ratio – Maximum aspect ratio (height/width) of a masked block.
max_attempts_per_block – Maximum number of attempts to find a valid block mask for an image.
device – Device on which to create the mask.
- Returns
A boolean tensor with shape (batch_size, height, width) where each entry is True if the patch should be masked and False otherwise.
- Raises
ValueError – If ‘max_image_mask_ratio’ is less than ‘min_image_mask_ratio’.
- lightly.models.utils.random_block_mask_image(size: Tuple[int, int], num_masks: int, min_num_masks_per_block: int = 4, max_num_masks_per_block: Optional[int] = None, min_block_aspect_ratio: float = 0.3, max_block_aspect_ratio: Optional[float] = None, max_attempts_per_block: int = 10, device: Optional[Union[device, str]] = None) Tensor
Creates a random block mask for a single image.
- Parameters
size – Size of the image for which to generate a mask. Should be (height, width).
num_masks – Number of patches to mask.
min_num_masks_per_block – Minimum number of patches to mask per block.
max_num_masks_per_block – Maximum number of patches to mask per block.
min_block_aspect_ratio – Minimum aspect ratio (height/width) of a masked block.
max_block_aspect_ratio – Maximum aspect ratio (height/width) of a masked block.
max_attempts_per_block – Maximum number of attempts to find a valid block mask.
device – Device on which to create the mask.
- Returns
A boolean tensor with shape (height, width) where each entry is True if the patch should be masked and False otherwise.
- Raises
ValueError – If ‘max_num_masks_per_block’ is less than ‘min_num_masks_per_block’ or if ‘max_block_aspect_ratio’ is less than ‘min_block_aspect_ratio’
- lightly.models.utils.random_prefix_mask(size: Tuple[int, int], max_prefix_length: int, device: Optional[Union[device, str]] = None) Tensor
Creates a random prefix mask.
The mask is created by uniformly sampling a prefix length in [0, max_prefix_length] for each sequence in the batch. All tokens with an index greater than or equal to the prefix length are masked.
- Parameters
size – Size of the token batch for which to generate masks. Should be (batch_size, sequence_length).
max_prefix_length – Maximum length of the prefix to mask.
device – Device on which to create the mask.
- Returns
A mask tensor with shape (batch_size, sequence_length) where each entry is True if the token should be masked and False otherwise.
- lightly.models.utils.random_token_mask(size: Tuple[int, int], mask_ratio: float = 0.6, mask_class_token: bool = False, device: Optional[Union[device, str]] = None) Tuple[Tensor, Tensor]
Creates random token masks.
- Parameters
size – Size of the token batch for which to generate masks. Should be (batch_size, sequence_length).
mask_ratio – Proportion of tokens to mask.
mask_class_token – If False the class token is never masked. If True the class token might be masked.
device – Device on which to create the index masks.
- Returns
A (index_keep, index_mask) tuple where each index is a tensor. index_keep contains the indices of the unmasked tokens and has shape (batch_size, num_keep). index_mask contains the indices of the masked tokens and has shape (batch_size, sequence_length - num_keep). num_keep is equal to sequence_length * (1 - mask_ratio).
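One common way to produce such index pairs is to sort per-token random noise and split the resulting order. The sketch below follows that idea; the function name is hypothetical, and treating index 0 as the class token is an assumption.

```python
import torch


def random_token_mask_sketch(size, mask_ratio=0.6, mask_class_token=False):
    batch_size, sequence_length = size
    num_keep = int(sequence_length * (1 - mask_ratio))
    noise = torch.rand(batch_size, sequence_length)
    if not mask_class_token and sequence_length > 0:
        # Give index 0 the smallest score so it always sorts into the kept part.
        noise[:, 0] = -1
        num_keep = max(1, num_keep)
    order = torch.argsort(noise, dim=1)
    return order[:, :num_keep], order[:, num_keep:]


torch.manual_seed(0)
idx_keep, idx_mask = random_token_mask_sketch((2, 8), mask_ratio=0.5)
print(tuple(idx_keep.shape), tuple(idx_mask.shape))  # (2, 4) (2, 4)
```

Every token index appears in exactly one of the two tensors, so index_keep and index_mask together form a partition of the sequence.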
- lightly.models.utils.repeat_interleave_batch(x: Tensor, B: int, repeat: int) Tensor
Repeat and interleave the input tensor.
- Parameters
x – Tensor with shape (B * N, …) where B is the batch size and N the number of batches.
B – Batch size.
repeat – Number of times to repeat each batch.
- Returns
Tensor with shape (B * repeat * N, …) where each batch is repeated repeat times.
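The ordering of the output is easiest to see on a tiny input. This sketch assumes the I-JEPA convention in which each chunk of B rows is repeated back to back before the next chunk follows; the function name is hypothetical.

```python
import torch


def repeat_interleave_batch_sketch(x, B, repeat):
    N = len(x) // B
    chunks = []
    for i in range(N):
        chunk = x[i * B:(i + 1) * B]
        # Repeat the whole chunk back to back before moving to the next one.
        chunks.extend([chunk] * repeat)
    return torch.cat(chunks, dim=0)


x = torch.tensor([0, 1, 10, 11])  # B=2, N=2: chunks [0, 1] and [10, 11]
out = repeat_interleave_batch_sketch(x, B=2, repeat=2)
print(out.tolist())  # [0, 1, 0, 1, 10, 11, 10, 11]
```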
- lightly.models.utils.repeat_token(token: Tensor, size: Tuple[int, int]) Tensor
Repeats a token size times.
- Parameters
token – Token tensor with shape (1, 1, dim).
size – (batch_size, sequence_length) tuple.
- Returns
Tensor with shape (batch_size, sequence_length, dim) containing copies of the input token.
- lightly.models.utils.select_most_similar(x: Tensor, y: Tensor, y_values: Tensor) Tensor
For each feature in x, searches the most similar feature in y and returns the corresponding value from y_values.
- Parameters
x – Tensor with shape (B, N, C).
y – Tensor with shape (B, N, C).
y_values – Tensor with shape (B, N, D).
- Returns
Values with shape (B, N, D) where values[i, j] is the entry in y_values[i, …] such that x[i, j] is the most similar to y[i, …].
- lightly.models.utils.set_at_index(tokens: Tensor, index: Tensor, value: Tensor) Tensor
Copies all values into the input tensor at the given indices.
- Parameters
tokens – Tokens tensor with shape (batch_size, sequence_length, dim).
index – Index tensor with shape (batch_size, index_length).
value – Value tensor with shape (batch_size, index_length, dim).
- Returns
Tokens tensor with shape (batch_size, sequence_length, dim) containing the new values.
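Writing values at per-sample indices can be sketched as a scatter along the sequence dimension. The out-of-place torch.scatter call is an assumption chosen for illustration; it leaves the input tensor unchanged.

```python
import torch

tokens = torch.zeros(2, 4, 3)  # (batch_size, sequence_length, dim)
index = torch.tensor([[0, 3], [2, 1]])  # (batch_size, index_length)
value = torch.arange(1.0, 13.0).reshape(2, 2, 3)  # (batch_size, index_length, dim)

# Expand (B, L) -> (B, L, dim) so each index addresses a whole token vector.
index_expanded = index.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
updated = torch.scatter(tokens, 1, index_expanded, value)
print(updated[0, 3].tolist())  # [4.0, 5.0, 6.0]
```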
- lightly.models.utils.unpatchify(patches: Tensor, patch_size: int, channels: int = 3) Tensor
Reconstructs images from their patches.
- Parameters
patches – Patches tensor with shape (batch_size, num_patches, channels * patch_size ** 2).
patch_size – The patch size in pixels used to create the patches.
channels – The number of channels the image must have.
- Returns
Reconstructed images tensor with shape (batch_size, channels, height, width).
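A round trip between images and patches can be sketched as follows. The helper names are hypothetical, the memory layout of each flattened patch (row, column, channel) is an assumption, and the inverse additionally assumes a square patch grid because num_patches alone does not determine height and width.

```python
import torch


def patchify_sketch(images, patch_size):
    n, c, h, w = images.shape
    gh, gw = h // patch_size, w // patch_size
    x = images.reshape(n, c, gh, patch_size, gw, patch_size)
    # Move the grid axes to the front and the channel axis to the back.
    x = torch.einsum("nchpwq->nhwpqc", x)
    return x.reshape(n, gh * gw, patch_size**2 * c)


def unpatchify_sketch(patches, patch_size, channels=3):
    n, num_patches, _ = patches.shape
    # Assumption: the patch grid is square, so its side is sqrt(num_patches).
    g = int(num_patches**0.5)
    assert g * g == num_patches, "expected a square patch grid"
    x = patches.reshape(n, g, g, patch_size, patch_size, channels)
    x = torch.einsum("nhwpqc->nchpwq", x)
    return x.reshape(n, channels, g * patch_size, g * patch_size)


torch.manual_seed(0)
images = torch.randn(2, 3, 8, 8)
patches = patchify_sketch(images, patch_size=4)
restored = unpatchify_sketch(patches, patch_size=4, channels=3)
print(tuple(patches.shape), torch.equal(restored, images))  # (2, 4, 48) True
```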
- lightly.models.utils.update_drop_path_rate(model: VisionTransformer, drop_path_rate: float, mode: str = 'linear') None
Updates the drop path rate in a TIMM VisionTransformer model.
- Parameters
model – TIMM VisionTransformer model.
drop_path_rate – Maximum drop path rate.
mode – Drop path rate update mode. Can be “linear” or “uniform”. Linear increases the drop path rate from 0 to drop_path_rate over the depth of the model. Uniform sets the drop path rate to drop_path_rate for all blocks.
- Raises
ValueError – If an unknown mode is provided.
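The two modes can be illustrated by computing the per-block rates directly. This sketch does not touch a TIMM model; drop_path_rates_sketch is a hypothetical helper, and the evenly spaced schedule that includes both endpoints is an assumption about what "linear" means here.

```python
def drop_path_rates_sketch(depth, drop_path_rate, mode="linear"):
    """Per-block drop path rates for a model with `depth` blocks."""
    if mode == "linear":
        if depth == 1:
            return [0.0]
        # Evenly spaced from 0 at the first block to drop_path_rate at the last.
        return [drop_path_rate * i / (depth - 1) for i in range(depth)]
    if mode == "uniform":
        return [drop_path_rate] * depth
    raise ValueError(f"Unknown mode: {mode!r}")


uniform = drop_path_rates_sketch(5, 0.2, "uniform")
linear = drop_path_rates_sketch(5, 0.2, "linear")
print(uniform)  # [0.2, 0.2, 0.2, 0.2, 0.2]
print(linear)
```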
- lightly.models.utils.update_momentum(model: Module, model_ema: Module, m: float)
Updates the parameters of model_ema with an exponential moving average of the parameters of model.
Momentum encoders are a crucial component for models such as MoCo or BYOL.
- Parameters
model – The current model.
model_ema – The model with exponential moving average (EMA) parameters.
m – The momentum factor, between 0 and 1.
Examples
>>> backbone = resnet18()
>>> projection_head = MoCoProjectionHead()
>>> backbone_momentum = copy.deepcopy(backbone)
>>> projection_head_momentum = copy.deepcopy(projection_head)
>>>
>>> # update momentum
>>> update_momentum(backbone, backbone_momentum, m=0.999)
>>> update_momentum(projection_head, projection_head_momentum, m=0.999)
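The update rule itself can be sketched in a few lines. The assumed formula is the standard exponential moving average, p_ema = m * p_ema + (1 - m) * p; the function name is hypothetical and the code is not taken from the library.

```python
import copy

import torch
from torch import nn


@torch.no_grad()
def update_momentum_sketch(model, model_ema, m):
    for p_ema, p in zip(model_ema.parameters(), model.parameters()):
        # p_ema <- m * p_ema + (1 - m) * p
        p_ema.mul_(m).add_(p, alpha=1.0 - m)


model = nn.Linear(2, 1)
model_ema = copy.deepcopy(model)
with torch.no_grad():
    model.weight.fill_(1.0)
    model_ema.weight.fill_(0.0)

update_momentum_sketch(model, model_ema, m=0.9)
print(model_ema.weight)  # every entry is approximately 0.1
```

With m close to 1, as in the 0.999 used above, the EMA model changes only slowly and lags behind the trained model.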