lightly.transforms

The lightly.transforms package provides transforms for various self-supervised learning methods.

It also contains some additional transforms that are not part of torchvision's transforms.

class lightly.transforms.densecl_transform.DenseCLTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for DenseCL [0].

Identical to MoCoV2Transform.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization

[0]: DenseCL, 2021, https://arxiv.org/abs/2011.09157

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strenght to 0.5.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.dino_transform.DINOTransform(global_crop_size: int = 224, global_crop_scale: Tuple[float, float] = (0.4, 1.0), local_crop_size: int = 96, local_crop_scale: Tuple[float, float] = (0.05, 0.4), n_local_views: int = 6, hf_prob: float = 0.5, vf_prob: float = 0, rr_prob: float = 0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, cj_prob: float = 0.8, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: Tuple[float, float, float] = (1.0, 0.1, 0.5), kernel_size: Optional[float] = None, kernel_scale: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), solarization_prob: float = 0.2, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the global and local view augmentations for DINO [0].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2 + n_local_views (two global views plus the local views; 8 by default).

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
Random solarization
ImageNet normalization

This class generates two global views and a user-defined number of local views for each image in a batch. The code is adapted from [1].

[0]: DINO, 2021, https://arxiv.org/abs/2104.14294
[1]: https://github.com/facebookresearch/dino

global_crop_size: Crop size of the global views.

global_crop_scale: Tuple of min and max scales relative to global_crop_size.

local_crop_size: Crop size of the local views.

local_crop_scale: Tuple of min and max scales relative to local_crop_size.

n_local_views: Number of generated local views.

hf_prob: Probability that horizontal flip is applied.

vf_prob: Probability that vertical flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Tuple of probabilities to apply gaussian blur on the different views. The input is ordered as follows: (global_view_0, global_view_1, local_views)

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

kernel_scale: Old argument. Value is deprecated in favor of sigmas. If set, the old behavior applies and sigmas is ignored. Used to scale the kernel_size of a factor of kernel_scale

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

solarization: Probability to apply solarization on the second global view.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.fast_siam_transform.FastSiamTransform(num_views: int = 4, input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for FastSiam.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 4.

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization

num_views: Number of views (num_views = K+1 where K is the number of target views).

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.gaussian_blur.GaussianBlur(kernel_size: Optional[float] = None, prob: float = 0.5, scale: Optional[float] = None, sigmas: Tuple[float, float] = (0.2, 2))

Implementation of random Gaussian blur.

Utilizes the built-in ImageFilter method from PIL to apply a Gaussian blur to the input image with a certain probability. The blur is further randomized by sampling uniformly the values of the standard deviation of the Gaussian kernel.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

prob: Probability with which the blur is applied.

scale: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to scale the kernel_size of a factor of kernel_scale

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

__call__(sample: Union[Tensor, Image]) → Union[Tensor, Image]

Blurs the image with a given probability.

Parameters: sample – PIL image to which blur will be applied.
Returns: Blurred image or original image.
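
The two sources of randomness described above (whether to blur at all, and which standard deviation to use) can be sketched without any image library. sample_blur_sigma is a hypothetical helper, not part of lightly; with Pillow, the sampled value would presumably be used as the radius of ImageFilter.GaussianBlur.

```python
import random
from typing import Optional, Tuple


def sample_blur_sigma(
    prob: float = 0.5,
    sigmas: Tuple[float, float] = (0.2, 2.0),
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return a sigma drawn uniformly from sigmas, or None if no blur is applied."""
    if rng is None:
        rng = random.Random()
    # Apply the blur only with probability prob.
    if rng.random() >= prob:
        return None
    # Sample the standard deviation uniformly between the min and max value.
    return rng.uniform(sigmas[0], sigmas[1])
```

A return value of None corresponds to the documented case in which the original image is returned unchanged.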

class lightly.transforms.image_grid_transform.ImageGridTransform(transforms: Sequence[Compose])

Transforms an image into multiple views and grids.

Used for VICRegL.

transforms: A sequence of (image_grid_transform, view_transform) tuples. The image_grid_transform creates a new view and grid from the image. The view_transform further augments the view. Every transform tuple is applied once to the image, creating len(transforms) views and grids.

__call__(image: Union[Tensor, Image]) → Union[List[Tensor], List[Image]]

Transforms an image into multiple views.

Every transform in self.transforms creates a new view.

Parameters

image – Image to be transformed into multiple views and grids.

Returns

List of views and grids tensors or PIL images. In the VICRegL implementation it has size:

[
    [3, global_crop_size, global_crop_size],
    [3, local_crop_size, local_crop_size],
    [global_grid_size, global_grid_size, 2],
    [local_grid_size, local_grid_size, 2]
]
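
The way the (image_grid_transform, view_transform) tuples are consumed can be sketched with plain callables. This is a minimal illustration, assuming that each image_grid_transform returns a (view, grid) pair and that all views precede all grids in the output, as the shapes listed above suggest; apply_image_grid_transforms is a hypothetical name.

```python
from typing import Any, Callable, List, Sequence, Tuple

GridTransform = Callable[[Any], Tuple[Any, Any]]  # image -> (view, grid)
ViewTransform = Callable[[Any], Any]              # view -> augmented view


def apply_image_grid_transforms(
    image: Any,
    transforms: Sequence[Tuple[GridTransform, ViewTransform]],
) -> List[Any]:
    """Apply every transform tuple once and return the views followed by the grids."""
    views: List[Any] = []
    grids: List[Any] = []
    for image_grid_transform, view_transform in transforms:
        # First create a new view and its grid from the image ...
        view, grid = image_grid_transform(image)
        # ... then augment the view further; the grid is left untouched.
        views.append(view_transform(view))
        grids.append(grid)
    return views + grids
```

With two tuples (one global, one local) the result has four entries, which matches the four shapes listed for the VICRegL case.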

class lightly.transforms.jigsaw.Jigsaw(n_grid: int = 3, img_size: int = 255, crop_size: int = 64, transform: Compose = ToTensor(ToTensor()))

Implementation of Jigsaw image augmentation, inspired by the PyContrast library.

Generates n_grid**2 random crops and returns a list.

This augmentation is instrumental to PIRL.

n_grid: Side length of the meshgrid, sqrt of the number of crops.

img_size: Size of image.

crop_size: Size of crops.

transform: Transformation to apply on each crop.

Examples

>>> import torchvision.transforms as T
>>> from lightly.transforms import Jigsaw
>>>
>>> jigsaw_crop = Jigsaw(n_grid=3, img_size=255, crop_size=64, transform=T.ToTensor())
>>>
>>> # img is a PIL image
>>> crops = jigsaw_crop(img)

__call__(img: Image) → Tensor

Performs the Jigsaw augmentation.

Parameters: img – PIL image to perform Jigsaw augmentation on.

Returns: Torch tensor with stacked crops.
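
The geometry of the crops can be sketched without an image library. The helper below splits the image into an n_grid x n_grid mesh and picks one randomly placed crop per cell. It assumes that cells have side img_size // n_grid, that each crop is placed uniformly at random inside its cell, and that the crops are shuffled afterwards; jigsaw_crop_boxes is a hypothetical name and the library's implementation may differ in detail.

```python
import random
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def jigsaw_crop_boxes(
    n_grid: int = 3,
    img_size: int = 255,
    crop_size: int = 64,
    rng: Optional[random.Random] = None,
) -> List[Box]:
    """Return n_grid**2 crop boxes, one per mesh cell, in shuffled order."""
    if rng is None:
        rng = random.Random()
    cell = img_size // n_grid  # side length of one mesh cell
    if crop_size > cell:
        raise ValueError("crop_size must not exceed img_size // n_grid")
    boxes: List[Box] = []
    for row in range(n_grid):
        for col in range(n_grid):
            # Random offset of the crop inside its cell.
            dx = rng.randint(0, cell - crop_size)
            dy = rng.randint(0, cell - crop_size)
            left, top = col * cell + dx, row * cell + dy
            boxes.append((left, top, left + crop_size, top + crop_size))
    rng.shuffle(boxes)  # the puzzle pieces are returned in random order
    return boxes
```

Each box could then be passed to a crop operation such as PIL's Image.crop before applying the per-crop transform.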

class lightly.transforms.mae_transform.MAETransform(input_size: Union[int, Tuple[int, int]] = 224, min_scale: float = 0.2, normalize: Dict[str, List[float]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the view augmentation for MAE [0].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 1.

Applies the following augmentations by default:

Random resized crop
Random horizontal flip

[0]: Masked Autoencoder, 2021, https://arxiv.org/abs/2111.06377

input_size: Size of the input image in pixels.

min_scale: Minimum size of the randomized crop relative to the input_size.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

__call__(image: Union[Tensor, Image]) → List[Tensor]

Applies the transforms to the input image.

Parameters: image – The input image to apply the transforms to.
Returns: The transformed image.

class lightly.transforms.mmcr_transform.MMCRTransform(k: int = 8, input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.2, cj_hue: float = 0.1, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 1.0, solarization_prob: float = 0.0, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MMCR [0], which are based on BYOL [1].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length k.

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
Solarization
ImageNet normalization

Please refer to the BYOL implementation for additional details.

[0]: Efficient Coding of Natural Images using Maximum Manifold Capacity
Representations, 2023, https://arxiv.org/pdf/2303.03307.pdf
[1]: Bootstrap Your Own Latent, 2020, https://arxiv.org/pdf/2006.07733.pdf

k: Number of views.

transform: The transform to apply to each view.

class lightly.transforms.moco_transform.MoCoV1Transform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.4, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.0, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MoCo v1.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
ImageNet normalization

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.moco_transform.MoCoV2Transform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MoCo v2 [0].

Similar to SimCLRTransform, but with different values for color jittering and minimum scale of the random resized crop.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization

[0]: MoCo v2, 2020, https://arxiv.org/abs/2003.04297

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strenght to 0.5.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.msn_transform.MSNTransform(random_size: int = 224, focal_size: int = 96, random_views: int = 2, focal_views: int = 10, random_crop_scale: Tuple[float, float] = (0.3, 1.0), focal_crop_scale: Tuple[float, float] = (0.05, 0.3), cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.8, cj_hue: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), random_gray_scale: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.0, normalize: Dict[str, List[float]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MSN [0].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length random_views + focal_views (12 by default).

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization

Generates a set of random and focal views for each input image. The generated output is (views, target, filenames) where views is a list with the following entries: [random_views_0, random_views_1, …, focal_views_0, focal_views_1, …].

[0]: Masked Siamese Networks, 2022: https://arxiv.org/abs/2204.07141

random_size: Size of the random image views in pixels.

focal_size: Size of the focal image views in pixels.

random_views: Number of random views to generate.

focal_views: Number of focal views to generate.

random_crop_scale: Minimum and maximum size of the randomized crops for the relative to random_size.

focal_crop_scale: Minimum and maximum size of the randomized crops relative to focal_size.

cj_prob: Probability that color jittering is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

random_gray_scale: Probability of conversion to grayscale.

hf_prob: Probability that horizontal flip is applied.

vf_prob: Probability that vertical flip is applied.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.multi_crop_transform.MultiCropTranform(crop_sizes: Tuple[int, ...], crop_counts: Tuple[int, ...], crop_min_scales: Tuple[float, ...], crop_max_scales: Tuple[float, ...], transforms: Compose)

Implements the multi-crop transformations. Used by SwAV.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length sum(crop_counts).

Applies the following augmentations by default:

Random resized crop
transforms passed by constructor

crop_sizes: Size of the input image in pixels for each crop category.

crop_counts: Number of crops for each crop category.

crop_min_scales: Min scales for each crop category.

crop_max_scales: Max_scales for each crop category.

transforms: Transforms which are applied to all crops.

class lightly.transforms.multi_view_transform.MultiViewTransform(transforms: Sequence[Compose])

Transforms an image into multiple views.

Parameters: transforms – A sequence of transforms. Every transform creates a new view.

__call__(image: Union[Tensor, Image]) → Union[List[Tensor], List[Image]]

Transforms an image into multiple views.

Every transform in self.transforms creates a new view.

Parameters: image – Image to be transformed into multiple views.
Returns: List of views.
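
The documented behavior amounts to calling every transform on the same input and collecting the results. A minimal stand-in, using plain callables instead of torchvision Compose objects (SimpleMultiView is a hypothetical class, not the library's):

```python
from typing import Any, Callable, List, Sequence


class SimpleMultiView:
    """Hypothetical stand-in that mirrors the documented behavior."""

    def __init__(self, transforms: Sequence[Callable[[Any], Any]]) -> None:
        self.transforms = list(transforms)

    def __call__(self, image: Any) -> List[Any]:
        # Every transform sees the same input image and creates one view.
        return [transform(image) for transform in self.transforms]
```

The number of returned views always equals the number of transforms passed to the constructor.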

class lightly.transforms.multi_view_transform_v2.MultiViewTransformV2(transforms: Sequence[Compose])

Transforms an image into multiple views and is compatible with transforms v2.

Parameters: transforms – A sequence of v2 transforms. Every transform creates a new view.

__call__(*args: Any) → List[Any]

Transforms a data structure containing images, bounding boxes and masks into a sequence of multiple views.

Every transform in self.transforms creates a new view.

Parameters: *args – Arbitrary positional arguments consisting of arbitrary data structures containing images, bounding boxes and masks.
Returns: A list of views, where each view is a transformed version of *args.

class lightly.transforms.pirl_transform.PIRLTransform(input_size: Union[int, Tuple[int, int]] = 64, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.4, min_scale: float = 0.08, random_gray_scale: float = 0.2, hf_prob: float = 0.5, n_grid: int = 3, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for PIRL [0]. The jigsaw augmentation is applied during the forward pass.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2 (original, augmented).

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Jigsaw puzzle

[0]: PIRL, 2019, https://arxiv.org/abs/1912.01991

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

hf_prob: Probability that horizontal flip is applied.

n_grid: Sqrt of the number of grids in the jigsaw image.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.random_crop_and_flip_with_grid.Location(top: float, left: float, height: float, width: float, image_height: float, image_width: float, horizontal_flip: bool = False, vertical_flip: bool = False)

class lightly.transforms.random_crop_and_flip_with_grid.RandomHorizontalFlipWithLocation(p=0.5)

See base class.

forward(img: Image, location: Location) → Tuple[Image, Location]

Horizontal flip image.

Horizontally flip the given image randomly with a given probability and return both the resulting image and the location.

Parameters

img (PIL Image or Tensor) – Image to be flipped.
location – Location object linked to the image.

Returns

PIL Image or Tensor – Randomly flipped image.
Location – Location object with updated location.horizontal_flip parameter.

class lightly.transforms.random_crop_and_flip_with_grid.RandomResizedCropAndFlip(grid_size: int = 7, crop_size: int = 224, crop_min_scale: float = 0.05, crop_max_scale: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.5)

Randomly flip and crop an image.

A PyTorch module that applies random cropping, horizontal and vertical flipping to an image, and returns the transformed image and a grid tensor used to map the image back to the original image space in an NxN grid.

Parameters

grid_size – The number of grid cells in the output grid tensor.
crop_size – The size (in pixels) of the random crops.
crop_min_scale – The minimum scale factor for random resized crops.
crop_max_scale – The maximum scale factor for random resized crops.
hf_prob – The probability of applying horizontal flipping to the image.
vf_prob – The probability of applying vertical flipping to the image.
normalize – A dictionary containing the mean and std values for normalizing the image.

forward(img: Image) → Tuple[Image, Tensor]

Applies random cropping and horizontal flipping to an image, and returns the transformed image and a grid tensor used to map the image back to the original image space in an NxN grid.

Parameters: img – The input PIL image.
Returns: A tuple containing the transformed PIL image and the grid tensor.

location_to_NxN_grid(location: Location) → Tensor

Create grid from location object.

Create a grid tensor with grid_size rows and grid_size columns, where each cell represents a region of the original image. The grid is used to map the cropped and transformed image back to the original image space.

Parameters: location – An instance of the Location class, containing the location and size of the transformed image in the original image space.
Returns: A grid tensor of shape (grid_size, grid_size, 2), where the last dimension represents the (x, y) coordinate of the center of each cell in the original image space.
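
The grid computation can be sketched with nested lists instead of tensors. The version below places each entry at the geometric center of its cell and reverses the column or row order when the crop was flipped. CropLocation is a hypothetical stand-in for the subset of Location fields used here, location_to_grid is a hypothetical name, and the library's exact coordinate convention may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CropLocation:
    """Hypothetical stand-in for the Location fields needed by this sketch."""
    top: float
    left: float
    height: float
    width: float
    horizontal_flip: bool = False
    vertical_flip: bool = False


def location_to_grid(
    location: CropLocation, grid_size: int = 7
) -> List[List[Tuple[float, float]]]:
    """Return a grid_size x grid_size grid of (x, y) cell centers in image space."""
    cell_w = location.width / grid_size
    cell_h = location.height / grid_size
    xs = [location.left + (j + 0.5) * cell_w for j in range(grid_size)]
    ys = [location.top + (i + 0.5) * cell_h for i in range(grid_size)]
    # A flipped crop sees the original image mirrored, so the order reverses.
    if location.horizontal_flip:
        xs.reverse()
    if location.vertical_flip:
        ys.reverse()
    return [[(x, y) for x in xs] for y in ys]
```

The nested list has the same layout as the documented tensor shape (grid_size, grid_size, 2), with the last dimension holding the (x, y) coordinate.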

class lightly.transforms.random_crop_and_flip_with_grid.RandomResizedCropWithLocation(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=InterpolationMode.BILINEAR)

Do a random resized crop and return both the resulting image and the location. See base class.

forward(img: Image) → Tuple[Image, Location]

Parameters: img (PIL Image or Tensor) – Image to be cropped.
Returns: PIL Image or Tensor – Randomly cropped image.
Location – Location object containing crop parameters.

class lightly.transforms.random_crop_and_flip_with_grid.RandomVerticalFlipWithLocation(p=0.5)

See base class.

forward(img: Image, location: Location) → Tuple[Image, Location]

Vertical flip image.

Vertically flip the given image randomly with a given probability and return both the resulting image and the location.

Parameters

img (PIL Image or Tensor) – Image to be flipped.
location – Location object linked to the image.

Returns

PIL Image or Tensor – Randomly flipped image.
Location – Location object with updated location.vertical_flip parameter.

class lightly.transforms.rotation.RandomRotate(prob: float = 0.5, angle: int = 90)

Implementation of random rotation.

Randomly rotates an input image by a fixed angle. By default, we rotate the image by 90 degrees with a probability of 50%.

This augmentation can be very useful for rotation-invariant images such as those in medical imaging or satellite imagery.

prob: Probability with which image is rotated.

angle: Angle by which the image is rotated. We recommend multiples of 90 to prevent rasterization artifacts. If you pick numbers like 90, 180, 270 the tensor will be rotated without introducing any artifacts.

__call__(image: Union[Image, Tensor]) → Union[Image, Tensor]

Rotates the image with a given probability.

Parameters: image – PIL image or tensor which will be rotated.
Returns: Rotated image or original image.
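
Why multiples of 90 degrees introduce no artifacts can be seen on a tiny pixel grid: such a rotation only permutes pixels and never interpolates between them. The sketch below works on nested lists and assumes counter-clockwise rotation; both function names are hypothetical and the code is not the library's implementation.

```python
import random
from typing import List, Optional

Grid = List[List[int]]


def rotate90_ccw(pixels: Grid) -> Grid:
    """Rotate a 2D grid by 90 degrees counter-clockwise."""
    # Transpose, then reverse the row order: the last column becomes the first row.
    return [list(row) for row in zip(*pixels)][::-1]


def random_rotate(
    pixels: Grid,
    prob: float = 0.5,
    angle: int = 90,
    rng: Optional[random.Random] = None,
) -> Grid:
    """Rotate by a fixed multiple of 90 degrees with probability prob."""
    if angle % 90 != 0:
        raise ValueError("this sketch only handles multiples of 90 degrees")
    if rng is None:
        rng = random.Random()
    if rng.random() >= prob:
        return pixels  # image is returned unchanged
    for _ in range((angle // 90) % 4):
        pixels = rotate90_ccw(pixels)
    return pixels
```

Every output value also appears in the input, which is the sense in which such rotations are free of rasterization artifacts.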

class lightly.transforms.rotation.RandomRotateDegrees(prob: float, degrees: Union[float, Tuple[float, float]])

Randomly rotates the image by an angle sampled between two rotation angles, with a given probability.

prob: Probability with which image is rotated.

degrees: Range of degrees to select from. If degrees is a number instead of a sequence like (min, max), the range of degrees will be (-degrees, +degrees). The image is rotated counter-clockwise with a random angle in the (min, max) range or in the (-degrees, +degrees) range.

__call__(image: Union[Image, Tensor]) → Union[Image, Tensor]

Rotates the images with a given probability.

Parameters: image – PIL image or tensor which will be rotated.
Returns: Rotated image or original image.

class lightly.transforms.simclr_transform.SimCLRTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.8, cj_hue: float = 0.2, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SimCLR [0, 1].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization

Note that SimCLR v1 and v2 use the same data augmentations.

[0]: SimCLR v1, 2020, https://arxiv.org/abs/2002.05709
[1]: SimCLR v2, 2020, https://arxiv.org/abs/2006.10029

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strenght to 0.5.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.simsiam_transform.SimSiamTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SimSiam.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.smog_transform.SMoGTransform(crop_sizes: Tuple[int, int] = (224, 96), crop_counts: Tuple[int, int] = (4, 4), crop_min_scales: Tuple[float, float] = (0.2, 0.05), crop_max_scales: Tuple[float, float] = (1.0, 0.2), gaussian_blur_probs: Tuple[float, float] = (0.5, 0.1), gaussian_blur_kernel_sizes: Tuple[Optional[float], Optional[float]] = (None, None), gaussian_blur_sigmas: Tuple[float, float] = (0.1, 2), solarize_probs: Tuple[float, float] = (0.0, 0.2), hf_prob: float = 0.5, cj_prob: float = 1.0, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, random_gray_scale: float = 0.2, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SMoG.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length sum(crop_counts). (8 by default)

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
Random solarization
ImageNet normalization

crop_sizes: Size of the input image in pixels for each crop category.

crop_counts: Number of crops for each crop category.

crop_min_scales: Min scales for each crop category.

crop_max_scales: Max_scales for each crop category.

gaussian_blur_probs: Probability of Gaussian blur for each crop category.

gaussian_blur_kernel_sizes: Deprecated values in favour of sigmas.

gaussian_blur_sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled.

solarize_probs: Probability of solarization for each crop category.

hf_prob: Probability that horizontal flip is applied.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

random_gray_scale: Probability of conversion to grayscale.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.solarize.RandomSolarization(prob: float = 0.5, threshold: int = 128)

Implements random image solarization.

Utilizes the integrated image operation solarize from Pillow. Solarization inverts all pixel values above a threshold (default: 128).

probability: Probability to apply the transformation

threshold: Threshold for solarization.

__call__(sample: Image) → Image

Solarizes the given input image.

Parameters: sample – PIL image to which solarize will be applied.
Returns: Solarized image or original image.
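
To make the effect of the threshold concrete, the sketch below applies the inversion rule to a handful of 8-bit pixel values in pure Python. It is an illustration, not the Pillow implementation. The helper name solarize_values is hypothetical, and treating a value exactly equal to the threshold as inverted is an assumption of this sketch.

```python
from typing import List


def solarize_values(pixels: List[int], threshold: int = 128) -> List[int]:
    """Invert 8-bit values at or above the threshold; keep the rest unchanged."""
    return [255 - p if p >= threshold else p for p in pixels]


print(solarize_values([0, 100, 127, 128, 200, 255]))
# [0, 100, 127, 127, 55, 0]
```

Dark values pass through unchanged, while bright values are mapped to their inverse.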

class lightly.transforms.swav_transform.SwaVTransform(crop_sizes: Tuple[int, int] = (224, 96), crop_counts: Tuple[int, int] = (2, 6), crop_min_scales: Tuple[float, float] = (0.14, 0.05), crop_max_scales: Tuple[float, float] = (1.0, 0.14), hf_prob: float = 0.5, vf_prob: float = 0.0, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.8, cj_hue: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the multi-crop transformations for SwaV.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length sum(crop_counts). (8 by default)

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization

crop_sizes: Size of the input image in pixels for each crop category.

crop_counts: Number of crops for each crop category.

crop_min_scales: Min scales for each crop category.

crop_max_scales: Max_scales for each crop category.

hf_prob: Probability that horizontal flip is applied.

vf_prob: Probability that vertical flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.vicreg_transform.VICRegTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, min_scale: float = 0.08, random_gray_scale: float = 0.2, solarize_prob: float = 0.1, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for VICReg.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Random solarization
Gaussian blur
ImageNet normalization

Similar to SimCLR transform but with extra solarization.

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

solarize_prob: Probability of solarization.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.vicregl_transform.VICRegLTransform(global_crop_size: int = 224, local_crop_size: int = 96, n_global_views: int = 2, n_local_views: int = 6, global_crop_scale: Tuple[float, float] = (0.2, 1.0), local_crop_scale: Tuple[float, float] = (0.05, 0.2), global_grid_size: int = 7, local_grid_size: int = 3, global_gaussian_blur_prob: float = 0.5, local_gaussian_blur_prob: float = 0.1, global_gaussian_blur_kernel_size: Optional[float] = None, local_gaussian_blur_kernel_size: Optional[float] = None, global_gaussian_blur_sigmas: Tuple[float, float] = (0.1, 2), local_gaussian_blur_sigmas: Tuple[float, float] = (0.1, 2), global_solarize_prob: float = 0.0, local_solarize_prob: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.0, cj_prob: float = 1.0, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, random_gray_scale: float = 0.2, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Transforms images for VICRegL.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length n_global_views + n_local_views. (8 by default)

Applies the following augmentations by default:

Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
Random solarization
ImageNet normalization

[0]: VICRegL, 2022, https://arxiv.org/abs/2210.01571

global_crop_size: Size of the input image in pixels for the global crop views.

local_crop_size: Size of the input image in pixels for the local crop views.

n_global_views: Number of global crop views to generate.

n_local_views: Number of local crop views to generate. For ResNet backbones it is recommended to set this to 0, see [0].

global_crop_scale: Min and max scales for the global crop views.

local_crop_scale: Min and max scales for the local crop views.

global_grid_size: Grid size for the global crop views.

local_grid_size: Grid size for the local crop views.

global_gaussian_blur_prob: Probability of Gaussian blur for the global crop views.

local_gaussian_blur_prob: Probability of Gaussian blur for the local crop views.

global_gaussian_blur_kernel_size: Will be deprecated in favor of global_gaussian_blur_sigmas argument. If set, the old behavior applies and global_gaussian_blur_sigmas is ignored. Used to calculate sigma of gaussian blur with global_gaussian_blur_kernel_size * input_size. Applied to global crop views.

local_gaussian_blur_kernel_size: Will be deprecated in favor of local_gaussian_blur_sigmas argument. If set, the old behavior applies and local_gaussian_blur_sigmas is ignored. Used to calculate sigma of gaussian blur with local_gaussian_blur_kernel_size * input_size. Applied to local crop views.

global_gaussian_blur_sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. It is ignored if global_gaussian_blur_kernel_size is set. Applied to global crop views.

local_gaussian_blur_sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. It is ignored if local_gaussian_blur_kernel_size is set. Applied to local crop views.

global_solarize_prob: Probability of solarization for the global crop views.

local_solarize_prob: Probability of solarization for the local crop views.

hf_prob: Probability that horizontal flip is applied.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

random_gray_scale: Probability of conversion to grayscale.

normalize: Dictionary with mean and standard deviation for normalization.