lightly.transforms
The lightly.transforms package provides transforms for various self-supervised learning methods.
It also contains some additional transforms that are not part of torchvision's transforms.
- class lightly.transforms.densecl_transform.DenseCLTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for DenseCL [0].
Identical to MoCoV2Transform.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length 2.
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization
[0]: DenseCL, 2021, https://arxiv.org/abs/2011.09157
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random rotation is applied.
- rr_degrees
Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
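The cj_strength scaling described above can be illustrated with a small sketch. This is plain Python rather than the library's code: the helper name effective_color_jitter is hypothetical, its defaults are copied from the signature above, and it only shows how the four jitter values would be multiplied by cj_strength.

```python
from typing import Dict


def effective_color_jitter(
    cj_strength: float = 1.0,
    cj_bright: float = 0.4,
    cj_contrast: float = 0.4,
    cj_sat: float = 0.4,
    cj_hue: float = 0.1,
) -> Dict[str, float]:
    """Return the jitter magnitudes after scaling by cj_strength.

    Hypothetical helper for illustration only, not part of lightly.
    """
    return {
        "brightness": cj_strength * cj_bright,
        "contrast": cj_strength * cj_contrast,
        "saturation": cj_strength * cj_sat,
        "hue": cj_strength * cj_hue,
    }


# Recommended setting for small images such as CIFAR: halve every magnitude.
small_image_jitter = effective_color_jitter(cj_strength=0.5)
```

Under this reading, lowering cj_strength weakens all four jitter components at once, so the individual cj_* values can stay at their defaults.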
- class lightly.transforms.dino_transform.DINOTransform(global_crop_size: int = 224, global_crop_scale: Tuple[float, float] = (0.4, 1.0), local_crop_size: int = 96, local_crop_scale: Tuple[float, float] = (0.05, 0.4), n_local_views: int = 6, hf_prob: float = 0.5, vf_prob: float = 0, rr_prob: float = 0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, cj_prob: float = 0.8, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: Tuple[float, float, float] = (1.0, 0.1, 0.5), kernel_size: Optional[float] = None, kernel_scale: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), solarization_prob: float = 0.2, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the global and local view augmentations for DINO [0].
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length 2 + n_local_views: two global views plus n_local_views local views (8 by default).
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
Random solarization
ImageNet normalization
This class generates two global views and a user-defined number of local views for each image in a batch. The code is adapted from [1].
[0]: DINO, 2021, https://arxiv.org/abs/2104.14294
- global_crop_size
Crop size of the global views.
- global_crop_scale
Tuple of min and max scales relative to global_crop_size.
- local_crop_size
Crop size of the local views.
- local_crop_scale
Tuple of min and max scales relative to local_crop_size.
- n_local_views
Number of generated local views.
- hf_prob
Probability that horizontal flip is applied.
- vf_prob
Probability that vertical flip is applied.
- rr_prob
Probability that random rotation is applied.
- rr_degrees
Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Tuple of probabilities to apply gaussian blur on the different views. The input is ordered as follows: (global_view_0, global_view_1, local_views)
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- kernel_scale
Old argument, deprecated in favor of sigmas. If set, the old behavior applies and sigmas is ignored. Used to scale kernel_size by a factor of kernel_scale.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- solarization_prob
Probability to apply solarization on the second global view.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
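The view count and the per-view gaussian_blur probabilities described above can be sketched in plain Python. The helper dino_view_plan is hypothetical and does not call the library. It assumes the output lists the two global views first and the local views after them, mirroring the documented order of the gaussian_blur tuple.

```python
from typing import List, Tuple


def dino_view_plan(
    global_crop_size: int = 224,
    local_crop_size: int = 96,
    n_local_views: int = 6,
    gaussian_blur: Tuple[float, float, float] = (1.0, 0.1, 0.5),
) -> List[Tuple[str, int, float]]:
    """List (view name, crop size, blur probability) for every output view.

    Hypothetical helper for illustration only, not part of lightly.
    """
    blur_global_0, blur_global_1, blur_local = gaussian_blur
    plan = [
        ("global_view_0", global_crop_size, blur_global_0),
        ("global_view_1", global_crop_size, blur_global_1),
    ]
    # All local views share the third blur probability.
    plan += [
        (f"local_view_{i}", local_crop_size, blur_local)
        for i in range(n_local_views)
    ]
    return plan


plan = dino_view_plan()
```

With the default arguments the plan has 2 + 6 = 8 entries, which matches the default output length stated above.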
- class lightly.transforms.fast_siam_transform.FastSiamTransform(num_views: int = 4, input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for FastSiam.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length num_views (4 by default).
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization
- num_views
Number of views (num_views = K+1 where K is the number of target views).
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random rotation is applied.
- rr_degrees
Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
- class lightly.transforms.gaussian_blur.GaussianBlur(kernel_size: Optional[float] = None, prob: float = 0.5, scale: Optional[float] = None, sigmas: Tuple[float, float] = (0.2, 2))
Implementation of random Gaussian blur.
Utilizes the built-in ImageFilter method from PIL to apply a Gaussian blur to the input image with a certain probability. The blur is further randomized by sampling uniformly the values of the standard deviation of the Gaussian kernel.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- prob
Probability with which the blur is applied.
- scale
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to scale kernel_size by this factor.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- __call__(sample: Image) Image
Blurs the image with a given probability.
- Parameters
sample – PIL image to which blur will be applied.
- Returns
Blurred image or original image.
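The behavior described for this class (blur with probability prob, with the standard deviation sampled uniformly from sigmas) can be sketched without any imaging dependency. The function maybe_blur and its blur_fn argument are hypothetical names. The actual blur is passed in as a callable, so this is an illustration of the sampling logic under those assumptions, not the library's implementation.

```python
import random
from typing import Any, Callable, Optional, Tuple


def maybe_blur(
    sample: Any,
    blur_fn: Callable[[Any, float], Any],
    prob: float = 0.5,
    sigmas: Tuple[float, float] = (0.2, 2.0),
    rng: Optional[random.Random] = None,
) -> Any:
    """Blur the sample with probability prob, else return it unchanged.

    Hypothetical helper for illustration only, not part of lightly.
    """
    rng = rng or random.Random()
    if rng.random() < prob:
        # Sample the standard deviation uniformly from [min, max].
        sigma = rng.uniform(sigmas[0], sigmas[1])
        return blur_fn(sample, sigma)
    return sample


def tag_with_sigma(img: Any, sigma: float) -> Tuple[Any, float]:
    """Placeholder blur that only records which sigma was sampled."""
    return (img, sigma)


always = maybe_blur("img", tag_with_sigma, prob=1.0, rng=random.Random(0))
never = maybe_blur("img", tag_with_sigma, prob=0.0, rng=random.Random(0))
```

With a PIL image, blur_fn could wrap PIL's ImageFilter.GaussianBlur, for example `lambda img, sigma: img.filter(ImageFilter.GaussianBlur(radius=sigma))`.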
- class lightly.transforms.image_grid_transform.ImageGridTransform(transforms: Sequence[Compose])
Transforms an image into multiple views and grids.
Used for VICRegL.
- transforms
A sequence of (image_grid_transform, view_transform) tuples. The image_grid_transform creates a new view and grid from the image. The view_transform further augments the view. Every transform tuple is applied once to the image, creating len(transforms) views and grids.
- __call__(image: Union[Tensor, Image]) Union[List[Tensor], List[Image]]
Transforms an image into multiple views.
Every transform in self.transforms creates a new view.
- Parameters
image – Image to be transformed into multiple views and grids.
- Returns
List of view and grid tensors or PIL images. In the VICRegL implementation the entries have the shapes [[3, global_crop_size, global_crop_size], [3, local_crop_size, local_crop_size], [global_grid_size, global_grid_size, 2], [local_grid_size, local_grid_size, 2]].
- class lightly.transforms.jigsaw.Jigsaw(n_grid: int = 3, img_size: int = 255, crop_size: int = 64, transform: Compose = ToTensor())
Implementation of Jigsaw image augmentation, inspired by the PyContrast library.
Generates n_grid**2 random crops and returns a list.
This augmentation is instrumental to PIRL.
- n_grid
Side length of the meshgrid, sqrt of the number of crops.
- img_size
Size of image.
- crop_size
Size of crops.
- transform
Transformation to apply on each crop.
Examples
>>> from torchvision import transforms
>>> from lightly.transforms import Jigsaw
>>>
>>> jigsaw_crop = Jigsaw(n_grid=3, img_size=255, crop_size=64, transform=transforms.ToTensor())
>>>
>>> # img is a PIL image
>>> crops = jigsaw_crop(img)
- __call__(img: Image) Tensor
Performs the Jigsaw augmentation.
- Parameters
img – PIL image to perform Jigsaw augmentation on.
- Returns
Torch tensor with stacked crops.
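How n_grid, img_size, and crop_size interact can be sketched by computing only the crop boxes. The function jigsaw_crop_boxes is hypothetical. It assumes one random crop per grid cell and a shuffled output order, which is a common Jigsaw scheme but is not confirmed by the text above.

```python
import random
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def jigsaw_crop_boxes(
    n_grid: int = 3,
    img_size: int = 255,
    crop_size: int = 64,
    rng: Optional[random.Random] = None,
) -> List[Box]:
    """Pick one random crop_size x crop_size box inside each grid cell.

    Hypothetical helper for illustration only, not part of lightly.
    """
    rng = rng or random.Random()
    cell = img_size // n_grid  # side length of one grid cell
    if crop_size > cell:
        raise ValueError("crop_size must fit inside a grid cell")
    boxes = []
    for row in range(n_grid):
        for col in range(n_grid):
            # Random offset of the crop inside its cell.
            dx = rng.randint(0, cell - crop_size)
            dy = rng.randint(0, cell - crop_size)
            left, top = col * cell + dx, row * cell + dy
            boxes.append((left, top, left + crop_size, top + crop_size))
    rng.shuffle(boxes)  # assumption: pieces come back in shuffled order
    return boxes


boxes = jigsaw_crop_boxes(rng=random.Random(0))
```

With the defaults each cell is 85 pixels wide, so every 64-pixel crop has up to 21 pixels of random offset inside its cell, and the result contains n_grid**2 = 9 boxes.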
- class lightly.transforms.mae_transform.MAETransform(input_size: Union[int, Tuple[int, int]] = 224, min_scale: float = 0.2, normalize: Dict[str, List[float]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the view augmentation for MAE [0].
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length 1.
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
[0]: Masked Autoencoder, 2021, https://arxiv.org/abs/2111.06377
- input_size
Size of the input image in pixels.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
- __call__(image: Union[Tensor, Image]) List[Tensor]
Applies the transforms to the input image.
- Parameters
image – The input image to apply the transforms to.
- Returns
The transformed image.
- class lightly.transforms.mmcr_transform.MMCRTransform(k: int = 8, input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.2, cj_hue: float = 0.1, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 1.0, solarization_prob: float = 0.0, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for MMCR [0], which are based on BYOL [1].
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length k.
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
Solarization
ImageNet normalization
Please refer to the BYOL implementation for additional details.
[0]: Efficient Coding of Natural Images using Maximum Manifold Capacity Representations, 2023, https://arxiv.org/pdf/2303.03307.pdf
[1]: Bootstrap Your Own Latent, 2020, https://arxiv.org/pdf/2006.07733.pdf
- k
Number of views.
- transform
The transform to apply to each view.
- class lightly.transforms.moco_transform.MoCoV1Transform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.4, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.0, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for MoCo v1.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length 2.
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
ImageNet normalization
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random rotation is applied.
- rr_degrees
Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
- class lightly.transforms.moco_transform.MoCoV2Transform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for MoCo v2 [0].
Similar to SimCLRTransform, but with different values for color jittering and minimum scale of the random resized crop.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length 2.
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization
[0]: MoCo v2, 2020, https://arxiv.org/abs/2003.04297
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random rotation is applied.
- rr_degrees
Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
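The normalize dictionary holds per-channel statistics, and normalization computes (value - mean) / std for each channel. The sketch below applies that formula to a single RGB pixel in plain Python. The helper normalize_pixel is hypothetical, and it treats normalize=None as "skip normalization", which is an assumption based on the Union[None, ...] type in the signature.

```python
from typing import Dict, List, Optional, Sequence

IMAGENET_NORMALIZE = {
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225],
}


def normalize_pixel(
    rgb: Sequence[float],
    normalize: Optional[Dict[str, List[float]]] = IMAGENET_NORMALIZE,
) -> List[float]:
    """Apply (value - mean) / std per channel; skip when normalize is None.

    Hypothetical helper for illustration only, not part of lightly.
    """
    if normalize is None:
        return list(rgb)
    return [
        (value - mean) / std
        for value, mean, std in zip(rgb, normalize["mean"], normalize["std"])
    ]


# A pixel equal to the ImageNet mean maps to zero in every channel.
centered = normalize_pixel([0.485, 0.456, 0.406])
untouched = normalize_pixel([1.0, 0.5, 0.0], normalize=None)
```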
- class lightly.transforms.msn_transform.MSNTransform(random_size: int = 224, focal_size: int = 96, random_views: int = 2, focal_views: int = 10, random_crop_scale: Tuple[float, float] = (0.3, 1.0), focal_crop_scale: Tuple[float, float] = (0.05, 0.3), cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.8, cj_hue: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), random_gray_scale: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.0, normalize: Dict[str, List[float]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for MSN [0].
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length random_views + focal_views (12 by default).
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization
Generates a set of random and focal views for each input image. The generated output is (views, target, filenames) where views is a list with the following entries: [random_views_0, random_views_1, …, focal_views_0, focal_views_1, …].
[0]: Masked Siamese Networks, 2022, https://arxiv.org/abs/2204.07141
- random_size
Size of the random image views in pixels.
- focal_size
Size of the focal image views in pixels.
- random_views
Number of random views to generate.
- focal_views
Number of focal views to generate.
- random_crop_scale
Minimum and maximum size of the randomized crops relative to random_size.
- focal_crop_scale
Minimum and maximum size of the randomized crops relative to focal_size.
- cj_prob
Probability that color jittering is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- random_gray_scale
Probability of conversion to grayscale.
- hf_prob
Probability that horizontal flip is applied.
- vf_prob
Probability that vertical flip is applied.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
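Because the views are ordered as random views first and focal views second, downstream code typically needs to split the flat list again. The sketch below shows that split on stand-in labels. The helper split_msn_views is hypothetical and does not call the library.

```python
from typing import List, Sequence, Tuple, TypeVar

V = TypeVar("V")


def split_msn_views(
    views: Sequence[V], random_views: int = 2
) -> Tuple[List[V], List[V]]:
    """Split a flat view list into (random views, focal views).

    Hypothetical helper for illustration only, not part of lightly.
    """
    if not 0 <= random_views <= len(views):
        raise ValueError("random_views must be between 0 and len(views)")
    return list(views[:random_views]), list(views[random_views:])


# Stand-in labels for the 2 random and 10 focal views produced by default.
views = [f"random_{i}" for i in range(2)] + [f"focal_{i}" for i in range(10)]
random_part, focal_part = split_msn_views(views, random_views=2)
```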
- class lightly.transforms.multi_crop_transform.MultiCropTranform(crop_sizes: Tuple[int, ...], crop_counts: Tuple[int, ...], crop_min_scales: Tuple[float, ...], crop_max_scales: Tuple[float, ...], transforms: Compose)
Implements the multi-crop transformations. Used by SwAV.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length sum(crop_counts).
- Applies the following augmentations by default:
Random resized crop
transforms passed by constructor
- crop_sizes
Size of the input image in pixels for each crop category.
- crop_counts
Number of crops for each crop category.
- crop_min_scales
Min scales for each crop category.
- crop_max_scales
Max scales for each crop category.
- transforms
Transforms which are applied to all crops.
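The four per-category tuples are read in parallel: entry i of each tuple describes crop category i. A plain-Python sketch of that expansion is shown below. The helper expand_crop_specs is hypothetical and the example values are illustrative, not library defaults.

```python
from typing import List, Tuple

CropSpec = Tuple[int, Tuple[float, float]]  # (size, (min_scale, max_scale))


def expand_crop_specs(
    crop_sizes: Tuple[int, ...],
    crop_counts: Tuple[int, ...],
    crop_min_scales: Tuple[float, ...],
    crop_max_scales: Tuple[float, ...],
) -> List[CropSpec]:
    """Return one (size, (min_scale, max_scale)) entry per generated crop.

    Hypothetical helper for illustration only, not part of lightly.
    """
    lengths = {
        len(crop_sizes),
        len(crop_counts),
        len(crop_min_scales),
        len(crop_max_scales),
    }
    if len(lengths) != 1:
        raise ValueError("all four tuples need one entry per crop category")
    specs: List[CropSpec] = []
    for size, count, lo, hi in zip(
        crop_sizes, crop_counts, crop_min_scales, crop_max_scales
    ):
        # Each category contributes `count` crops with identical settings.
        specs.extend([(size, (lo, hi))] * count)
    return specs


# Illustrative values: 2 large crops and 6 small crops.
specs = expand_crop_specs((224, 96), (2, 6), (0.14, 0.05), (1.0, 0.14))
```

The number of entries equals the sum of crop_counts, here 2 + 6 = 8.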
- class lightly.transforms.multi_view_transform.MultiViewTransform(transforms: Sequence[Compose])
Transforms an image into multiple views.
- Parameters
transforms – A sequence of transforms. Every transform creates a new view.
- __call__(image: Union[Tensor, Image]) Union[List[Tensor], List[Image]]
Transforms an image into multiple views.
Every transform in self.transforms creates a new view.
- Parameters
image – Image to be transformed into multiple views.
- Returns
List of views.
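The idea that every transform creates one view can be sketched with a few lines of plain Python. The class MultiViewSketch is a hypothetical stand-in, not the library class, and it uses string functions in place of image transforms so that it runs without any imaging dependency.

```python
from typing import Any, Callable, List, Sequence


class MultiViewSketch:
    """Apply every transform to the same image and collect the results.

    Hypothetical stand-in for illustration only, not part of lightly.
    """

    def __init__(self, transforms: Sequence[Callable[[Any], Any]]) -> None:
        self.transforms = list(transforms)

    def __call__(self, image: Any) -> List[Any]:
        # One view per transform, in the order the transforms were given.
        return [transform(image) for transform in self.transforms]


# Stand-in "transforms" acting on a string instead of an image.
two_views = MultiViewSketch([str.upper, lambda s: s[::-1]])
views = two_views("img")
```

Passing the same transform twice would yield two views that differ only through that transform's own randomness.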
- class lightly.transforms.pirl_transform.PIRLTransform(input_size: Union[int, Tuple[int, int]] = 64, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.4, min_scale: float = 0.08, random_gray_scale: float = 0.2, hf_prob: float = 0.5, n_grid: int = 3, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for PIRL [0]. The jigsaw augmentation is applied during the forward pass.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length 2 (original, augmented).
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Jigsaw puzzle
[0]: PIRL, 2019, https://arxiv.org/abs/1912.01991
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- hf_prob
Probability that horizontal flip is applied.
- n_grid
Sqrt of the number of grids in the jigsaw image.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
- class lightly.transforms.random_crop_and_flip_with_grid.Location(top: float, left: float, height: float, width: float, image_height: float, image_width: float, horizontal_flip: bool = False, vertical_flip: bool = False)
- class lightly.transforms.random_crop_and_flip_with_grid.RandomHorizontalFlipWithLocation(p=0.5)
See base class.
- forward(img: Image, location: Location) Tuple[Image, Location]
Horizontal flip image.
Horizontally flip the given image randomly with a given probability and return both the resulting image and the location.
- Parameters
img (PIL Image or Tensor) – Image to be flipped.
location – Location object linked to the image.
- Returns
PIL Image or Tensor – Randomly flipped image.
Location – Location object with updated location.horizontal_flip parameter.
- class lightly.transforms.random_crop_and_flip_with_grid.RandomResizedCropAndFlip(grid_size: int = 7, crop_size: int = 224, crop_min_scale: float = 0.05, crop_max_scale: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.5)
Randomly flip and crop an image.
A PyTorch module that applies random cropping, horizontal and vertical flipping to an image, and returns the transformed image and a grid tensor used to map the image back to the original image space in an NxN grid.
- Parameters
grid_size – The number of grid cells per side of the output grid tensor (the grid has grid_size rows and grid_size columns).
crop_size – The size (in pixels) of the random crops.
crop_min_scale – The minimum scale factor for random resized crops.
crop_max_scale – The maximum scale factor for random resized crops.
hf_prob – The probability of applying horizontal flipping to the image.
vf_prob – The probability of applying vertical flipping to the image.
- forward(img: Image) Tuple[Image, Tensor]
Applies random cropping, horizontal and vertical flipping to an image, and returns the transformed image and a grid tensor used to map the image back to the original image space in an NxN grid.
- Parameters
img – The input PIL image.
- Returns
A tuple containing the transformed PIL image and the grid tensor.
- location_to_NxN_grid(location: Location) Tensor
Create grid from location object.
Create a grid tensor with grid_size rows and grid_size columns, where each cell represents a region of the original image. The grid is used to map the cropped and transformed image back to the original image space.
- Parameters
location – An instance of the Location class, containing the location and size of the transformed image in the original image space.
- Returns
A grid tensor of shape (grid_size, grid_size, 2), where the last dimension represents the (x, y) coordinate of the center of each cell in the original image space.
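The grid described above can be sketched with nested lists instead of a tensor. The function location_to_grid is hypothetical. It assumes evenly sized cells whose centers are reported in original-image coordinates, and that a flip reverses the cell order along the flipped axis. The library's exact coordinate arithmetic may differ.

```python
from typing import List, Tuple


def location_to_grid(
    top: float,
    left: float,
    height: float,
    width: float,
    grid_size: int = 7,
    horizontal_flip: bool = False,
    vertical_flip: bool = False,
) -> List[List[Tuple[float, float]]]:
    """Return grid[row][col] = (x, y) center of each cell in the original image.

    Hypothetical helper for illustration only, not part of lightly.
    """
    cell_w, cell_h = width / grid_size, height / grid_size
    xs = [left + (col + 0.5) * cell_w for col in range(grid_size)]
    ys = [top + (row + 0.5) * cell_h for row in range(grid_size)]
    # A flipped crop sees the original image in reversed order along that axis.
    if horizontal_flip:
        xs.reverse()
    if vertical_flip:
        ys.reverse()
    return [[(x, y) for x in xs] for y in ys]


grid = location_to_grid(top=20, left=10, height=40, width=100, grid_size=2)
flipped = location_to_grid(
    top=20, left=10, height=40, width=100, grid_size=2, horizontal_flip=True
)
```

In this example the crop covers x from 10 to 110 and y from 20 to 60, so the 2x2 grid has cell centers at x = 35 and 85 and y = 30 and 50.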
- class lightly.transforms.random_crop_and_flip_with_grid.RandomResizedCropWithLocation(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=InterpolationMode.BILINEAR)
Do a random resized crop and return both the resulting image and the location. See base class.
- class lightly.transforms.random_crop_and_flip_with_grid.RandomVerticalFlipWithLocation(p=0.5)
See base class.
- forward(img: Image, location: Location) Tuple[Image, Location]
Vertical flip image.
Vertically flip the given image randomly with a given probability and return both the resulting image and the location.
- Parameters
img (PIL Image or Tensor) – Image to be flipped.
location – Location object linked to the image.
- Returns
PIL Image or Tensor – Randomly flipped image.
Location – Location object with updated location.vertical_flip parameter.
- class lightly.transforms.rotation.RandomRotate(prob: float = 0.5, angle: int = 90)
Implementation of random rotation.
Randomly rotates an input image by a fixed angle. By default, we rotate the image by 90 degrees with a probability of 50%.
This augmentation can be very useful for rotation-invariant images such as in medical imaging or satellite imagery.
- prob
Probability with which image is rotated.
- angle
Angle by which the image is rotated. We recommend multiples of 90 to prevent rasterization artifacts. If you pick numbers like 90, 180, 270 the tensor will be rotated without introducing any artifacts.
- __call__(image: Union[Image, Tensor]) Union[Image, Tensor]
Rotates the image with a given probability.
- Parameters
image – PIL image or tensor which will be rotated.
- Returns
Rotated image or original image.
- class lightly.transforms.rotation.RandomRotateDegrees(prob: float, degrees: Union[float, Tuple[float, float]])
Randomly rotates the image by an angle between two rotation angles with a given probability.
- prob
Probability with which image is rotated.
- degrees
Range of degrees to select from. If degrees is a number instead of a sequence like (min, max), the range of degrees will be (-degrees, +degrees). The image is rotated counter-clockwise with a random angle in the (min, max) range or in the (-degrees, +degrees) range.
- __call__(image: Union[Image, Tensor]) Union[Image, Tensor]
Rotates the images with a given probability.
- Parameters
image – PIL image or tensor which will be rotated.
- Returns
Rotated image or original image.
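How prob and degrees combine can be sketched by sampling only the angle. The function sample_rotation_angle is hypothetical. It returns 0.0 when no rotation is applied and otherwise a counter-clockwise angle drawn uniformly from the documented range.

```python
import random
from typing import Optional, Tuple, Union


def sample_rotation_angle(
    prob: float,
    degrees: Union[float, Tuple[float, float]],
    rng: Optional[random.Random] = None,
) -> float:
    """Return a counter-clockwise angle in degrees, or 0.0 if not rotated.

    Hypothetical helper for illustration only, not part of lightly.
    """
    rng = rng or random.Random()
    if rng.random() >= prob:
        return 0.0  # rotation skipped with probability 1 - prob
    if isinstance(degrees, (int, float)):
        low, high = -degrees, degrees  # single number: symmetric range
    else:
        low, high = degrees  # explicit (min, max) range
    return rng.uniform(low, high)


symmetric = sample_rotation_angle(1.0, 30, rng=random.Random(0))
ranged = sample_rotation_angle(1.0, (10, 20), rng=random.Random(1))
skipped = sample_rotation_angle(0.0, (10, 20), rng=random.Random(2))
```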
- class lightly.transforms.simclr_transform.SimCLRTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.8, cj_hue: float = 0.2, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for SimCLR [0, 1].
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length 2.
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization
Note that SimCLR v1 and v2 use the same data augmentations.
[0]: SimCLR v1, 2020, https://arxiv.org/abs/2002.05709
[1]: SimCLR v2, 2020, https://arxiv.org/abs/2006.10029
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random rotation is applied.
- rr_degrees
Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
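The scaling described for cj_strength can be illustrated with a small, library-independent sketch. The helper effective_color_jitter is hypothetical and is not part of lightly; it only reproduces the multiplication stated above, using the SimCLRTransform defaults from the signature.

```python
from typing import Dict


def effective_color_jitter(
    cj_strength: float = 1.0,
    cj_bright: float = 0.8,
    cj_contrast: float = 0.8,
    cj_sat: float = 0.8,
    cj_hue: float = 0.2,
) -> Dict[str, float]:
    """Return the jitter magnitudes after scaling by cj_strength.

    Hypothetical helper for illustration only; not a lightly function.
    """
    return {
        "brightness": cj_strength * cj_bright,
        "contrast": cj_strength * cj_contrast,
        "saturation": cj_strength * cj_sat,
        "hue": cj_strength * cj_hue,
    }


# SimCLRTransform defaults versus the setting suggested for small images.
default_jitter = effective_color_jitter()
cifar_jitter = effective_color_jitter(cj_strength=0.5)
```

With cj_strength=0.5, the brightness, contrast, and saturation magnitudes are halved to about 0.4 and the hue magnitude to about 0.1, which is what passing cj_strength=0.5 to SimCLRTransform is documented to achieve.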
- class lightly.transforms.simsiam_transform.SimSiamTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for SimSiam.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length 2.
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random rotation is applied.
- rr_degrees
Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
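The three accepted forms of rr_degrees, together with rr_prob, can be summarized in a minimal sketch. The function sample_rotation is hypothetical and not part of lightly, and uniform sampling within the range is an assumption; the documentation above only says the angle is random.

```python
import random
from typing import Optional, Tuple, Union


def sample_rotation(
    rr_prob: float,
    rr_degrees: Optional[Union[float, Tuple[float, float]]],
    rng: random.Random,
) -> Optional[float]:
    """Return a counter-clockwise angle in degrees, or None if no rotation.

    Hypothetical helper; uniform sampling is an assumption.
    """
    # The rotation is applied with probability rr_prob.
    if rng.random() >= rr_prob:
        return None
    # rr_degrees=None: fixed rotation by 90 degrees.
    if rr_degrees is None:
        return 90.0
    # Single number: random angle in [-rr_degrees, +rr_degrees].
    if isinstance(rr_degrees, (int, float)):
        return rng.uniform(-rr_degrees, rr_degrees)
    # (min, max) tuple: random angle in [min, max].
    low, high = rr_degrees
    return rng.uniform(low, high)


rng = random.Random(0)
never_rotated = sample_rotation(0.0, None, rng)
fixed_angle = sample_rotation(1.0, None, rng)
symmetric_angle = sample_rotation(1.0, 30.0, rng)
ranged_angle = sample_rotation(1.0, (10.0, 20.0), rng)
```

With the default rr_prob=0.0, no rotation is applied regardless of rr_degrees.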
- class lightly.transforms.smog_transform.SMoGTransform(crop_sizes: Tuple[int, int] = (224, 96), crop_counts: Tuple[int, int] = (4, 4), crop_min_scales: Tuple[float, float] = (0.2, 0.05), crop_max_scales: Tuple[float, float] = (1.0, 0.2), gaussian_blur_probs: Tuple[float, float] = (0.5, 0.1), gaussian_blur_kernel_sizes: Tuple[Optional[float], Optional[float]] = (None, None), gaussian_blur_sigmas: Tuple[float, float] = (0.1, 2), solarize_probs: Tuple[float, float] = (0.0, 0.2), hf_prob: float = 0.5, cj_prob: float = 1.0, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, random_gray_scale: float = 0.2, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for SMoG.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length sum(crop_counts). (8 by default)
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
Random solarization
ImageNet normalization
- crop_sizes
Size of the input image in pixels for each crop category.
- crop_counts
Number of crops for each crop category.
- crop_min_scales
Min scales for each crop category.
- crop_max_scales
Max scales for each crop category.
- gaussian_blur_probs
Probability of Gaussian blur for each crop category.
- gaussian_blur_kernel_sizes
Deprecated in favor of gaussian_blur_sigmas.
- gaussian_blur_sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled.
- solarize_probs
Probability of solarization for each crop category.
- hf_prob
Probability that horizontal flip is applied.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- random_gray_scale
Probability of conversion to grayscale.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
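How the per-category tuples combine into sum(crop_counts) output views can be sketched as follows. ViewSpec and expand_view_specs are hypothetical names, not lightly APIs, and the ordering of the views (all crops of the first category first) is an assumption. The defaults are taken from the SMoGTransform signature.

```python
from typing import List, NamedTuple, Tuple


class ViewSpec(NamedTuple):
    """Settings for a single output view (hypothetical container)."""

    crop_size: int
    min_scale: float
    max_scale: float
    gaussian_blur_prob: float
    solarize_prob: float


def expand_view_specs(
    crop_sizes: Tuple[int, int] = (224, 96),
    crop_counts: Tuple[int, int] = (4, 4),
    crop_min_scales: Tuple[float, float] = (0.2, 0.05),
    crop_max_scales: Tuple[float, float] = (1.0, 0.2),
    gaussian_blur_probs: Tuple[float, float] = (0.5, 0.1),
    solarize_probs: Tuple[float, float] = (0.0, 0.2),
) -> List[ViewSpec]:
    """Expand the per-category tuples into one entry per output view."""
    specs: List[ViewSpec] = []
    for size, count, lo, hi, blur, solarize in zip(
        crop_sizes,
        crop_counts,
        crop_min_scales,
        crop_max_scales,
        gaussian_blur_probs,
        solarize_probs,
    ):
        # Each category contributes `count` views with identical settings.
        specs.extend([ViewSpec(size, lo, hi, blur, solarize)] * count)
    return specs


specs = expand_view_specs()
```

With the defaults this yields 8 entries: four 224-pixel views without solarization and four 96-pixel views with a solarization probability of 0.2.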
- class lightly.transforms.solarize.RandomSolarization(prob: float = 0.5, threshold: int = 128)
Implementation of random image Solarization.
Utilizes the integrated image operation solarize from Pillow. Solarization inverts all pixel values above a threshold (default: 128).
- prob
Probability to apply the transformation.
- threshold
Threshold for solarization.
- __call__(sample: Image) Image
Solarizes the given input image.
- Parameters
sample – PIL image to which solarize will be applied.
- Returns
Solarized image or original image.
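The effect of solarization on 8-bit pixel values can be sketched without Pillow. Both functions below are hypothetical illustrations rather than the lightly implementation. Treating a value equal to the threshold as inverted follows Pillow's lookup table as understood here and should be read as an assumption.

```python
import random
from typing import List, Optional


def solarize_values(pixels: List[int], threshold: int = 128) -> List[int]:
    """Invert 8-bit values at or above the threshold; keep the rest."""
    return [p if p < threshold else 255 - p for p in pixels]


def random_solarize(
    pixels: List[int],
    prob: float = 0.5,
    threshold: int = 128,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Solarize with probability prob, otherwise return the input unchanged."""
    if rng is None:
        rng = random.Random()
    if rng.random() < prob:
        return solarize_values(pixels, threshold)
    return pixels


solarized = solarize_values([0, 100, 128, 200, 255])
untouched = random_solarize([200], prob=0.0)
always = random_solarize([200], prob=1.0)
```

Dark pixels are left alone while bright pixels are inverted, so a value of 200 becomes 55 and a value of 255 becomes 0.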
- class lightly.transforms.swav_transform.SwaVTransform(crop_sizes: Tuple[int, int] = (224, 96), crop_counts: Tuple[int, int] = (2, 6), crop_min_scales: Tuple[float, float] = (0.14, 0.05), crop_max_scales: Tuple[float, float] = (1.0, 0.14), hf_prob: float = 0.5, vf_prob: float = 0.0, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.8, cj_hue: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the multi-crop transformations for SwaV.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length sum(crop_counts). (8 by default)
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
ImageNet normalization
- crop_sizes
Size of the input image in pixels for each crop category.
- crop_counts
Number of crops for each crop category.
- crop_min_scales
Min scales for each crop category.
- crop_max_scales
Max scales for each crop category.
- hf_prob
Probability that horizontal flip is applied.
- vf_prob
Probability that vertical flip is applied.
- rr_prob
Probability that random rotation is applied.
- rr_degrees
Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
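The interplay between gaussian_blur and sigmas can be sketched as below. The function sample_blur_sigma is hypothetical and not a lightly API. Drawing the standard deviation uniformly from the sigmas range is an assumption, and the deprecated kernel_size path is deliberately left out.

```python
import random
from typing import Optional, Tuple


def sample_blur_sigma(
    gaussian_blur: float = 0.5,
    sigmas: Tuple[float, float] = (0.1, 2),
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return the blur std for one image, or None if no blur is applied.

    Hypothetical helper; uniform sampling is an assumption.
    """
    if rng is None:
        rng = random.Random()
    # Blur is applied with probability gaussian_blur.
    if rng.random() >= gaussian_blur:
        return None
    # The std of the Gaussian kernel is drawn between min and max.
    return rng.uniform(sigmas[0], sigmas[1])


rng = random.Random(0)
no_blur = sample_blur_sigma(gaussian_blur=0.0, rng=rng)
sigma = sample_blur_sigma(gaussian_blur=1.0, rng=rng)
```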
- class lightly.transforms.vicreg_transform.VICRegTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, min_scale: float = 0.08, random_gray_scale: float = 0.2, solarize_prob: float = 0.1, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for VICReg.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length 2.
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Random solarization
Gaussian blur
ImageNet normalization
Similar to SimCLR transform but with extra solarization.
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- solarize_prob
Probability of solarization.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.
- sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random rotation is applied.
- rr_degrees
Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
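The normalize dictionary supplies per-channel statistics for torchvision.transforms.Normalize, which computes (value - mean) / std for each channel. A minimal sketch on a single RGB pixel follows. The helper normalize_pixel is hypothetical, and skipping normalization when normalize is None is an assumption based on the Union[None, ...] type in the signature.

```python
from typing import Dict, List, Optional

# Default ImageNet statistics from the signature above.
IMAGENET_NORMALIZE: Dict[str, List[float]] = {
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225],
}


def normalize_pixel(
    rgb: List[float],
    normalize: Optional[Dict[str, List[float]]] = IMAGENET_NORMALIZE,
) -> List[float]:
    """Apply (value - mean) / std per channel to one RGB pixel in [0, 1]."""
    if normalize is None:
        # Assumption: None disables normalization.
        return list(rgb)
    return [
        (value - mean) / std
        for value, mean, std in zip(rgb, normalize["mean"], normalize["std"])
    ]


centered = normalize_pixel([0.485, 0.456, 0.406])
raw = normalize_pixel([0.5, 0.5, 0.5], normalize=None)
custom = normalize_pixel(
    [1.0, 1.0, 1.0], {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]}
)
```

A pixel equal to the channel means maps to zero in every channel, and custom statistics for a non-ImageNet dataset can be passed in the same dictionary form.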
- class lightly.transforms.vicregl_transform.VICRegLTransform(global_crop_size: int = 224, local_crop_size: int = 96, n_global_views: int = 2, n_local_views: int = 6, global_crop_scale: Tuple[float, float] = (0.2, 1.0), local_crop_scale: Tuple[float, float] = (0.05, 0.2), global_grid_size: int = 7, local_grid_size: int = 3, global_gaussian_blur_prob: float = 0.5, local_gaussian_blur_prob: float = 0.1, global_gaussian_blur_kernel_size: Optional[float] = None, local_gaussian_blur_kernel_size: Optional[float] = None, global_gaussian_blur_sigmas: Tuple[float, float] = (0.1, 2), local_gaussian_blur_sigmas: Tuple[float, float] = (0.1, 2), global_solarize_prob: float = 0.0, local_solarize_prob: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.0, cj_prob: float = 1.0, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, random_gray_scale: float = 0.2, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Transforms images for VICRegL.
- Input to this transform:
PIL Image or Tensor.
- Output of this transform:
List of Tensor of length n_global_views + n_local_views. (8 by default)
- Applies the following augmentations by default:
Random resized crop
Random horizontal flip
Color jitter
Random gray scale
Gaussian blur
Random solarization
ImageNet normalization
[0]: VICRegL, 2022, https://arxiv.org/abs/2210.01571
- global_crop_size
Size of the input image in pixels for the global crop views.
- local_crop_size
Size of the input image in pixels for the local crop views.
- n_global_views
Number of global crop views to generate.
- n_local_views
Number of local crop views to generate. For ResNet backbones it is recommended to set this to 0, see [0].
- global_crop_scale
Min and max scales for the global crop views.
- local_crop_scale
Min and max scales for the local crop views.
- global_grid_size
Grid size for the global crop views.
- local_grid_size
Grid size for the local crop views.
- global_gaussian_blur_prob
Probability of Gaussian blur for the global crop views.
- local_gaussian_blur_prob
Probability of Gaussian blur for the local crop views.
- global_gaussian_blur_kernel_size
Will be deprecated in favor of global_gaussian_blur_sigmas argument. If set, the old behavior applies and global_gaussian_blur_sigmas is ignored. Used to calculate sigma of gaussian blur with global_gaussian_blur_kernel_size * input_size. Applied to global crop views.
- local_gaussian_blur_kernel_size
Will be deprecated in favor of local_gaussian_blur_sigmas argument. If set, the old behavior applies and local_gaussian_blur_sigmas is ignored. Used to calculate sigma of gaussian blur with local_gaussian_blur_kernel_size * input_size. Applied to local crop views.
- global_gaussian_blur_sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. It is ignored if global_gaussian_blur_kernel_size is set. Applied to global crop views.
- local_gaussian_blur_sigmas
Tuple of min and max value from which the std of the gaussian kernel is sampled. It is ignored if local_gaussian_blur_kernel_size is set. Applied to local crop views.
- global_solarize_prob
Probability of solarization for the global crop views.
- local_solarize_prob
Probability of solarization for the local crop views.
- hf_prob
Probability that horizontal flip is applied.
- vf_prob
Probability that vertical flip is applied.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- random_gray_scale
Probability of conversion to grayscale.
- normalize
Dictionary with mean and standard deviation for normalization.
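The number of output views and the settings each view receives can be sketched as follows. View and plan_views are hypothetical names, not lightly APIs, and listing global views before local views is an assumption. The per-view values are the VICRegLTransform defaults from the signature.

```python
from typing import List, NamedTuple, Tuple


class View(NamedTuple):
    """Settings for one output view (hypothetical container)."""

    kind: str
    crop_size: int
    crop_scale: Tuple[float, float]
    grid_size: int
    gaussian_blur_prob: float
    solarize_prob: float


def plan_views(n_global_views: int = 2, n_local_views: int = 6) -> List[View]:
    """List the views produced for one image, using the default settings."""
    global_view = View("global", 224, (0.2, 1.0), 7, 0.5, 0.0)
    local_view = View("local", 96, (0.05, 0.2), 3, 0.1, 0.2)
    return [global_view] * n_global_views + [local_view] * n_local_views


default_views = plan_views()
# Setting recommended above for ResNet backbones: no local views.
resnet_views = plan_views(n_local_views=0)
```

The default plan contains 8 views, while n_local_views=0 leaves only the two global views.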