lightly.transforms

The lightly.transforms package provides transforms for various self-supervised learning methods.

It also contains some additional transforms that are not part of torchvision's transforms.

class lightly.transforms.dino_transform.DINOTransform(global_crop_size: int = 224, global_crop_scale: Tuple[float, float] = (0.4, 1.0), local_crop_size: int = 96, local_crop_scale: Tuple[float, float] = (0.05, 0.4), n_local_views: int = 6, hf_prob: float = 0.5, vf_prob: float = 0, rr_prob: float = 0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, cj_prob: float = 0.8, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: Tuple[float, float, float] = (1.0, 0.1, 0.5), kernel_size: Optional[float] = None, kernel_scale: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), solarization_prob: float = 0.2, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the global and local view augmentations for DINO [0].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2 + n_local_views (8 by default): two global views plus n_local_views local views.

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • Random solarization

  • ImageNet normalization

This class generates two global views and a user-defined number of local views for each image in a batch. The code is adapted from [1].

global_crop_size

Crop size of the global views.

global_crop_scale

Tuple of min and max scales relative to global_crop_size.

local_crop_size

Crop size of the local views.

local_crop_scale

Tuple of min and max scales relative to local_crop_size.

n_local_views

Number of generated local views.

hf_prob

Probability that horizontal flip is applied.

vf_prob

Probability that vertical flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
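The three cases for rr_degrees can be sketched in plain Python. This is only an illustration of the rules stated above: sample_rotation_angle is a hypothetical helper, not a lightly function, and uniform sampling within the range is an assumption.

```python
import random
from typing import Optional, Tuple, Union


def sample_rotation_angle(
    rr_degrees: Optional[Union[float, Tuple[float, float]]],
    rng: random.Random,
) -> float:
    """Pick a counter-clockwise rotation angle in degrees.

    Hypothetical helper for illustration; not part of lightly.
    """
    if rr_degrees is None:
        return 90.0  # None means a fixed 90 degree rotation
    if isinstance(rr_degrees, (int, float)):
        # A single number d is read as the symmetric range [-d, +d].
        low, high = -float(rr_degrees), float(rr_degrees)
    else:
        low, high = rr_degrees  # explicit (min, max) tuple
    # Assumption: the angle is drawn uniformly from the range.
    return rng.uniform(low, high)


rng = random.Random(0)
fixed = sample_rotation_angle(None, rng)
symmetric = sample_rotation_angle(30, rng)
ranged = sample_rotation_angle((10, 20), rng)
```

Whether a rotation happens at all is decided separately by rr_prob.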

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.
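As a small worked example of this scaling, the sketch below multiplies each jitter value by cj_strength using the DINOTransform defaults from the signature above. effective_color_jitter is a hypothetical helper used only for illustration, not a lightly function.

```python
from typing import Dict


def effective_color_jitter(
    cj_strength: float = 0.5,
    cj_bright: float = 0.8,
    cj_contrast: float = 0.8,
    cj_sat: float = 0.4,
    cj_hue: float = 0.2,
) -> Dict[str, float]:
    """Scale every jitter magnitude by cj_strength.

    Hypothetical helper for illustration; not part of lightly.
    """
    return {
        "brightness": cj_strength * cj_bright,
        "contrast": cj_strength * cj_contrast,
        "saturation": cj_strength * cj_sat,
        "hue": cj_strength * cj_hue,
    }


jitter = effective_color_jitter()
for name, value in jitter.items():
    print(f"{name}: {value:.2f}")
```

Halving cj_strength therefore halves all four jitter magnitudes at once, without touching the individual cj_* arguments.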

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Tuple of probabilities to apply gaussian blur on the different views. The input is ordered as follows: (global_view_0, global_view_1, local_views)
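To make the ordering concrete, the sketch below expands the three-element tuple into one blur probability per generated view. blur_prob_per_view is a hypothetical helper, not part of lightly, and it assumes the views are emitted in the same order as the tuple (both global views first, then the local views).

```python
from typing import List, Tuple


def blur_prob_per_view(
    gaussian_blur: Tuple[float, float, float] = (1.0, 0.1, 0.5),
    n_local_views: int = 6,
) -> List[float]:
    """Expand (global_view_0, global_view_1, local_views) to one value per view.

    Hypothetical helper for illustration; not part of lightly.
    """
    global_0, global_1, local = gaussian_blur
    # All local views share the third probability.
    return [global_0, global_1] + [local] * n_local_views


probs = blur_prob_per_view()
print(len(probs))
```

With the defaults, the first global view is always blurred, the second rarely, and each local view half of the time.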

kernel_size

Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

kernel_scale

Old argument, deprecated in favor of sigmas. If set, the old behavior applies and sigmas is ignored. Used to scale the kernel_size by a factor of kernel_scale.

sigmas

Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

solarization_prob

Probability to apply solarization on the second global view.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.fast_siam_transform.FastSiamTransform(num_views: int = 4, input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for FastSiam.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 4.

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • ImageNet normalization

num_views

Number of views (num_views = K+1 where K is the number of target views).

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas

Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob

Probability that vertical flip is applied.

hf_prob

Probability that horizontal flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.gaussian_blur.GaussianBlur(kernel_size: Optional[float] = None, prob: float = 0.5, scale: Optional[float] = None, sigmas: Tuple[float, float] = (0.2, 2))

Implementation of random Gaussian blur.

Uses the built-in ImageFilter module from PIL to apply a Gaussian blur to the input image with a certain probability. The blur is further randomized by uniformly sampling the standard deviation of the Gaussian kernel.

kernel_size

Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

prob

Probability with which the blur is applied.

scale

Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to scale the kernel_size by a factor of scale.

sigmas

Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

__call__(sample: Image) Image

Blurs the image with a given probability.

Parameters

sample – PIL image to which blur will be applied.

Returns

Blurred image or original image.
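The two random decisions described for GaussianBlur (whether to blur, and with which standard deviation) can be sketched without any image library. maybe_sample_sigma is a hypothetical helper, not part of lightly; it only covers the sigmas path and ignores the deprecated kernel_size and scale arguments.

```python
import random
from typing import Optional, Tuple


def maybe_sample_sigma(
    prob: float = 0.5,
    sigmas: Tuple[float, float] = (0.2, 2.0),
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return a blur sigma, or None when the blur is skipped.

    Hypothetical helper for illustration; not part of lightly.
    """
    rng = rng or random.Random()
    if rng.random() >= prob:
        return None  # the image would be returned unchanged
    # The standard deviation is sampled uniformly between min and max.
    return rng.uniform(sigmas[0], sigmas[1])


always = maybe_sample_sigma(prob=1.0, rng=random.Random(0))
never = maybe_sample_sigma(prob=0.0, rng=random.Random(0))
```

A sampled sigma could then be handed to a PIL blur filter such as ImageFilter.GaussianBlur(radius=sigma); that last step is an assumption about usage, not a statement about the library's internals.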

class lightly.transforms.image_grid_transform.ImageGridTransform(transforms: Sequence[Compose])

Transforms an image into multiple views and grids.

Used for VICRegL.

transforms

A sequence of (image_grid_transform, view_transform) tuples. The image_grid_transform creates a new view and grid from the image. The view_transform further augments the view. Every transform tuple is applied once to the image, creating len(transforms) views and grids.

__call__(image: Union[Tensor, Image]) Union[List[Tensor], List[Image]]

Transforms an image into multiple views.

Every transform in self.transforms creates a new view.

Parameters

image – Image to be transformed into multiple views and grids.

Returns

List of view and grid tensors or PIL images. In the VICRegL implementation the entries have the following sizes:

[[3, global_crop_size, global_crop_size], [3, local_crop_size, local_crop_size], [global_grid_size, global_grid_size, 2], [local_grid_size, local_grid_size, 2]]

class lightly.transforms.jigsaw.Jigsaw(n_grid: int = 3, img_size: int = 255, crop_size: int = 64, transform: Compose = ToTensor())

Implementation of the Jigsaw image augmentation, inspired by the PyContrast library.

Generates n_grid**2 random crops and returns a list.

This augmentation is instrumental to PIRL.

n_grid

Side length of the meshgrid, sqrt of the number of crops.

img_size

Size of image.

crop_size

Size of crops.

transform

Transformation to apply on each crop.

Examples

>>> from torchvision import transforms
>>> from lightly.transforms import Jigsaw
>>>
>>> jigsaw_crop = Jigsaw(n_grid=3, img_size=255, crop_size=64, transform=transforms.ToTensor())
>>>
>>> # img is a PIL image
>>> crops = jigsaw_crop(img)

__call__(img: Image) Tensor

Performs the Jigsaw augmentation.

Parameters

img – PIL image to perform Jigsaw augmentation on.

Returns

Torch tensor with stacked crops.
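One plausible way to lay out the n_grid**2 crops is sketched below: the image is split into an n_grid x n_grid mesh, one crop of crop_size is placed at a random offset inside each cell, and the resulting boxes are shuffled. jigsaw_crop_boxes is a hypothetical helper, and both the per-cell placement and the shuffle are assumptions about how such an augmentation is typically built, not a description of the library's code.

```python
import random
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def jigsaw_crop_boxes(
    n_grid: int = 3,
    img_size: int = 255,
    crop_size: int = 64,
    rng: Optional[random.Random] = None,
) -> List[Box]:
    """Return n_grid**2 crop boxes, one per mesh cell, in shuffled order.

    Hypothetical helper for illustration; not part of lightly.
    """
    rng = rng or random.Random()
    cell = img_size // n_grid  # side length of one mesh cell
    slack = cell - crop_size   # room to shift the crop inside its cell
    if slack < 0:
        raise ValueError("crop_size must not exceed img_size // n_grid")
    boxes: List[Box] = []
    for row in range(n_grid):
        for col in range(n_grid):
            left = col * cell + rng.randint(0, slack)
            top = row * cell + rng.randint(0, slack)
            boxes.append((left, top, left + crop_size, top + crop_size))
    rng.shuffle(boxes)  # assumption: pieces are returned in random order
    return boxes


boxes = jigsaw_crop_boxes(rng=random.Random(0))
```

With the default arguments each cell is 85 pixels wide, which leaves 21 pixels of slack for placing a 64-pixel crop.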

class lightly.transforms.mae_transform.MAETransform(input_size: Union[int, Tuple[int, int]] = 224, min_scale: float = 0.2, normalize: Dict[str, List[float]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the view augmentation for MAE [0].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 1.

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

input_size

Size of the input image in pixels.

min_scale

Minimum size of the randomized crop relative to the input_size.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

__call__(image: Union[Tensor, Image]) List[Tensor]

Applies the transforms to the input image.

Parameters

image – The input image to apply the transforms to.

Returns

The transformed image.

class lightly.transforms.mmcr_transform.MMCRTransform(k: int = 8, input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.2, cj_hue: float = 0.1, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 1.0, solarization_prob: float = 0.0, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MMCR [0], which are based on BYOL [1].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length k.

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • Solarization

  • ImageNet normalization

Please refer to the BYOL implementation for additional details.

k

Number of views.

transform

The transform to apply to each view.

class lightly.transforms.moco_transform.MoCoV1Transform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.4, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.0, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MoCo v1.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • ImageNet normalization

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas

Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob

Probability that vertical flip is applied.

hf_prob

Probability that horizontal flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.moco_transform.MoCoV2Transform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MoCo v2 [0].

Similar to SimCLRTransform, but with different values for color jittering and minimum scale of the random resized crop.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • ImageNet normalization

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas

Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob

Probability that vertical flip is applied.

hf_prob

Probability that horizontal flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.msn_transform.MSNTransform(random_size: int = 224, focal_size: int = 96, random_views: int = 2, focal_views: int = 10, random_crop_scale: Tuple[float, float] = (0.3, 1.0), focal_crop_scale: Tuple[float, float] = (0.05, 0.3), cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.8, cj_hue: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), random_gray_scale: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.0, normalize: Dict[str, List[float]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MSN [0].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length random_views + focal_views (12 by default).

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • ImageNet normalization

Generates a set of random and focal views for each input image. The generated output is (views, target, filenames), where views is a list with the following entries: [random_views_0, random_views_1, …, focal_views_0, focal_views_1, …].

random_size

Size of the random image views in pixels.

focal_size

Size of the focal image views in pixels.

random_views

Number of random views to generate.

focal_views

Number of focal views to generate.

random_crop_scale

Minimum and maximum size of the randomized crops relative to random_size.

focal_crop_scale

Minimum and maximum size of the randomized crops relative to focal_size.

cj_prob

Probability that color jittering is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas

Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

random_gray_scale

Probability of conversion to grayscale.

hf_prob

Probability that horizontal flip is applied.

vf_prob

Probability that vertical flip is applied.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.multi_crop_transform.MultiCropTranform(crop_sizes: Tuple[int, ...], crop_counts: Tuple[int, ...], crop_min_scales: Tuple[float, ...], crop_max_scales: Tuple[float, ...], transforms: Compose)

Implements the multi-crop transformations. Used by SwAV.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length sum(crop_counts).

Applies the following augmentations by default:
  • Random resized crop

  • transforms passed by constructor

crop_sizes

Size of the input image in pixels for each crop category.

crop_counts

Number of crops for each crop category.

crop_min_scales

Min scales for each crop category.

crop_max_scales

Max scales for each crop category.

transforms

Transforms which are applied to all crops.
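The four per-category tuples line up index by index. The sketch below expands them into one (size, scale range) entry per generated crop, using the SMoGTransform defaults from this page as sample values. multi_crop_plan is a hypothetical helper for illustration, not a lightly function.

```python
from typing import List, Tuple

CropSpec = Tuple[int, Tuple[float, float]]  # (crop size, (min scale, max scale))


def multi_crop_plan(
    crop_sizes: Tuple[int, ...],
    crop_counts: Tuple[int, ...],
    crop_min_scales: Tuple[float, ...],
    crop_max_scales: Tuple[float, ...],
) -> List[CropSpec]:
    """List the size and scale range of every crop that would be generated.

    Hypothetical helper for illustration; not part of lightly.
    """
    plan: List[CropSpec] = []
    for size, count, lo, hi in zip(
        crop_sizes, crop_counts, crop_min_scales, crop_max_scales
    ):
        # Each category contributes `count` crops with identical settings.
        plan.extend([(size, (lo, hi))] * count)
    return plan


plan = multi_crop_plan((224, 96), (4, 4), (0.2, 0.05), (1.0, 0.2))
print(len(plan))
```

The total number of crops is the sum of crop_counts, which is why all four tuples need the same length.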

class lightly.transforms.multi_view_transform.MultiViewTransform(transforms: Sequence[Compose])

Transforms an image into multiple views.

Parameters

transforms – A sequence of transforms. Every transform creates a new view.

__call__(image: Union[Tensor, Image]) Union[List[Tensor], List[Image]]

Transforms an image into multiple views.

Every transform in self.transforms creates a new view.

Parameters

image – Image to be transformed into multiple views.

Returns

List of views.
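The behavior described here amounts to applying each transform once to the same input. A minimal stand-in, using plain callables and an integer in place of real transforms and images, might look like the sketch below. MultiViewSketch is a hypothetical class for illustration and not the library's implementation.

```python
from typing import Any, Callable, List, Sequence


class MultiViewSketch:
    """Apply every transform to the same image, one view per transform.

    Hypothetical class for illustration; not part of lightly.
    """

    def __init__(self, transforms: Sequence[Callable[[Any], Any]]) -> None:
        self.transforms = transforms

    def __call__(self, image: Any) -> List[Any]:
        # The output order follows the order of the transforms.
        return [transform(image) for transform in self.transforms]


# Stand-ins: the "image" is an integer and the "transforms" are lambdas.
make_views = MultiViewSketch([lambda x: x + 1, lambda x: x * 2])
views = make_views(10)
```

Passing the same transform object several times yields several independently augmented views, provided that transform is itself random.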

class lightly.transforms.pirl_transform.PIRLTransform(input_size: Union[int, Tuple[int, int]] = 64, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.4, min_scale: float = 0.08, random_gray_scale: float = 0.2, hf_prob: float = 0.5, n_grid: int = 3, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for PIRL [0]. The jigsaw augmentation is applied during the forward pass.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2 (original, augmented).

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Jigsaw puzzle

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

hf_prob

Probability that horizontal flip is applied.

n_grid

Sqrt of the number of grids in the jigsaw image.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.random_crop_and_flip_with_grid.Location(top: float, left: float, height: float, width: float, image_height: float, image_width: float, horizontal_flip: bool = False, vertical_flip: bool = False)
class lightly.transforms.random_crop_and_flip_with_grid.RandomHorizontalFlipWithLocation(p=0.5)

See base class.

forward(img: Image, location: Location) Tuple[Image, Location]

Horizontal flip image.

Horizontally flip the given image randomly with a given probability and return both the resulting image and the location.

Parameters
  • img (PIL Image or Tensor) – Image to be flipped.

  • location – Location object linked to the image.

Returns

Tuple of the randomly flipped image (PIL Image or Tensor) and the Location object with updated location.horizontal_flip parameter.

class lightly.transforms.random_crop_and_flip_with_grid.RandomResizedCropAndFlip(grid_size: int = 7, crop_size: int = 224, crop_min_scale: float = 0.05, crop_max_scale: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.5)

Randomly flip and crop an image.

A PyTorch module that applies random cropping, horizontal and vertical flipping to an image, and returns the transformed image and a grid tensor used to map the image back to the original image space in an NxN grid.

Parameters
  • grid_size – The number of grid cells in the output grid tensor.

  • crop_size – The size (in pixels) of the random crops.

  • crop_min_scale – The minimum scale factor for random resized crops.

  • crop_max_scale – The maximum scale factor for random resized crops.

  • hf_prob – The probability of applying horizontal flipping to the image.

  • vf_prob – The probability of applying vertical flipping to the image.

forward(img: Image) Tuple[Image, Tensor]

Applies random cropping, horizontal and vertical flipping to an image, and returns the transformed image and a grid tensor used to map the image back to the original image space in an NxN grid.

Parameters

img – The input PIL image.

Returns

A tuple containing the transformed PIL image and the grid tensor.

location_to_NxN_grid(location: Location) Tensor

Create grid from location object.

Create a grid tensor with grid_size rows and grid_size columns, where each cell represents a region of the original image. The grid is used to map the cropped and transformed image back to the original image space.

Parameters

location – An instance of the Location class, containing the location and size of the transformed image in the original image space.

Returns

A grid tensor of shape (grid_size, grid_size, 2), where the last dimension represents the (x, y) coordinate of the center of each cell in the original image space.
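One plausible reading of this mapping is sketched below with nested lists instead of tensors: the crop rectangle is divided into grid_size x grid_size cells, and each cell stores the (x, y) center it covers in the original image. cell_centers is a hypothetical helper, and the way flips reverse the axis order is an assumption; the library's exact arithmetic may differ.

```python
from typing import List, Tuple


def cell_centers(
    top: float,
    left: float,
    height: float,
    width: float,
    grid_size: int = 7,
    horizontal_flip: bool = False,
    vertical_flip: bool = False,
) -> List[List[Tuple[float, float]]]:
    """Return a grid_size x grid_size grid of (x, y) cell centers.

    Hypothetical helper for illustration; not part of lightly.
    """
    cell_h = height / grid_size
    cell_w = width / grid_size
    ys = [top + (i + 0.5) * cell_h for i in range(grid_size)]
    xs = [left + (j + 0.5) * cell_w for j in range(grid_size)]
    # Assumption: a flip reverses the order in which cells are visited.
    if horizontal_flip:
        xs.reverse()
    if vertical_flip:
        ys.reverse()
    return [[(x, y) for x in xs] for y in ys]


grid = cell_centers(top=0, left=0, height=70, width=140, grid_size=7)
flipped = cell_centers(
    top=0, left=0, height=70, width=140, grid_size=7, horizontal_flip=True
)
```

In this example each cell is 20 pixels wide and 10 pixels tall, so the first center sits at x = 10, y = 5.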

class lightly.transforms.random_crop_and_flip_with_grid.RandomResizedCropWithLocation(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=InterpolationMode.BILINEAR)

Do a random resized crop and return both the resulting image and the location. See base class.

forward(img: Image) Tuple[Image, Location]

Parameters

img (PIL Image or Tensor) – Image to be cropped.

Returns

Tuple of the randomly cropped image (PIL Image or Tensor) and a Location object containing the crop parameters.

class lightly.transforms.random_crop_and_flip_with_grid.RandomVerticalFlipWithLocation(p=0.5)

See base class.

forward(img: Image, location: Location) Tuple[Image, Location]

Vertical flip image.

Vertically flip the given image randomly with a given probability and return both the resulting image and the location.

Parameters
  • img (PIL Image or Tensor) – Image to be flipped.

  • location – Location object linked to the image.

Returns

Tuple of the randomly flipped image (PIL Image or Tensor) and the Location object with updated location.vertical_flip parameter.

class lightly.transforms.rotation.RandomRotate(prob: float = 0.5, angle: int = 90)

Implementation of random rotation.

Randomly rotates an input image by a fixed angle. By default, we rotate the image by 90 degrees with a probability of 50%.

This augmentation can be very useful for rotation-invariant images, such as those in medical imaging or satellite imagery.

prob

Probability with which image is rotated.

angle

Angle by which the image is rotated. We recommend multiples of 90 to prevent rasterization artifacts. If you pick numbers like 90, 180, 270 the tensor will be rotated without introducing any artifacts.

__call__(image: Union[Image, Tensor]) Union[Image, Tensor]

Rotates the image with a given probability.

Parameters

image – PIL image or tensor which will be rotated.

Returns

Rotated image or original image.
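The reason multiples of 90 degrees introduce no artifacts is that such a rotation only permutes pixel positions. The sketch below shows this on a small nested list standing in for an image. rotate90_ccw and random_rotate are hypothetical helpers, not part of lightly, and the counter-clockwise direction is an assumption taken from the rr_degrees description elsewhere on this page.

```python
import random
from typing import List, Optional


def rotate90_ccw(pixels: List[List[int]]) -> List[List[int]]:
    """Rotate a 2D grid by 90 degrees counter-clockwise.

    Only positions change; no pixel value is interpolated.
    """
    return [list(row) for row in zip(*pixels)][::-1]


def random_rotate(
    pixels: List[List[int]],
    prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[List[int]]:
    """Rotate with probability prob, otherwise return the input unchanged.

    Hypothetical helper for illustration; not part of lightly.
    """
    rng = rng or random.Random()
    if rng.random() < prob:
        return rotate90_ccw(pixels)
    return pixels


image = [[1, 2], [3, 4]]
rotated = random_rotate(image, prob=1.0, rng=random.Random(0))
unchanged = random_rotate(image, prob=0.0, rng=random.Random(0))
```

Angles that are not multiples of 90 require resampling, which is where rasterization artifacts come from.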

class lightly.transforms.rotation.RandomRotateDegrees(prob: float, degrees: Union[float, Tuple[float, float]])

Randomly rotates the image by an angle between two bounds, with a given probability.

prob

Probability with which image is rotated.

degrees

Range of degrees to select from. If degrees is a number instead of a sequence like (min, max), the range of degrees will be (-degrees, +degrees). The image is rotated counter-clockwise with a random angle in the (min, max) range or in the (-degrees, +degrees) range.

__call__(image: Union[Image, Tensor]) Union[Image, Tensor]

Rotates the images with a given probability.

Parameters

image – PIL image or tensor which will be rotated.

Returns

Rotated image or original image.

class lightly.transforms.simclr_transform.SimCLRTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.8, cj_hue: float = 0.2, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SimCLR [0, 1].

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • ImageNet normalization

Note that SimCLR v1 and v2 use the same data augmentations.

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas

Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob

Probability that vertical flip is applied.

hf_prob

Probability that horizontal flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.simsiam_transform.SimSiamTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.1, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SimSiam.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • ImageNet normalization

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value. For datasets with small images, such as CIFAR, it is recommended to set cj_strength to 0.5.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas

Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob

Probability that vertical flip is applied.

hf_prob

Probability that horizontal flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
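
The three cases for rr_degrees can be summarized in a small sketch. The helper sample_rotation_angle is hypothetical and only mirrors the rules described above; whether the rotation is applied at all is decided separately by rr_prob.

```python
import random
from typing import Optional, Tuple, Union


def sample_rotation_angle(
    rr_degrees: Optional[Union[float, Tuple[float, float]]],
    rng: random.Random,
) -> float:
    """Hypothetical helper: pick a counter-clockwise angle in degrees."""
    if rr_degrees is None:
        # No range given: always rotate by 90 degrees.
        return 90.0
    if isinstance(rr_degrees, (int, float)):
        # Single number: symmetric range around zero.
        return rng.uniform(-rr_degrees, rr_degrees)
    # (min, max) tuple: sample inside the given range.
    low, high = rr_degrees
    return rng.uniform(low, high)


rng = random.Random(0)
print(sample_rotation_angle(None, rng))
print(sample_rotation_angle(30.0, rng))
print(sample_rotation_angle((0.0, 45.0), rng))
```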

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.smog_transform.SMoGTransform(crop_sizes: Tuple[int, int] = (224, 96), crop_counts: Tuple[int, int] = (4, 4), crop_min_scales: Tuple[float, float] = (0.2, 0.05), crop_max_scales: Tuple[float, float] = (1.0, 0.2), gaussian_blur_probs: Tuple[float, float] = (0.5, 0.1), gaussian_blur_kernel_sizes: Tuple[Optional[float], Optional[float]] = (None, None), gaussian_blur_sigmas: Tuple[float, float] = (0.1, 2), solarize_probs: Tuple[float, float] = (0.0, 0.2), hf_prob: float = 0.5, cj_prob: float = 1.0, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, random_gray_scale: float = 0.2, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SMoG.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length sum(crop_counts). (8 by default)

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • Random solarization

  • ImageNet normalization

crop_sizes

Size of the input image in pixels for each crop category.

crop_counts

Number of crops for each crop category.

crop_min_scales

Min scales for each crop category.

crop_max_scales

Max scales for each crop category.

gaussian_blur_probs

Probability of Gaussian blur for each crop category.

gaussian_blur_kernel_sizes

Deprecated in favor of gaussian_blur_sigmas.

gaussian_blur_sigmas

Tuple of min and max values from which the standard deviation of the Gaussian kernel is sampled.

solarize_probs

Probability of solarization for each crop category.
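
To make the per-category tuples easier to read, the sketch below groups the i-th entry of each tuple into one record. CropCategory and build_crop_categories are hypothetical names used for illustration only; the default values are copied from the SMoGTransform signature.

```python
from typing import List, NamedTuple, Tuple


class CropCategory(NamedTuple):
    """Hypothetical container for the settings of one crop category."""

    size: int
    count: int
    min_scale: float
    max_scale: float
    blur_prob: float
    solarize_prob: float


def build_crop_categories(
    crop_sizes: Tuple[int, int] = (224, 96),
    crop_counts: Tuple[int, int] = (4, 4),
    crop_min_scales: Tuple[float, float] = (0.2, 0.05),
    crop_max_scales: Tuple[float, float] = (1.0, 0.2),
    gaussian_blur_probs: Tuple[float, float] = (0.5, 0.1),
    solarize_probs: Tuple[float, float] = (0.0, 0.2),
) -> List[CropCategory]:
    """Group the i-th entry of every tuple into one category."""
    return [
        CropCategory(*values)
        for values in zip(
            crop_sizes,
            crop_counts,
            crop_min_scales,
            crop_max_scales,
            gaussian_blur_probs,
            solarize_probs,
        )
    ]


categories = build_crop_categories()
# The number of returned views is the sum over all categories.
n_views = sum(category.count for category in categories)
for category in categories:
    print(category)
print(n_views)
```

With the defaults this yields two categories of four crops each, which matches the stated output length of sum(crop_counts).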

hf_prob

Probability that horizontal flip is applied.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

random_gray_scale

Probability of conversion to grayscale.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.solarize.RandomSolarization(prob: float = 0.5, threshold: int = 128)

Implementation of random image Solarization.

Utilizes the integrated image operation solarize from Pillow. Solarization inverts all pixel values above a threshold (default: 128).

prob

Probability to apply the transformation.

threshold

Threshold for solarization.

__call__(sample: Image) → Image

Solarizes the given input image.

Parameters

sample – PIL image to which solarize will be applied.

Returns

Solarized image or original image.
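
The effect of solarization can be illustrated on a plain list of 8-bit pixel values, without Pillow. Both function names below are hypothetical, and treating values equal to the threshold as inverted is an assumption about the boundary case; the class itself delegates to Pillow's solarize operation.

```python
import random
from typing import List, Optional


def solarize_values(pixels: List[int], threshold: int = 128) -> List[int]:
    """Invert 8-bit values at or above the threshold (boundary is assumed)."""
    return [255 - p if p >= threshold else p for p in pixels]


def random_solarize(
    pixels: List[int],
    prob: float = 0.5,
    threshold: int = 128,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Hypothetical helper: solarize with probability prob, else pass through."""
    rng = rng or random.Random()
    if rng.random() < prob:
        return solarize_values(pixels, threshold)
    return pixels


row = [0, 100, 128, 200, 255]
print(solarize_values(row))            # [0, 100, 127, 55, 0]
print(random_solarize(row, prob=0.0))  # never applied, input is returned
```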

class lightly.transforms.swav_transform.SwaVTransform(crop_sizes: Tuple[int, int] = (224, 96), crop_counts: Tuple[int, int] = (2, 6), crop_min_scales: Tuple[float, float] = (0.14, 0.05), crop_max_scales: Tuple[float, float] = (1.0, 0.14), hf_prob: float = 0.5, vf_prob: float = 0.0, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, cj_prob: float = 0.8, cj_strength: float = 1.0, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.8, cj_hue: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the multi-crop transformations for SwaV.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length sum(crop_counts). (8 by default)

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • ImageNet normalization

crop_sizes

Size of the input image in pixels for each crop category.

crop_counts

Number of crops for each crop category.

crop_min_scales

Min scales for each crop category.

crop_max_scales

Max scales for each crop category.
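
As a rough sketch of what the multi-crop defaults imply, the snippet below lists the expected shape of each returned view. The helper multi_crop_view_shapes is hypothetical; it assumes RGB inputs, channel-first tensors, square crops, and views ordered by crop category, none of which is stated explicitly here.

```python
from typing import List, Tuple


def multi_crop_view_shapes(
    crop_sizes: Tuple[int, int] = (224, 96),
    crop_counts: Tuple[int, int] = (2, 6),
) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: one (channels, height, width) entry per view."""
    shapes: List[Tuple[int, int, int]] = []
    for size, count in zip(crop_sizes, crop_counts):
        # Assumed: every crop in a category is a square RGB image.
        shapes.extend([(3, size, size)] * count)
    return shapes


shapes = multi_crop_view_shapes()
print(len(shapes))
print(shapes)
```

With the SwaV defaults this gives two 224-pixel views followed by six 96-pixel views, eight in total.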

hf_prob

Probability that horizontal flip is applied.

vf_prob

Probability that vertical flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Will be deprecated in favor of the sigmas argument. If set, the old behavior applies and sigmas is ignored: the sigma of the Gaussian blur is calculated as kernel_size * input_size.

sigmas

Tuple of min and max values from which the standard deviation of the Gaussian kernel is sampled. Ignored if kernel_size is set.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.transforms.vicreg_transform.VICRegTransform(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, min_scale: float = 0.08, random_gray_scale: float = 0.2, solarize_prob: float = 0.1, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for VICReg.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length 2.

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Random solarization

  • Gaussian blur

  • ImageNet normalization

Similar to SimCLR transform but with extra solarization.

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

solarize_prob

Probability of solarization.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Will be deprecated in favor of the sigmas argument. If set, the old behavior applies and sigmas is ignored: the sigma of the Gaussian blur is calculated as kernel_size * input_size.

sigmas

Tuple of min and max values from which the standard deviation of the Gaussian kernel is sampled. Ignored if kernel_size is set.

vf_prob

Probability that vertical flip is applied.

hf_prob

Probability that horizontal flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
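
To show what the normalize dictionary does to a single pixel, the sketch below applies the per-channel formula (value - mean) / std that torchvision.transforms.Normalize documents. The helper normalize_pixel is hypothetical and works on plain Python lists instead of tensors; the values are the ImageNet defaults from the signature.

```python
from typing import Dict, List

# Default value of the normalize argument (ImageNet statistics).
IMAGENET_NORMALIZE: Dict[str, List[float]] = {
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225],
}


def normalize_pixel(
    rgb: List[float], normalize: Dict[str, List[float]]
) -> List[float]:
    """Hypothetical helper: apply (value - mean) / std to each channel."""
    return [
        (value - mean) / std
        for value, mean, std in zip(rgb, normalize["mean"], normalize["std"])
    ]


# A pixel equal to the dataset mean maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406], IMAGENET_NORMALIZE))
# A pure white pixel (values in [0, 1]) maps to positive values.
print(normalize_pixel([1.0, 1.0, 1.0], IMAGENET_NORMALIZE))
```

Passing None for normalize skips this step, as the Union[None, ...] type in the signature indicates.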

class lightly.transforms.vicregl_transform.VICRegLTransform(global_crop_size: int = 224, local_crop_size: int = 96, n_global_views: int = 2, n_local_views: int = 6, global_crop_scale: Tuple[float, float] = (0.2, 1.0), local_crop_scale: Tuple[float, float] = (0.05, 0.2), global_grid_size: int = 7, local_grid_size: int = 3, global_gaussian_blur_prob: float = 0.5, local_gaussian_blur_prob: float = 0.1, global_gaussian_blur_kernel_size: Optional[float] = None, local_gaussian_blur_kernel_size: Optional[float] = None, global_gaussian_blur_sigmas: Tuple[float, float] = (0.1, 2), local_gaussian_blur_sigmas: Tuple[float, float] = (0.1, 2), global_solarize_prob: float = 0.0, local_solarize_prob: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.0, cj_prob: float = 1.0, cj_strength: float = 0.5, cj_bright: float = 0.8, cj_contrast: float = 0.8, cj_sat: float = 0.4, cj_hue: float = 0.2, random_gray_scale: float = 0.2, normalize: Union[None, Dict[str, List[float]]] = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Transforms images for VICRegL.

Input to this transform:

PIL Image or Tensor.

Output of this transform:

List of Tensor of length n_global_views + n_local_views. (8 by default)

Applies the following augmentations by default:
  • Random resized crop

  • Random horizontal flip

  • Color jitter

  • Random gray scale

  • Gaussian blur

  • Random solarization

  • ImageNet normalization

global_crop_size

Size of the input image in pixels for the global crop views.

local_crop_size

Size of the input image in pixels for the local crop views.

n_global_views

Number of global crop views to generate.

n_local_views

Number of local crop views to generate. For ResNet backbones it is recommended to set this to 0, see [0].

global_crop_scale

Min and max scales for the global crop views.

local_crop_scale

Min and max scales for the local crop views.

global_grid_size

Grid size for the global crop views.

local_grid_size

Grid size for the local crop views.

global_gaussian_blur_prob

Probability of Gaussian blur for the global crop views.

local_gaussian_blur_prob

Probability of Gaussian blur for the local crop views.

global_gaussian_blur_kernel_size

Will be deprecated in favor of the global_gaussian_blur_sigmas argument. If set, the old behavior applies and global_gaussian_blur_sigmas is ignored: the sigma of the Gaussian blur is calculated as global_gaussian_blur_kernel_size * input_size. Applied to global crop views.

local_gaussian_blur_kernel_size

Will be deprecated in favor of the local_gaussian_blur_sigmas argument. If set, the old behavior applies and local_gaussian_blur_sigmas is ignored: the sigma of the Gaussian blur is calculated as local_gaussian_blur_kernel_size * input_size. Applied to local crop views.

global_gaussian_blur_sigmas

Tuple of min and max values from which the standard deviation of the Gaussian kernel is sampled. Ignored if global_gaussian_blur_kernel_size is set. Applied to global crop views.

local_gaussian_blur_sigmas

Tuple of min and max values from which the standard deviation of the Gaussian kernel is sampled. Ignored if local_gaussian_blur_kernel_size is set. Applied to local crop views.

global_solarize_prob

Probability of solarization for the global crop views.

local_solarize_prob

Probability of solarization for the local crop views.

hf_prob

Probability that horizontal flip is applied.

vf_prob

Probability that vertical flip is applied.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter. cj_bright, cj_contrast, cj_sat, and cj_hue are multiplied by this value.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

random_gray_scale

Probability of conversion to grayscale.

normalize

Dictionary with mean and standard deviation for normalization.