lightly.data

The lightly.data module provides a dataset wrapper and collate functions.

.dataset

class lightly.data.dataset.LightlyDataset(input_dir: Optional[str], transform: Optional[Compose] = None, index_to_filename: Optional[Callable[[VisionDataset, int], str]] = None, filenames: Optional[List[str]] = None, tqdm_args: Optional[Dict[str, Any]] = None, num_workers_video_frame_counting: int = 0)

Provides a uniform data interface for the embedding models.

Should be used for all models and functions in the lightly package. Returns a tuple (sample, target, fname) when accessed using __getitem__.

The LightlyDataset supports different input sources. You can use it on a folder of images, or on a folder with subfolders of images (ImageNet style). If the input_dir has subfolders, each subfolder gets its own target label. You can also work with videos (requires pyav). If there are multiple videos in the input_dir, each video is assigned a different target label. If input_dir contains both images and videos, only the videos are used.

Can also be used in combination with the from_torch_dataset method to load a dataset offered by torchvision (e.g. cifar10).

Parameters

input_dir – Path to directory holding the images or videos to load.
transform – Image transforms (as in torchvision).
index_to_filename – Function which takes the dataset and index as input and returns the filename of the file at the index. If None, uses default.
filenames – If not None, it filters the dataset in the input directory by the given filenames.

Examples

>>> # load a dataset consisting of images from a local folder
>>> # mydata/
>>> # `- img1.png
>>> # `- img2.png
>>> # `- ...
>>> import lightly.data as data
>>> dataset = data.LightlyDataset(input_dir='path/to/mydata/')
>>> sample, target, fname = dataset[0]
>>>
>>> # also works with subfolders
>>> # mydata/
>>> # `- subfolder1
>>> #     `- img1.png
>>> # `- subfolder2
>>> # ...
>>>
>>> # also works with videos
>>> # mydata/
>>> # `- video1.mp4
>>> # `- video2.mp4
>>> # `- ...
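The subfolder rule above (one target label per subfolder) can be illustrated with a small standard-library sketch. It assumes labels follow the sorted subfolder names, as torchvision's ImageFolder does; the helper name subfolder_targets is hypothetical and not part of lightly.

```python
import os
import tempfile


def subfolder_targets(input_dir):
    """Map each immediate subfolder of input_dir to an integer target.

    Assumption: labels follow the sorted subfolder names, as in
    torchvision's ImageFolder. Illustration only, not lightly's code.
    """
    classes = sorted(
        entry.name for entry in os.scandir(input_dir) if entry.is_dir()
    )
    return {name: index for index, name in enumerate(classes)}


# Build a throwaway ImageNet-style layout to demonstrate the mapping.
with tempfile.TemporaryDirectory() as root:
    for name in ("subfolder2", "subfolder1"):
        os.makedirs(os.path.join(root, name))
    targets = subfolder_targets(root)

print(targets)
```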

dump(output_dir: str, filenames: Optional[List[str]] = None, format: Optional[str] = None)

Saves images in the dataset to the output directory.

Will copy the images from the input directory to the output directory if possible. If not (e.g. for VideoDatasets), will load the images and then save them to the output directory with the specified format.

Parameters

output_dir – Output directory where the image is stored.
filenames – Filenames of the images to store. If None, stores all images.
format – Image format. Can be any pillow image format (png, jpg, …). By default we try to use the same format as the input data. If not possible (e.g. for videos) we dump the image as a png image to prevent compression artifacts.

classmethod from_torch_dataset(dataset, transform=None, index_to_filename=None)

Builds a LightlyDataset from a PyTorch (or torchvision) dataset.

Parameters

dataset – PyTorch/torchvision dataset.
transform – Image transforms (as in torchvision).
index_to_filename – Function which takes the dataset and index as input and returns the filename of the file at the index. If None, uses default.

Returns

A LightlyDataset object.

Examples

>>> # load cifar10 from torchvision
>>> import torchvision
>>> import lightly.data as data
>>> base = torchvision.datasets.CIFAR10(root='./')
>>> dataset = data.LightlyDataset.from_torch_dataset(base)
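The index_to_filename argument takes the dataset and an index and returns a filename. A minimal sketch of such a callback follows; the naming scheme and the stand-in dataset are assumptions made for illustration.

```python
def index_to_filename(dataset, index):
    """Hypothetical naming scheme: zero-padded index plus a .png suffix."""
    return f"image_{index:05d}.png"


# A plain list stands in for a torchvision dataset in this sketch.
stand_in_dataset = ["sample_a", "sample_b", "sample_c"]
filenames = [
    index_to_filename(stand_in_dataset, i) for i in range(len(stand_in_dataset))
]
print(filenames)
```

Such a function could then be passed as LightlyDataset.from_torch_dataset(base, index_to_filename=index_to_filename).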

get_filenames() → List[str]: Returns all filenames in the dataset.

get_filepath_from_filename(filename: str, image: PIL.Image = None)

Returns the filepath given the filename of the image.

There are three cases:

The dataset is a regular dataset with the images in the input dir.
The dataset is a video dataset, thus the images have to be saved in a temporary folder.
The dataset is a torch dataset, thus the images have to be saved in a temporary folder.

Parameters

filename – The filename of the image
image – The image corresponding to the filename

Returns

The filepath of the image: either the existing one (case 1) or a newly created jpg (cases 2 and 3).

property transform: Getter for the transform of the dataset.

.multi_view_collate

class lightly.data.multi_view_collate.MultiViewCollate

Collate function that combines views from multiple images into a batch.

This collate function processes a batch of tuples, where each tuple contains multiple views of an image, a label, and a filename. It outputs these as separate grouped tensors for easy batch processing.

Example

>>> transform = SimCLRTransform()
>>> dataset = LightlyDataset(input_dir, transform=transform)
>>> dataloader = DataLoader(dataset, batch_size=4, collate_fn=MultiViewCollate())
>>> for views, targets, filenames in dataloader:
>>>     view0, view1 = views  # each view is a tensor of shape (batch_size, channels, height, width)

__call__(batch: List[Tuple[List[Tensor], int, str]]) → Tuple[List[Tensor], Tensor, List[str]]

Turns a batch of (views, label, filename) tuples into a single (views, labels, filenames) tuple.

Parameters

batch – The input batch as a list of (views, label, filename) tuples, one for each image in the batch. views is a list of N view tensors, each representing a transformed version of the original image. label and filename are the class label and filename for the corresponding image.

Example

>>> batch = [
>>>     ([img_0_view_0, ..., img_0_view_N], label_0, filename_0),   # image 0
>>>     ([img_1_view_0, ..., img_1_view_N], label_1, filename_1),   # image 1
>>>     ...
>>>     ([img_B_view_0, ..., img_B_view_N], label_B, filename_B),  # image B
>>> ]

Returns

A tuple containing –

views: A list of tensors, where each tensor corresponds to one view of every image in the batch. Tensors are concatenated along the batch dimension.

labels: A tensor of shape (batch_size,) with torch.long dtype, containing the labels for all images in the batch.

filenames: A list of strings containing filenames for all images in the batch.

Example

>>> output = (
>>>     [
>>>         Tensor([img_0_view_0, ..., img_B_view_0]),    # view 0
>>>         Tensor([img_0_view_1, ..., img_B_view_1]),    # view 1
>>>         ...
>>>         Tensor([img_0_view_N, ..., img_B_view_N]),    # view N
>>>     ],
>>>     torch.tensor([label_0, ..., label_B], dtype=torch.long),
>>>     [filename_0, ..., filename_B],
>>> )

Notes

If the input batch is empty, a warning is issued and a tuple of three empty lists, ([], [], []), is returned.
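The regrouping described above amounts to a transpose from "per image" to "per view". The following pure-Python sketch shows the idea with plain lists; the real collate function combines tensors, and the helper name regroup_views is hypothetical.

```python
def regroup_views(batch):
    """Turn [(views, label, filename), ...] into (views, labels, filenames).

    Pure-Python illustration: plain lists stand in for the tensors that
    the library's collate function would produce.
    """
    if not batch:
        return [], [], []
    views_per_image, labels, filenames = zip(*batch)
    # zip(*...) transposes "views grouped by image" into "images grouped by view".
    views = [list(view) for view in zip(*views_per_image)]
    return views, list(labels), list(filenames)


batch = [
    (["img0_view0", "img0_view1"], 0, "img0.png"),
    (["img1_view0", "img1_view1"], 1, "img1.png"),
    (["img2_view0", "img2_view1"], 0, "img2.png"),
]
views, labels, filenames = regroup_views(batch)
print(views[0])  # view 0 of every image in the batch
```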

.collate

Collate Functions

class lightly.data.collate.BaseCollateFunction(transform: Compose)

Base class for other collate implementations.

Takes a batch of images as input and transforms each image into two different augmentations with the help of random transforms. The images are then concatenated such that the output batch is exactly twice the length of the input batch.

transform: A set of torchvision transforms which are randomly applied to each image.

forward(batch: List[Tuple[Image, int, str]])

Turns a batch of tuples into a tuple of batches.

Parameters: batch – A batch of tuples of images, labels, and filenames which is automatically provided if the dataloader is built from a LightlyDataset.
Returns: A tuple of images, labels, and filenames. The images consist of two batches corresponding to the two transformations of the input images.

Examples

>>> # define a random transformation and the collate function
>>> transform = ... # some random augmentations
>>> collate_fn = BaseCollateFunction(transform)
>>>
>>> # input is a batch of tuples (here, batch_size = 1)
>>> input = [(img, 0, 'my-image.png')]
>>> output = collate_fn(input)
>>>
>>> # output consists of two random transforms of the images,
>>> # the labels, and the filenames in the batch
>>> (img_t0, img_t1), label, filename = output
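As a rough illustration of the two-augmentation behavior, the sketch below applies a random transform twice to every item and returns the same tuple layout as in the example above. Numbers stand in for images, and both two_view_collate and jitter are hypothetical names, not lightly APIs.

```python
import random


def two_view_collate(batch, transform):
    """Apply transform twice per image; return ((views0, views1), labels, filenames)."""
    views0 = [transform(img) for img, _, _ in batch]
    views1 = [transform(img) for img, _, _ in batch]
    labels = [label for _, label, _ in batch]
    filenames = [fname for _, _, fname in batch]
    return (views0, views1), labels, filenames


def jitter(value):
    # Hypothetical random augmentation; a float stands in for an image.
    return value + random.uniform(-0.1, 0.1)


batch = [(1.0, 0, "a.png"), (2.0, 1, "b.png")]
(views0, views1), labels, filenames = two_view_collate(batch, jitter)
# Together, the two view lists hold twice as many items as the input batch.
print(len(views0) + len(views1), labels, filenames)
```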

class lightly.data.collate.DINOCollateFunction(global_crop_size=224, global_crop_scale=(0.4, 1.0), local_crop_size=96, local_crop_scale=(0.05, 0.4), n_local_views=6, hf_prob=0.5, vf_prob=0, rr_prob=0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, cj_prob=0.8, cj_bright=0.4, cj_contrast=0.4, cj_sat=0.2, cj_hue=0.1, random_gray_scale=0.2, gaussian_blur=(1.0, 0.1, 0.5), kernel_size: Optional[float] = None, kernel_scale: Optional[float] = None, sigmas: Tuple[float, float] = (0.1, 2), solarization_prob=0.2, normalize={'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the global and local view augmentations for DINO [0].

This class generates two global and a user defined number of local views for each image in a batch. The code is adapted from [1].

[0]: DINO, 2021, https://arxiv.org/abs/2104.14294
[1]: https://github.com/facebookresearch/dino

global_crop_size: Crop size of the global views.

global_crop_scale: Tuple of min and max scales relative to global_crop_size.

local_crop_size: Crop size of the local views.

local_crop_scale: Tuple of min and max scales relative to local_crop_size.

n_local_views: Number of generated local views.

hf_prob: Probability that horizontal flip is applied.

vf_prob: Probability that vertical flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

cj_prob: Probability that color jitter is applied.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Tuple of probabilities to apply gaussian blur on the different views. The input is ordered as follows: (global_view_0, global_view_1, local_views)

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

kernel_scale: Old argument. Value is deprecated in favor of sigmas. If set, the old behavior applies and sigmas is ignored. Used to scale the kernel_size of a factor of kernel_scale

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

solarization: Probability to apply solarization on the second global view.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.data.collate.IJEPAMaskCollator(input_size=(224, 224), patch_size=16, enc_mask_scale=(0.2, 0.8), pred_mask_scale=(0.2, 0.8), aspect_ratio=(0.3, 3.0), nenc=1, npred=2, min_keep=4, allow_overlap=False)

Collator for IJEPA model [0].

Experimental: Support for I-JEPA is experimental, there might be breaking changes in the future.

Code inspired by [1].

[0]: Joint-Embedding Predictive Architecture, 2023, https://arxiv.org/abs/2301.08243
[1]: https://github.com/facebookresearch/ijepa

class lightly.data.collate.ImageCollateFunction(input_size: int = 64, cj_prob: float = 0.8, cj_bright: float = 0.7, cj_contrast: float = 0.7, cj_sat: float = 0.7, cj_hue: float = 0.2, min_scale: float = 0.15, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.2, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implementation of a collate function for images.

This is an implementation of the BaseCollateFunction with a concrete set of transforms.

The set of transforms is inspired by the SimCLR paper, as it has been shown to produce powerful embeddings.

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.data.collate.MAECollateFunction(input_size: Union[int, Tuple[int, int]] = 224, min_scale: float = 0.2, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the view augmentation for MAE [0].

[0]: Masked Autoencoder, 2021, https://arxiv.org/abs/2111.06377

input_size: Size of the input image in pixels.

min_scale: Minimum size of the randomized crop relative to the input_size.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

forward(batch: List[tuple])

Turns a batch of tuples into a tuple of batches.

Parameters: batch – The input batch.
Returns: A (views, labels, fnames) tuple where views is a list of tensors with each tensor containing one view of the batch.

class lightly.data.collate.MSNCollateFunction(random_size: int = 224, focal_size: int = 96, random_views: int = 2, focal_views: int = 10, random_crop_scale: Tuple[float, float] = (0.3, 1.0), focal_crop_scale: Tuple[float, float] = (0.05, 0.3), cj_prob: float = 0.8, cj_strength: float = 1.0, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.2, 2), random_gray_scale: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MSN [0].

Generates a set of random and focal views for each input image. The generated output is (views, target, filenames) where views is a list with the following entries: [random_views_0, random_views_1, …, focal_views_0, focal_views_1, …].

[0]: Masked Siamese Networks, 2022: https://arxiv.org/abs/2204.07141
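To make the ordering of the views list concrete, the sketch below lists the entries and their sizes for the default arguments (2 random views of size 224, 10 focal views of size 96). The helper name msn_view_layout is hypothetical and only mirrors the ordering stated above.

```python
def msn_view_layout(random_views=2, focal_views=10, random_size=224, focal_size=96):
    """Return (name, size) pairs in the documented order: random views, then focal views."""
    layout = [(f"random_views_{i}", random_size) for i in range(random_views)]
    layout += [(f"focal_views_{i}", focal_size) for i in range(focal_views)]
    return layout


layout = msn_view_layout()
print(len(layout), layout[0], layout[2])
```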

random_size: Size of the random image views in pixels.

focal_size: Size of the focal image views in pixels.

random_views: Number of random views to generate.

focal_views: Number of focal views to generate.

random_crop_scale: Minimum and maximum size of the randomized crops for the relative to random_size.

focal_crop_scale: Minimum and maximum size of the randomized crops relative to focal_size.

cj_prob: Probability that color jittering is applied.

cj_strength: Strength of the color jitter.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

random_gray_scale: Probability of conversion to grayscale.

hf_prob: Probability that horizontal flip is applied.

vf_prob: Probability that vertical flip is applied.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.data.collate.MoCoCollateFunction(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.4, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.0, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.2, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MoCo v1.

For MoCo v2, simply use the SimCLR settings.

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples

>>> # MoCo v1 for ImageNet
>>> collate_fn = MoCoCollateFunction()
>>>
>>> # MoCo v1 for CIFAR-10
>>> collate_fn = MoCoCollateFunction(
>>>     input_size=32,
>>> )

class lightly.data.collate.MultiCropCollateFunction(crop_sizes: List[int], crop_counts: List[int], crop_min_scales: List[float], crop_max_scales: List[float], transforms: Compose)

Implements the multi-crop transformations for SwAV.

crop_sizes: Size of the input image in pixels for each crop category.

crop_counts: Number of crops for each crop category.

crop_min_scales: Min scales for each crop category.

crop_max_scales: Max_scales for each crop category.

transforms: Transforms which are applied to all crops.

class lightly.data.collate.MultiViewCollateFunction(transforms: List[Compose])

Generates multiple views for each image in the batch.

transforms: List of transformation functions. Each function is used to generate one view of the back.

forward(batch: List[tuple])

Turns a batch of tuples into a tuple of batches.

Parameters: batch – The input batch.
Returns: A (views, labels, fnames) tuple where views is a list of tensors with each tensor containing one view of the batch.

class lightly.data.collate.PIRLCollateFunction(input_size: int = 64, cj_prob: float = 0.8, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.4, min_scale: float = 0.08, random_gray_scale: float = 0.2, hf_prob: float = 0.5, n_grid: int = 3, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for PIRL [0]. The jigsaw augmentation is applied during the forward pass.

[0] PIRL, 2019: https://arxiv.org/abs/1912.01991

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

hf_prob: Probability that horizontal flip is applied.

n_grid: Sqrt of the number of grids in the jigsaw image.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples

>>> # PIRL for ImageNet
>>> collate_fn = PIRLCollateFunction()
>>>
>>> # PIRL for CIFAR-10
>>> collate_fn = PIRLCollateFunction(
>>>     input_size=32,
>>> )

forward(batch: List[tuple]): Overriding the BaseCollateFunction class’s forward method because for PIRL we need only one augmented batch, as opposed to both, which the BaseCollateFunction creates.

class lightly.data.collate.SMoGCollateFunction(crop_sizes: List[int] = [224, 96], crop_counts: List[int] = [4, 4], crop_min_scales: List[float] = [0.2, 0.05], crop_max_scales: List[float] = [1.0, 0.2], gaussian_blur_probs: List[float] = [0.5, 0.1], gaussian_blur_kernel_sizes: Optional[List[float]] = [None, None], gaussian_blur_sigmas: Tuple[float, float] = (0.2, 2), solarize_probs: List[float] = [0.0, 0.2], hf_prob: float = 0.5, cj_prob: float = 1.0, cj_strength: float = 0.5, random_gray_scale: float = 0.2, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SMoG.

crop_sizes: Size of the input image in pixels for each crop category.

crop_counts: Number of crops for each crop category.

crop_min_scales: Min scales for each crop category.

crop_max_scales: Max_scales for each crop category.

gaussian_blur_probs: Probability of Gaussian blur for each crop category.

gaussian_blur_kernel_sizes: Deprecated values in favour of sigmas.

gaussian_blur_sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled.

solarize_probs: Probability of solarization for each crop category.

hf_prob: Probability that horizontal flip is applied.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter.

random_gray_scale: Probability of conversion to grayscale.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.data.collate.SimCLRCollateFunction(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.5, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.2, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SimCLR.

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples

>>> # SimCLR for ImageNet
>>> collate_fn = SimCLRCollateFunction()
>>>
>>> # SimCLR for CIFAR-10
>>> collate_fn = SimCLRCollateFunction(
>>>     input_size=32,
>>>     gaussian_blur=0.,
>>> )

class lightly.data.collate.SwaVCollateFunction(crop_sizes: List[int] = [224, 96], crop_counts: List[int] = [2, 6], crop_min_scales: List[float] = [0.14, 0.05], crop_max_scales: List[float] = [1.0, 0.14], hf_prob: float = 0.5, vf_prob: float = 0.0, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, cj_prob: float = 0.8, cj_strength: float = 0.8, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.2, 2), normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the multi-crop transformations for SwAV.

crop_sizes: Size of the input image in pixels for each crop category.

crop_counts: Number of crops for each crop category.

crop_min_scales: Min scales for each crop category.

crop_max_scales: Max_scales for each crop category.

hf_prob: Probability that horizontal flip is applied.

vf_prob: Probability that vertical flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter.

random_gray_scale: Probability of conversion to grayscale.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples

>>> # SwaV for Imagenet
>>> collate_fn = SwaVCollateFunction()
>>>
>>> # SwaV w/ 2x160 and 4x96 crops
>>> collate_fn = SwaVCollateFunction(
>>>     crop_sizes=[160, 96],
>>>     crop_counts=[2, 4],
>>> )

class lightly.data.collate.VICRegCollateFunction(input_size: int = 224, cj_prob: float = 0.8, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.2, cj_hue: float = 0.1, min_scale: float = 0.08, random_gray_scale: float = 0.2, solarize_prob: float = 0.1, gaussian_blur: float = 0.5, kernel_size: Optional[float] = None, sigmas: Tuple[float, float] = (0.2, 2), vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Optional[Union[float, Tuple[float, float]]] = None, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implementation of a collate function for images.

This is an implementation of the BaseCollateFunction with a concrete set of transforms.

The set of transforms is inspired by the SimCLR paper, as it has been shown to produce powerful embeddings.

input_size: Size of the input image in pixels.

cj_prob: Probability that color jitter is applied.

cj_bright: How much to jitter brightness.

cj_contrast: How much to jitter constrast.

cj_sat: How much to jitter saturation.

cj_hue: How much to jitter hue.

min_scale: Minimum size of the randomized crop relative to the input_size.

random_gray_scale: Probability of conversion to grayscale.

solarize_prob: Probability of solarization.

gaussian_blur: Probability of Gaussian blur.

kernel_size: Will be deprecated in favor of sigmas argument. If set, the old behavior applies and sigmas is ignored. Used to calculate sigma of gaussian blur with kernel_size * input_size.

sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if kernel_size is set.

vf_prob: Probability that vertical flip is applied.

hf_prob: Probability that horizontal flip is applied.

rr_prob: Probability that random rotation is applied.

rr_degrees: Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize: Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.data.collate.VICRegLCollateFunction(global_crop_size: int = 224, local_crop_size: int = 96, global_crop_scale: Tuple[int] = (0.2, 1.0), local_crop_scale: Tuple[int] = (0.05, 0.2), global_grid_size: int = 7, local_grid_size: int = 3, global_gaussian_blur_prob: float = 0.5, local_gaussian_blur_prob: float = 0.1, global_gaussian_blur_kernel_size: Optional[float] = None, local_gaussian_blur_kernel_size: Optional[float] = None, global_gaussian_blur_sigmas: Tuple[float, float] = (0.2, 2), local_gaussian_blur_sigmas: Tuple[float, float] = (0.2, 2), global_solarize_prob: float = 0.0, local_solarize_prob: float = 0.2, hf_prob: float = 0.5, cj_prob: float = 1.0, cj_strength: float = 0.5, random_gray_scale: float = 0.2, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Transforms images for VICRegL.

global_crop_size: Size of the input image in pixels for the global crop category.

local_crop_size: Size of the input image in pixels for the local crop category.

global_crop_scale: Min and max scales for the global crop category.

local_crop_scale: Min and max scales for the local crop category.

global_grid_size: Grid size for the global crop category.

local_grid_size: Grid size for the local crop category.

global_gaussian_blur_prob: Probability of Gaussian blur for the global crop category.

local_gaussian_blur_prob: Probability of Gaussian blur for the local crop category.

global_gaussian_blur_kernel_size: Will be deprecated in favor of global_gaussian_blur_sigmas argument. If set, the old behavior applies and global_gaussian_blur_sigmas is ignored. Used to calculate sigma of gaussian blur with global_gaussian_blur_kernel_size * input_size. Applied to global crop category.

local_gaussian_blur_kernel_size: Will be deprecated in favor of local_gaussian_blur_sigmas argument. If set, the old behavior applies and local_gaussian_blur_sigmas is ignored. Used to calculate sigma of gaussian blur with local_gaussian_blur_kernel_size * input_size. Applied to local crop category.

global_gaussian_blur_sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if global_gaussian_blur_kernel_size is set. Applied to global crop category.

local_gaussian_blur_sigmas: Tuple of min and max value from which the std of the gaussian kernel is sampled. Is ignored if local_gaussian_blur_kernel_size is set. Applied to local crop category.

global_solarize_prob: Probability of solarization for the global crop category.

local_solarize_prob: Probability of solarization for the local crop category.

hf_prob: Probability that horizontal flip is applied.

cj_prob: Probability that color jitter is applied.

cj_strength: Strength of the color jitter.

random_gray_scale: Probability of conversion to grayscale.

normalize: Dictionary with mean and standard deviation for normalization.

forward(batch: List[Tuple[Image, int, str]]) → Tuple[Tuple[Tensor, Tensor, Tensor, Tensor], Tensor, Tensor]

Applies transforms to images in the input batch.

Parameters: batch – A list of tuples containing an image (as a PIL Image), a label (int), and a filename (str).
Returns: A tuple of transformed images (as a 4-tuple of torch.Tensors containing view_global, view_local, grid_global, grid_local), labels (as torch.Tensor), and filenames (as torch.Tensor).