lightly.data

The lightly.data module provides a dataset wrapper and collate functions.

.collate

Collate Functions

class lightly.data.collate.BaseCollateFunction(transform: torchvision.transforms.transforms.Compose)

Base class for other collate implementations.

Takes a batch of images as input and transforms each image into two different augmentations with the help of random transforms. The images are then concatenated such that the output batch is exactly twice the length of the input batch.

transform

A set of torchvision transforms which are randomly applied to each image.

forward(batch: List[tuple])

Turns a batch of tuples into a tuple of batches.

Parameters

batch – A batch of tuples of images, labels, and filenames which is automatically provided if the dataloader is built from a LightlyDataset.

Returns

A tuple of images, labels, and filenames. The images consist of two batches corresponding to the two transformations of the input images.

Examples

>>> # define a random transformation and the collate function
>>> transform = ... # some random augmentations
>>> collate_fn = BaseCollateFunction(transform)
>>>
>>> # input is a batch of tuples (here, batch_size = 1)
>>> input = [(img, 0, 'my-image.png')]
>>> output = collate_fn(input)
>>>
>>> # output consists of two random transforms of the images,
>>> # the labels, and the filenames in the batch
>>> (img_t0, img_t1), label, filename = output
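
The batch layout described above can be illustrated without torch. This is a minimal sketch and not the library's implementation: collate_two_views and jitter are hypothetical names, and plain Python lists stand in for image tensors.

```python
import random
from typing import Callable, List, Tuple


def collate_two_views(batch: List[tuple], transform: Callable) -> Tuple[tuple, list, list]:
    """Turn [(img, label, fname), ...] into ((views_0, views_1), labels, fnames)."""
    # Apply the random transform twice so each image yields two different views.
    views_0 = [transform(img) for img, _, _ in batch]
    views_1 = [transform(img) for img, _, _ in batch]
    labels = [label for _, label, _ in batch]
    fnames = [fname for _, _, fname in batch]
    return (views_0, views_1), labels, fnames


rng = random.Random(0)


def jitter(img: List[float]) -> List[float]:
    """Toy random augmentation (assumption): add noise to every value."""
    return [pixel + rng.random() for pixel in img]


batch = [([0.0, 1.0], 0, "a.png"), ([2.0, 3.0], 1, "b.png")]
(views_0, views_1), labels, fnames = collate_two_views(batch, jitter)
```

Both view batches have the same length as the input batch, so the model sees twice as many images as the dataloader provided.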
class lightly.data.collate.DINOCollateFunction(global_crop_size=224, global_crop_scale=(0.4, 1.0), local_crop_size=96, local_crop_scale=(0.05, 0.4), n_local_views=6, hf_prob=0.5, vf_prob=0, rr_prob=0, rr_degrees: Union[None, float, Tuple[float, float]] = None, cj_prob=0.8, cj_bright=0.4, cj_contrast=0.4, cj_sat=0.2, cj_hue=0.1, random_gray_scale=0.2, gaussian_blur=(1.0, 0.1, 0.5), kernel_size=1.4, kernel_scale=0.6, solarization_prob=0.2, normalize={'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the global and local view augmentations for DINO [0].

This class generates two global and a user defined number of local views for each image in a batch. The code is adapted from [1].

global_crop_size

Crop size of the global views.

global_crop_scale

Tuple of min and max scales relative to global_crop_size.

local_crop_size

Crop size of the local views.

local_crop_scale

Tuple of min and max scales relative to local_crop_size.

n_local_views

Number of generated local views.

hf_prob

Probability that horizontal flip is applied.

vf_prob

Probability that vertical flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

cj_prob

Probability that color jitter is applied.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Tuple of probabilities to apply gaussian blur on the different views. The input is ordered as follows: (global_view_0, global_view_1, local_views)

kernel_size

Sigma of gaussian blur is kernel_size * input_size.

kernel_scale

Fraction of the kernel size which is used for upper and lower limits of the randomized kernel size.

solarization_prob

Probability to apply solarization on the second global view.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
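
The view layout and the ordering of the gaussian_blur tuple can be sketched as follows. This is an illustration under stated assumptions, not the library's code: dino_view_plan and the view names are hypothetical, and only the parameters needed for the layout are modeled.

```python
from typing import List, Tuple


def dino_view_plan(
    n_local_views: int = 6,
    global_crop_size: int = 224,
    local_crop_size: int = 96,
    gaussian_blur: Tuple[float, float, float] = (1.0, 0.1, 0.5),
) -> List[Tuple[str, int, float]]:
    """List (name, crop_size, blur_probability) for every generated view."""
    # gaussian_blur is ordered as (global_view_0, global_view_1, local_views).
    blur_global_0, blur_global_1, blur_local = gaussian_blur
    plan = [
        ("global_view_0", global_crop_size, blur_global_0),
        ("global_view_1", global_crop_size, blur_global_1),
    ]
    # All local views share the same crop size and blur probability.
    plan += [
        (f"local_view_{i}", local_crop_size, blur_local)
        for i in range(n_local_views)
    ]
    return plan


plan = dino_view_plan()
for name, size, blur in plan:
    print(name, size, blur)
```

With the default arguments the plan contains two global views and six local views per image.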

class lightly.data.collate.ImageCollateFunction(input_size: int = 64, cj_prob: float = 0.8, cj_bright: float = 0.7, cj_contrast: float = 0.7, cj_sat: float = 0.7, cj_hue: float = 0.2, min_scale: float = 0.15, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: float = 0.1, vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Union[None, float, Tuple[float, float]] = None, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implementation of a collate function for images.

This is an implementation of the BaseCollateFunction with a concrete set of transforms.

The set of transforms is inspired by the SimCLR paper, as it has been shown to produce powerful embeddings.

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Sigma of gaussian blur is kernel_size * input_size.

vf_prob

Probability that vertical flip is applied.

hf_prob

Probability that horizontal flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
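
The three documented forms of rr_degrees can be sketched as a small helper. This is a hedged illustration only: resolve_rotation_angle is a hypothetical name, the library may sample angles differently internally, and rr_prob (whether a rotation happens at all) is not modeled here.

```python
import random
from typing import Tuple, Union


def resolve_rotation_angle(
    rr_degrees: Union[None, float, Tuple[float, float]],
    rng: random.Random,
) -> float:
    """Pick a counter-clockwise rotation angle following the documented rules."""
    if rr_degrees is None:
        # No range given: rotate by exactly 90 degrees.
        return 90.0
    if isinstance(rr_degrees, (int, float)):
        # Single number: sample from [-rr_degrees, +rr_degrees].
        low, high = -float(rr_degrees), float(rr_degrees)
    else:
        # (min, max) tuple: sample from [min, max].
        low, high = rr_degrees
    return rng.uniform(low, high)


rng = random.Random(0)
fixed_angle = resolve_rotation_angle(None, rng)
symmetric_angle = resolve_rotation_angle(15, rng)
ranged_angle = resolve_rotation_angle((10, 20), rng)
```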

class lightly.data.collate.MAECollateFunction(input_size: Union[int, Tuple[int, int]] = 224, min_scale: float = 0.2, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the view augmentation for MAE [0].

input_size

Size of the input image in pixels.

min_scale

Minimum size of the randomized crop relative to the input_size.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

forward(batch: List[tuple])

Turns a batch of tuples into a tuple of batches.

Parameters

batch – The input batch.

Returns

A (views, labels, fnames) tuple where views is a list of tensors with each tensor containing one view of the batch.

class lightly.data.collate.MSNCollateFunction(random_size: int = 224, focal_size: int = 96, random_views: int = 2, focal_views: int = 10, random_crop_scale: Tuple[float, float] = (0.3, 1.0), focal_crop_scale: Tuple[float, float] = (0.05, 0.3), cj_prob: float = 0.8, cj_strength: float = 1.0, gaussian_blur: float = 0.5, kernel_size: float = 0.1, random_gray_scale: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MSN [0].

Generates a set of random and focal views for each input image. The generated output is (views, target, filenames) where views is a list with the following entries: [random_views_0, random_views_1, …, focal_views_0, focal_views_1, …].

random_size

Size of the random image views in pixels.

focal_size

Size of the focal image views in pixels.

random_views

Number of random views to generate.

focal_views

Number of focal views to generate.

random_crop_scale

Minimum and maximum size of the randomized crops relative to random_size.

focal_crop_scale

Minimum and maximum size of the randomized crops relative to focal_size.

cj_prob

Probability that color jittering is applied.

cj_strength

Strength of the color jitter.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Sigma of gaussian blur is kernel_size * input_size.

random_gray_scale

Probability of conversion to grayscale.

hf_prob

Probability that horizontal flip is applied.

vf_prob

Probability that vertical flip is applied.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.data.collate.MoCoCollateFunction(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.4, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.0, kernel_size: float = 0.1, vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Union[None, float, Tuple[float, float]] = None, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MoCo v1.

For MoCo v2, simply use the SimCLR settings.

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Sigma of gaussian blur is kernel_size * input_size.

vf_prob

Probability that vertical flip is applied.

hf_prob

Probability that horizontal flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples

>>> # MoCo v1 for ImageNet
>>> collate_fn = MoCoCollateFunction()
>>>
>>> # MoCo v1 for CIFAR-10
>>> collate_fn = MoCoCollateFunction(
...     input_size=32,
... )
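
As a worked example of the kernel_size description, the snippet below takes the documented relation (sigma = kernel_size * input_size) at face value for the two configurations shown above. documented_blur_sigma is a hypothetical helper, not part of the library.

```python
def documented_blur_sigma(kernel_size: float, input_size: int) -> float:
    """Sigma as stated in the parameter description: kernel_size * input_size."""
    return kernel_size * input_size


# Default (ImageNet-style) settings and the CIFAR-10 settings from the examples.
imagenet_sigma = documented_blur_sigma(kernel_size=0.1, input_size=224)
cifar_sigma = documented_blur_sigma(kernel_size=0.1, input_size=32)
print(round(imagenet_sigma, 2), round(cifar_sigma, 2))
```

Note that with the default gaussian_blur=0.0 the blur is never applied, so this value only matters once the blur probability is raised.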
class lightly.data.collate.MultiCropCollateFunction(crop_sizes: List[int], crop_counts: List[int], crop_min_scales: List[float], crop_max_scales: List[float], transforms: torchvision.transforms.transforms.Compose)

Implements the multi-crop transformations for SwAV.

crop_sizes

Size of the input image in pixels for each crop category.

crop_counts

Number of crops for each crop category.

crop_min_scales

Min scales for each crop category.

crop_max_scales

Max scales for each crop category.

transforms

Transforms which are applied to all crops.
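
The per-category lists are parallel: entry i of each list describes crop category i. The sketch below expands them into one specification per crop. It is an illustration only; expand_crop_specs is a hypothetical helper, and the values are the SwaVCollateFunction defaults listed in this reference.

```python
from typing import List, Tuple


def expand_crop_specs(
    crop_sizes: List[int],
    crop_counts: List[int],
    crop_min_scales: List[float],
    crop_max_scales: List[float],
) -> List[Tuple[int, Tuple[float, float]]]:
    """Return one (size, (min_scale, max_scale)) entry per generated crop."""
    lengths = {len(crop_sizes), len(crop_counts), len(crop_min_scales), len(crop_max_scales)}
    if len(lengths) != 1:
        raise ValueError("All crop lists must have one entry per crop category.")
    specs = []
    for size, count, low, high in zip(crop_sizes, crop_counts, crop_min_scales, crop_max_scales):
        # Each category contributes `count` crops with identical settings.
        specs.extend([(size, (low, high))] * count)
    return specs


specs = expand_crop_specs([224, 96], [2, 6], [0.14, 0.05], [1.0, 0.14])
print(len(specs))
```

The total number of crops per image is the sum of crop_counts.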

class lightly.data.collate.MultiViewCollateFunction(transforms: List[torchvision.transforms.transforms.Compose])

Generates multiple views for each image in the batch.

transforms

List of transformation functions. Each function is used to generate one view of the batch.

forward(batch: List[tuple])

Turns a batch of tuples into a tuple of batches.

Parameters

batch – The input batch.

Returns

A (views, labels, fnames) tuple where views is a list of tensors with each tensor containing one view of the batch.
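
The shape of the returned tuple can be sketched in plain Python. This is not the library's implementation: collate_multi_view is a hypothetical function, integers stand in for images, and lists stand in for tensors.

```python
from typing import Callable, List, Tuple


def collate_multi_view(
    batch: List[tuple], transforms: List[Callable]
) -> Tuple[List[list], list, list]:
    """Return (views, labels, fnames) with one batch of views per transform."""
    # views[i][j] is transform i applied to image j of the batch.
    views = [[transform(img) for img, _, _ in batch] for transform in transforms]
    labels = [label for _, label, _ in batch]
    fnames = [fname for _, _, fname in batch]
    return views, labels, fnames


batch = [(1, 0, "a.png"), (2, 1, "b.png")]
transforms = [lambda x: x, lambda x: x * 10, lambda x: -x]
views, labels, fnames = collate_multi_view(batch, transforms)
print(views)
```

The length of views equals the number of transforms, and every entry has the same length as the input batch.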

class lightly.data.collate.PIRLCollateFunction(input_size: int = 64, cj_prob: float = 0.8, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.4, min_scale: float = 0.08, random_gray_scale: float = 0.2, hf_prob: float = 0.5, n_grid: int = 3, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for PIRL [0]. The jigsaw augmentation is applied during the forward pass.

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_bright

How much to jitter brightness.

cj_contrast

How much to jitter contrast.

cj_sat

How much to jitter saturation.

cj_hue

How much to jitter hue.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

hf_prob

Probability that horizontal flip is applied.

n_grid

Square root of the number of grid cells in the jigsaw image (the grid has n_grid × n_grid cells).

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples

>>> # PIRL for ImageNet
>>> collate_fn = PIRLCollateFunction()
>>>
>>> # PIRL for CIFAR-10
>>> collate_fn = PIRLCollateFunction(
...     input_size=32,
... )
forward(batch: List[tuple])

Overrides the forward method of BaseCollateFunction because PIRL needs only one augmented batch, rather than the two that BaseCollateFunction creates.
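
The role of n_grid in a jigsaw augmentation can be sketched as follows. This is an assumption-laden illustration, not the library's jigsaw code: jigsaw_tiles is a hypothetical helper, the image is a square nested list, and its side length is assumed to be divisible by n_grid.

```python
import random
from typing import List


def jigsaw_tiles(image: List[List[int]], n_grid: int, rng: random.Random) -> List[List[List[int]]]:
    """Split a square image into n_grid * n_grid tiles and shuffle their order."""
    size = len(image)
    if size % n_grid != 0:
        raise ValueError("Image side must be divisible by n_grid in this sketch.")
    tile = size // n_grid
    tiles = []
    for row in range(n_grid):
        for col in range(n_grid):
            # Cut the tile at grid position (row, col).
            rows = image[row * tile:(row + 1) * tile]
            tiles.append([r[col * tile:(col + 1) * tile] for r in rows])
    rng.shuffle(tiles)
    return tiles


image = [[r * 6 + c for c in range(6)] for r in range(6)]
tiles = jigsaw_tiles(image, n_grid=3, rng=random.Random(0))
print(len(tiles))
```

Shuffling only reorders the tiles, so every pixel of the original image appears in exactly one tile.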

class lightly.data.collate.SMoGCollateFunction(crop_sizes: List[int] = [224, 96], crop_counts: List[int] = [4, 4], crop_min_scales: List[float] = [0.2, 0.05], crop_max_scales: List[float] = [1.0, 0.2], gaussian_blur_probs: List[float] = [0.5, 0.1], gaussian_blur_kernel_sizes: List[float] = [0.1, 0.1], solarize_probs: List[float] = [0.0, 0.2], hf_prob: float = 0.5, cj_prob: float = 1.0, cj_strength: float = 0.5, random_gray_scale: float = 0.2, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SMoG.

crop_sizes

Size of the input image in pixels for each crop category.

crop_counts

Number of crops for each crop category.

crop_min_scales

Min scales for each crop category.

crop_max_scales

Max scales for each crop category.

gaussian_blur_probs

Probability of Gaussian blur for each crop category.

gaussian_blur_kernel_sizes

Kernel size of Gaussian blur for each crop category.

solarize_probs

Probability of solarization for each crop category.

hf_prob

Probability that horizontal flip is applied.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter.

random_gray_scale

Probability of conversion to grayscale.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

class lightly.data.collate.SimCLRCollateFunction(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.5, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: float = 0.1, vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, rr_degrees: Union[None, float, Tuple[float, float]] = None, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SimCLR.

input_size

Size of the input image in pixels.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter.

min_scale

Minimum size of the randomized crop relative to the input_size.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Sigma of gaussian blur is kernel_size * input_size.

vf_prob

Probability that vertical flip is applied.

hf_prob

Probability that horizontal flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples

>>> # SimCLR for ImageNet
>>> collate_fn = SimCLRCollateFunction()
>>>
>>> # SimCLR for CIFAR-10
>>> collate_fn = SimCLRCollateFunction(
...     input_size=32,
...     gaussian_blur=0.,
... )
class lightly.data.collate.SwaVCollateFunction(crop_sizes: List[int] = [224, 96], crop_counts: List[int] = [2, 6], crop_min_scales: List[float] = [0.14, 0.05], crop_max_scales: List[float] = [1.0, 0.14], hf_prob: float = 0.5, vf_prob: float = 0.0, rr_prob: float = 0.0, rr_degrees: Union[None, float, Tuple[float, float]] = None, cj_prob: float = 0.8, cj_strength: float = 0.8, random_gray_scale: float = 0.2, gaussian_blur: float = 0.0, kernel_size: float = 1.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the multi-crop transformations for SwAV.

crop_sizes

Size of the input image in pixels for each crop category.

crop_counts

Number of crops for each crop category.

crop_min_scales

Min scales for each crop category.

crop_max_scales

Max scales for each crop category.

hf_prob

Probability that horizontal flip is applied.

vf_prob

Probability that vertical flip is applied.

rr_prob

Probability that random rotation is applied.

rr_degrees

Range of degrees to select from for random rotation. If rr_degrees is None, images are rotated by 90 degrees. If rr_degrees is a (min, max) tuple, images are rotated by a random angle in [min, max]. If rr_degrees is a single number, images are rotated by a random angle in [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

cj_prob

Probability that color jitter is applied.

cj_strength

Strength of the color jitter.

random_gray_scale

Probability of conversion to grayscale.

gaussian_blur

Probability of Gaussian blur.

kernel_size

Sigma of gaussian blur is kernel_size * input_size.

normalize

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples

>>> # SwAV for ImageNet
>>> collate_fn = SwaVCollateFunction()
>>>
>>> # SwAV w/ 2x160 and 4x96 crops
>>> collate_fn = SwaVCollateFunction(
...     crop_sizes=[160, 96],
...     crop_counts=[2, 4],
... )

.dataset

Lightly Dataset

class lightly.data.dataset.LightlyDataset(input_dir: Optional[str], transform: Optional[torchvision.transforms.transforms.Compose] = None, index_to_filename: Optional[Callable[[torchvision.datasets.vision.VisionDataset, int], str]] = None, filenames: Optional[List[str]] = None, tqdm_args: Optional[Dict[str, Any]] = None, num_workers_video_frame_counting: int = 0)

Provides a uniform data interface for the embedding models.

Should be used for all models and functions in the lightly package. Returns a tuple (sample, target, fname) when accessed using __getitem__.

The LightlyDataset supports different input sources. You can use it on a folder of images. You can also use it on a folder with subfolders with images (ImageNet style). If the input_dir has subfolders, each subfolder gets its own target label. You can also work with videos (requires pyav). If there are multiple videos in the input_dir, each video gets a different target label assigned. If input_dir contains both images and videos, only the videos are used.

Can also be used in combination with the from_torch_dataset method to load a dataset offered by torchvision (e.g. cifar10).

Parameters
  • input_dir – Path to directory holding the images or videos to load.

  • transform – Image transforms (as in torchvision).

  • index_to_filename – Function which takes the dataset and index as input and returns the filename of the file at the index. If None, uses default.

  • filenames – If not None, it filters the dataset in the input directory by the given filenames.
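
The interplay of index_to_filename and filenames can be sketched with a toy dataset. Everything here is hypothetical and only mirrors the documented signature (dataset, index) -> filename: ToyDataset, basename_index_to_filename, and filter_indices are not part of the library.

```python
import posixpath
from typing import List, Optional


class ToyDataset:
    """Stand-in for a dataset that only knows the paths of its samples."""

    def __init__(self, paths: List[str]):
        self.paths = list(paths)

    def __len__(self) -> int:
        return len(self.paths)


def basename_index_to_filename(dataset: ToyDataset, index: int) -> str:
    """Custom index_to_filename: use the basename of the stored path."""
    return posixpath.basename(dataset.paths[index])


def filter_indices(dataset: ToyDataset, filenames: Optional[List[str]] = None) -> List[int]:
    """Keep all indices if filenames is None, else only the listed filenames."""
    if filenames is None:
        return list(range(len(dataset)))
    wanted = set(filenames)
    return [
        i for i in range(len(dataset))
        if basename_index_to_filename(dataset, i) in wanted
    ]


dataset = ToyDataset(["mydata/img1.png", "mydata/img2.png", "mydata/img3.png"])
kept = filter_indices(dataset, filenames=["img1.png", "img3.png"])
print(kept)
```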

Examples

>>> # load a dataset consisting of images from a local folder
>>> # mydata/
>>> # `- img1.png
>>> # `- img2.png
>>> # `- ...
>>> import lightly.data as data
>>> dataset = data.LightlyDataset(input_dir='path/to/mydata/')
>>> sample, target, fname = dataset[0]
>>>
>>> # also works with subfolders
>>> # mydata/
>>> # `- subfolder1
>>> #     `- img1.png
>>> # `- subfolder2
>>> # ...
>>>
>>> # also works with videos
>>> # mydata/
>>> # `- video1.mp4
>>> # `- video2.mp4
>>> # `- ...
dump(output_dir: str, filenames: Optional[List[str]] = None, format: Optional[str] = None)

Saves images in the dataset to the output directory.

Will copy the images from the input directory to the output directory if possible. If not (e.g. for VideoDatasets), will load the images and then save them to the output directory with the specified format.

Parameters
  • output_dir – Output directory where the images are stored.

  • filenames – Filenames of the images to store. If None, stores all images.

  • format – Image format. Can be any pillow image format (png, jpg, …). By default we try to use the same format as the input data. If not possible (e.g. for videos) we dump the image as a png image to prevent compression artifacts.
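
The format fallback described above can be sketched as a small decision function. This is a hedged illustration, not the library's logic: choose_dump_format is a hypothetical name and the set of recognized image extensions is an assumption made for the example.

```python
import os
from typing import Optional

# Assumption: extensions treated as "image formats we can keep as-is".
IMAGE_EXTENSIONS = {"png", "jpg", "jpeg", "bmp", "tiff"}


def choose_dump_format(filename: str, format: Optional[str] = None) -> str:
    """Pick the output image format for a dumped file."""
    if format is not None:
        # An explicitly requested format always wins.
        return format
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension in IMAGE_EXTENSIONS:
        # Keep the format of the input data when it is an image.
        return extension
    # Otherwise (e.g. video frames) fall back to png to avoid compression artifacts.
    return "png"


print(choose_dump_format("img1.jpg"), choose_dump_format("video1.mp4"))
```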

classmethod from_torch_dataset(dataset, transform=None, index_to_filename=None)

Builds a LightlyDataset from a PyTorch (or torchvision) dataset.

Parameters
  • dataset – PyTorch/torchvision dataset.

  • transform – Image transforms (as in torchvision).

  • index_to_filename – Function which takes the dataset and index as input and returns the filename of the file at the index. If None, uses default.

Returns

A LightlyDataset object.

Examples

>>> # load cifar10 from torchvision
>>> import torchvision
>>> import lightly.data as data
>>> base = torchvision.datasets.CIFAR10(root='./')
>>> dataset = data.LightlyDataset.from_torch_dataset(base)
get_filenames() List[str]

Returns all filenames in the dataset.

get_filepath_from_filename(filename: str, image: PIL.Image = None)

Returns the filepath given the filename of the image.

There are three cases:
  • The dataset is a regular dataset with the images in the input dir.

  • The dataset is a video dataset, thus the images have to be saved in a temporary folder.

  • The dataset is a torch dataset, thus the images have to be saved in a temporary folder.

Parameters
  • filename – The filename of the image

  • image – The image corresponding to the filename

Returns

The filepath to the image, either the existing one (case 1) or a newly created jpg (cases 2 and 3).

property transform

Getter for the transform of the dataset.