lightly.data

The lightly.data module provides a dataset wrapper and collate functions.
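
In practice the two pieces are combined: a LightlyDataset feeds a torch.utils.data.DataLoader whose collate_fn produces the augmented views. A minimal sketch (the folder path and batch size are illustrative):

>>> import torch
>>> from lightly.data import LightlyDataset
>>> from lightly.data.collate import SimCLRCollateFunction
>>>
>>> dataset = LightlyDataset(input_dir='path/to/mydata/')
>>> collate_fn = SimCLRCollateFunction(input_size=32)
>>> dataloader = torch.utils.data.DataLoader(
>>>     dataset,
>>>     batch_size=128,
>>>     collate_fn=collate_fn,
>>>     shuffle=True,
>>> )
>>> # each batch yields two augmented views per image,
>>> # plus the labels and filenames
>>> (view0, view1), labels, filenames = next(iter(dataloader))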

.collate

Collate Functions

class lightly.data.collate.BaseCollateFunction(transform: torchvision.transforms.transforms.Compose)

Base class for other collate implementations.

Takes a batch of images as input and transforms each image into two different augmentations with the help of random transforms. The images are then concatenated such that the output batch is exactly twice the length of the input batch.

Attributes:
transform:

A set of torchvision transforms which are randomly applied to each image.

forward(batch: List[tuple])

Turns a batch of tuples into a tuple of batches.

Args:
batch:

A batch of tuples of images, labels, and filenames, which is automatically provided if the dataloader is built from a LightlyDataset.

Returns:

A tuple of images, labels, and filenames. The images consist of two batches corresponding to the two transformations of the input images.

Examples:
>>> # define a random transformation and the collate function
>>> transform = ... # some random augmentations
>>> collate_fn = BaseCollateFunction(transform)
>>>
>>> # input is a batch of tuples (here, batch_size = 1)
>>> input = [(img, 0, 'my-image.png')]
>>> output = collate_fn(input)
>>>
>>> # output consists of two random transforms of the images,
>>> # the labels, and the filenames in the batch
>>> (img_t0, img_t1), label, filename = output
class lightly.data.collate.ImageCollateFunction(input_size: int = 64, cj_prob: float = 0.8, cj_bright: float = 0.7, cj_contrast: float = 0.7, cj_sat: float = 0.7, cj_hue: float = 0.2, min_scale: float = 0.15, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: float = 0.1, vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implementation of a collate function for images.

This is an implementation of the BaseCollateFunction with a concrete set of transforms.

The set of transforms is inspired by the SimCLR paper, as it has been shown to produce powerful embeddings.

Attributes:
input_size:

Size of the input image in pixels.

cj_prob:

Probability that color jitter is applied.

cj_bright:

How much to jitter brightness.

cj_contrast:

How much to jitter contrast.

cj_sat:

How much to jitter saturation.

cj_hue:

How much to jitter hue.

min_scale:

Minimum size of the randomized crop relative to the input_size.

random_gray_scale:

Probability of conversion to grayscale.

gaussian_blur:

Probability of Gaussian blur.

kernel_size:

Sigma of the Gaussian blur is kernel_size * input_size.

vf_prob:

Probability that vertical flip is applied.

hf_prob:

Probability that horizontal flip is applied.

rr_prob:

Probability that random (+90 degree) rotation is applied.

normalize:

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
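
For illustration, a minimal sketch in the style of the sibling classes' Examples; the parameter values are arbitrary:

>>> # 128x128 crops with random rotation enabled
>>> collate_fn = ImageCollateFunction(
>>>     input_size=128,
>>>     rr_prob=0.5,
>>> )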

class lightly.data.collate.MoCoCollateFunction(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.4, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.0, kernel_size: float = 0.1, vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for MoCo v1.

For MoCo v2, simply use the SimCLR settings.

Attributes:
input_size:

Size of the input image in pixels.

cj_prob:

Probability that color jitter is applied.

cj_strength:

Strength of the color jitter.

min_scale:

Minimum size of the randomized crop relative to the input_size.

random_gray_scale:

Probability of conversion to grayscale.

gaussian_blur:

Probability of Gaussian blur.

kernel_size:

Sigma of the Gaussian blur is kernel_size * input_size.

vf_prob:

Probability that vertical flip is applied.

hf_prob:

Probability that horizontal flip is applied.

rr_prob:

Probability that random (+90 degree) rotation is applied.

normalize:

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples:

>>> # MoCo v1 for ImageNet
>>> collate_fn = MoCoCollateFunction()
>>> 
>>> # MoCo v1 for CIFAR-10
>>> collate_fn = MoCoCollateFunction(
>>>     input_size=32,
>>> )
class lightly.data.collate.SimCLRCollateFunction(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.5, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: float = 0.1, vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})

Implements the transformations for SimCLR.

Attributes:
input_size:

Size of the input image in pixels.

cj_prob:

Probability that color jitter is applied.

cj_strength:

Strength of the color jitter.

min_scale:

Minimum size of the randomized crop relative to the input_size.

random_gray_scale:

Probability of conversion to grayscale.

gaussian_blur:

Probability of Gaussian blur.

kernel_size:

Sigma of the Gaussian blur is kernel_size * input_size.

vf_prob:

Probability that vertical flip is applied.

hf_prob:

Probability that horizontal flip is applied.

rr_prob:

Probability that random (+90 degree) rotation is applied.

normalize:

Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.

Examples:

>>> # SimCLR for ImageNet
>>> collate_fn = SimCLRCollateFunction()
>>> 
>>> # SimCLR for CIFAR-10
>>> collate_fn = SimCLRCollateFunction(
>>>     input_size=32,
>>>     gaussian_blur=0.,
>>> )

.dataset

Lightly Dataset

class lightly.data.dataset.LightlyDataset(input_dir: str, transform: torchvision.transforms.transforms.Compose = None, index_to_filename: Callable[[torchvision.datasets.vision.VisionDataset, int], str] = None)

Provides a uniform data interface for the embedding models.

Should be used for all models and functions in the lightly package. Returns a tuple (sample, target, fname) when accessed using __getitem__.

The LightlyDataset supports different input sources. You can use it on a folder of images, or on a folder with subfolders of images (ImageNet style). If the input_dir has subfolders, each subfolder gets its own target label. You can also work with videos (requires pyav). If there are multiple videos in the input_dir, each video gets a different target label assigned. If the input_dir contains both images and videos, only the videos are used.

Can also be used in combination with the from_torch_dataset method to load a dataset offered by torchvision (e.g. CIFAR-10).

Args:
input_dir:

Path to directory holding the images or videos to load.

transform:

Image transforms (as in torchvision).

index_to_filename:

Function which takes the dataset and index as input and returns the filename of the file at the index. If None, uses default.

Examples:
>>> # load a dataset consisting of images from a local folder
>>> # mydata/
>>> # `- img1.png
>>> # `- img2.png
>>> # `- ...
>>> import lightly.data as data
>>> dataset = data.LightlyDataset(input_dir='path/to/mydata/')
>>> sample, target, fname = dataset[0]
>>>
>>> # also works with subfolders
>>> # mydata/
>>> # `- subfolder1
>>> #     `- img1.png
>>> # `- subfolder2
>>> # ...
>>>
>>> # also works with videos
>>> # mydata/
>>> # `- video1.mp4
>>> # `- video2.mp4
>>> # `- ...
dump(output_dir: str, filenames: Optional[List[str]] = None, format: Optional[str] = None)

Saves images in the dataset to the output directory.

Will copy the images from the input directory to the output directory if possible. If not (e.g. for VideoDatasets), will load the images and then save them to the output directory with the specified format.

Args:
output_dir:

Output directory where the images are stored.

filenames:

Filenames of the images to store. If None, stores all images.

format:

Image format. Can be any Pillow image format (png, jpg, …). By default, we try to use the same format as the input data. If that is not possible (e.g. for videos), we dump the image as a png to prevent compression artifacts.
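
A usage sketch, reusing the dataset from the examples above; the output path and filenames are illustrative:

>>> # dump two specific images, forcing png to avoid compression artifacts
>>> dataset.dump(
>>>     output_dir='path/to/output/',
>>>     filenames=['img1.png', 'img2.png'],
>>>     format='png',
>>> )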

classmethod from_torch_dataset(dataset, transform=None, index_to_filename=None)

Builds a LightlyDataset from a PyTorch (or torchvision) dataset.

Args:
dataset:

PyTorch/torchvision dataset.

transform:

Image transforms (as in torchvision).

index_to_filename:

Function which takes the dataset and index as input and returns the filename of the file at the index. If None, uses default.

Returns:

A LightlyDataset object.

Examples:
>>> # load cifar10 from torchvision
>>> import torchvision
>>> import lightly.data as data
>>> base = torchvision.datasets.CIFAR10(root='./')
>>> dataset = data.LightlyDataset.from_torch_dataset(base)
get_filenames() → List[str]

Returns all filenames in the dataset.
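
For example, to gather filenames for a later call to dump:

>>> filenames = dataset.get_filenames()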

get_filepath_from_filename(filename: str, image: PIL.Image.Image = None)

Returns the filepath given the filename of the image.

There are three cases:

- The dataset is a regular dataset with the images in the input dir.
- The dataset is a video dataset, thus the images have to be saved in a temporary folder.
- The dataset is a torch dataset, thus the images have to be saved in a temporary folder.

Args:

filename:

The filename of the image.

image:

The image corresponding to the filename.

Returns:

The filepath to the image, either the existing one (case 1) or a newly created jpg (cases 2 and 3).
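
A sketch for the regular-dataset case, with a filename assumed from the flat-folder example above:

>>> # resolve the full path of an image by its filename
>>> filepath = dataset.get_filepath_from_filename('img1.png')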

property transform

Getter for the transform of the dataset.
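
For example, to inspect the transform currently applied to each sample:

>>> print(dataset.transform)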