lightly.data
The lightly.data module provides a dataset wrapper and collate functions.
.collate
Collate Functions
- class lightly.data.collate.BaseCollateFunction(transform: Compose)
Base class for other collate implementations.
Takes a batch of images as input and transforms each image into two different augmentations with the help of random transforms. The images are then concatenated such that the output batch is exactly twice the length of the input batch.
- transform
A set of torchvision transforms which are randomly applied to each image.
- forward(batch: List[tuple])
Turns a batch of tuples into a tuple of batches.
- Parameters
batch – A batch of tuples of images, labels, and filenames which is automatically provided if the dataloader is built from a LightlyDataset.
- Returns
A tuple of images, labels, and filenames. The images consist of two batches corresponding to the two transformations of the input images.
Examples
>>> # define a random transformation and the collate function
>>> transform = ... # some random augmentations
>>> collate_fn = BaseCollateFunction(transform)
>>>
>>> # input is a batch of tuples (here, batch_size = 1)
>>> input = [(img, 0, 'my-image.png')]
>>> output = collate_fn(input)
>>>
>>> # output consists of two random transforms of the images,
>>> # the labels, and the filenames in the batch
>>> (img_t0, img_t1), label, filename = output
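The regrouping described above (a batch of tuples becomes a tuple of batches, with two augmented views per image) can be sketched in plain Python. This is a minimal illustration only, not the library's implementation: it uses lists and numbers where the real class works on image tensors, and the names two_view_collate and jitter are hypothetical.

```python
import random
from typing import Any, Callable, List, Tuple

Sample = Tuple[Any, int, str]  # (image, label, filename)


def two_view_collate(batch: List[Sample], transform: Callable[[Any], Any]):
    """Apply the random transform twice per image and regroup the batch."""
    views_0 = [transform(img) for img, _, _ in batch]
    views_1 = [transform(img) for img, _, _ in batch]
    labels = [label for _, label, _ in batch]
    fnames = [fname for _, _, fname in batch]
    return (views_0, views_1), labels, fnames


# Hypothetical stand-in for a random augmentation: add small noise to a number.
rng = random.Random(0)


def jitter(x: float) -> float:
    return x + rng.uniform(-0.1, 0.1)


batch = [(1.0, 0, "a.png"), (2.0, 1, "b.png")]
(views_0, views_1), labels, fnames = two_view_collate(batch, jitter)
```

Each of the two view lists has the same length as the input batch, while labels and filenames keep their original order.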
- class lightly.data.collate.DINOCollateFunction(global_crop_size=224, global_crop_scale=(0.4, 1.0), local_crop_size=96, local_crop_scale=(0.05, 0.4), n_local_views=6, hf_prob=0.5, vf_prob=0, rr_prob=0, cj_prob=0.8, cj_bright=0.4, cj_contrast=0.4, cj_sat=0.2, cj_hue=0.1, random_gray_scale=0.2, gaussian_blur=(1.0, 0.1, 0.5), kernel_size=1.4, kernel_scale=0.6, solarization_prob=0.2, normalize={'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the global and local view augmentations for DINO [0].
This class generates two global views and a user-defined number of local views for each image in a batch. The code is adapted from [1].
[0]: DINO, 2021, https://arxiv.org/abs/2104.14294
- global_crop_size
Crop size of the global views.
- global_crop_scale
Tuple of min and max scales relative to global_crop_size.
- local_crop_size
Crop size of the local views.
- local_crop_scale
Tuple of min and max scales relative to local_crop_size.
- n_local_views
Number of generated local views.
- hf_prob
Probability that horizontal flip is applied.
- vf_prob
Probability that vertical flip is applied.
- rr_prob
Probability that random (+90 degree) rotation is applied.
- cj_prob
Probability that color jitter is applied.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Tuple of probabilities to apply gaussian blur on the different views. The input is ordered as follows: (global_view_0, global_view_1, local_views)
- kernel_size
Sigma of gaussian blur is kernel_size * input_size.
- kernel_scale
Fraction of the kernel size which is used for upper and lower limits of the randomized kernel size.
- solarization_prob
Probability to apply solarization on the second global view.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
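To make the view layout concrete, the sketch below lists one configuration per generated view using the default values from the signature above: two global views followed by n_local_views local views, with the blur probabilities assigned in the documented order and solarization only on the second global view. The ViewConfig type and the dino_view_configs helper are hypothetical and only illustrate the parameter semantics; they are not part of lightly.

```python
from typing import List, NamedTuple, Tuple


class ViewConfig(NamedTuple):
    kind: str
    crop_size: int
    crop_scale: Tuple[float, float]
    blur_prob: float
    solarization_prob: float


def dino_view_configs(
    global_crop_size: int = 224,
    global_crop_scale: Tuple[float, float] = (0.4, 1.0),
    local_crop_size: int = 96,
    local_crop_scale: Tuple[float, float] = (0.05, 0.4),
    n_local_views: int = 6,
    gaussian_blur: Tuple[float, float, float] = (1.0, 0.1, 0.5),
    solarization_prob: float = 0.2,
) -> List[ViewConfig]:
    # gaussian_blur is ordered as (global_view_0, global_view_1, local_views).
    blur_g0, blur_g1, blur_local = gaussian_blur
    views = [
        ViewConfig("global", global_crop_size, global_crop_scale, blur_g0, 0.0),
        # Solarization is only applied to the second global view.
        ViewConfig("global", global_crop_size, global_crop_scale, blur_g1, solarization_prob),
    ]
    local = ViewConfig("local", local_crop_size, local_crop_scale, blur_local, 0.0)
    views.extend([local] * n_local_views)
    return views


configs = dino_view_configs()
```

With the defaults, this yields eight views per image: two global and six local.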
- class lightly.data.collate.ImageCollateFunction(input_size: int = 64, cj_prob: float = 0.8, cj_bright: float = 0.7, cj_contrast: float = 0.7, cj_sat: float = 0.7, cj_hue: float = 0.2, min_scale: float = 0.15, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: float = 0.1, vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implementation of a collate function for images.
This is an implementation of the BaseCollateFunction with a concrete set of transforms.
The set of transforms is inspired by the SimCLR paper, as it has been shown to produce powerful embeddings.
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Sigma of gaussian blur is kernel_size * input_size.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random (+90 degree) rotation is applied.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
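The normalize dictionary holds per-channel statistics. torchvision.transforms.Normalize computes (value - mean) / std for each channel, which the following plain-Python sketch reproduces for a single RGB pixel. The helper name normalize_pixel is hypothetical, and the real transform operates on tensors rather than lists.

```python
from typing import Dict, List

# Default values from the signature above (ImageNet statistics).
normalize: Dict[str, List[float]] = {
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225],
}


def normalize_pixel(rgb: List[float], mean: List[float], std: List[float]) -> List[float]:
    """Channel-wise (value - mean) / std, as Normalize does per channel."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


# A pixel equal to the channel means maps to zero in every channel.
mean_pixel = normalize_pixel([0.485, 0.456, 0.406], normalize["mean"], normalize["std"])

# A white pixel (all channels at 1.0) maps to positive values.
white_pixel = normalize_pixel([1.0, 1.0, 1.0], normalize["mean"], normalize["std"])
```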
- class lightly.data.collate.MAECollateFunction(input_size: Union[int, Tuple[int, int]] = 224, min_scale: float = 0.2, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the view augmentation for MAE [0].
[0]: Masked Autoencoder, 2021, https://arxiv.org/abs/2111.06377
- input_size
Size of the input image in pixels.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
- forward(batch: List[tuple])
Turns a batch of tuples into a tuple of batches.
- Parameters
batch – The input batch.
- Returns
A (views, labels, fnames) tuple where views is a list of tensors with each tensor containing one view of the batch.
- class lightly.data.collate.MSNCollateFunction(random_size: int = 224, focal_size: int = 96, random_views: int = 2, focal_views: int = 10, random_crop_scale: Tuple[float, float] = (0.3, 1.0), focal_crop_scale: Tuple[float, float] = (0.05, 0.3), cj_prob: float = 0.8, cj_strength: float = 1.0, gaussian_blur: float = 0.5, kernel_size: float = 0.1, random_gray_scale: float = 0.2, hf_prob: float = 0.5, vf_prob: float = 0.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for MSN [0].
Generates a set of random and focal views for each input image. The generated output is (views, target, filenames) where views is a list with the following entries: [random_views_0, random_views_1, …, focal_views_0, focal_views_1, …].
[0]: Masked Siamese Networks, 2022: https://arxiv.org/abs/2204.07141
- random_size
Size of the random image views in pixels.
- focal_size
Size of the focal image views in pixels.
- random_views
Number of random views to generate.
- focal_views
Number of focal views to generate.
- random_crop_scale
Minimum and maximum size of the randomized crops relative to random_size.
- focal_crop_scale
Minimum and maximum size of the randomized crops relative to focal_size.
- cj_prob
Probability that color jittering is applied.
- cj_strength
Strength of the color jitter.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Sigma of gaussian blur is kernel_size * input_size.
- random_gray_scale
Probability of conversion to grayscale.
- hf_prob
Probability that horizontal flip is applied.
- vf_prob
Probability that vertical flip is applied.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
- class lightly.data.collate.MoCoCollateFunction(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.4, min_scale: float = 0.2, random_gray_scale: float = 0.2, gaussian_blur: float = 0.0, kernel_size: float = 0.1, vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for MoCo v1.
For MoCo v2, simply use the SimCLR settings.
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Sigma of gaussian blur is kernel_size * input_size.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random (+90 degree) rotation is applied.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
Examples
>>> # MoCo v1 for ImageNet
>>> collate_fn = MoCoCollateFunction()
>>>
>>> # MoCo v1 for CIFAR-10
>>> collate_fn = MoCoCollateFunction(
>>>     input_size=32,
>>> )
- class lightly.data.collate.MultiCropCollateFunction(crop_sizes: List[int], crop_counts: List[int], crop_min_scales: List[float], crop_max_scales: List[float], transforms: Compose)
Implements the multi-crop transformations for SwAV.
- crop_sizes
Size of the input image in pixels for each crop category.
- crop_counts
Number of crops for each crop category.
- crop_min_scales
Min scales for each crop category.
- crop_max_scales
Max scales for each crop category.
- transforms
Transforms which are applied to all crops.
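The four list arguments are parallel: entry i of each list describes crop category i. A minimal sketch of how such a specification expands into one entry per crop is shown below, using the SwaVCollateFunction defaults as input. CropSpec and expand_crop_specs are hypothetical names for illustration and are not part of lightly.

```python
from typing import List, NamedTuple


class CropSpec(NamedTuple):
    size: int
    min_scale: float
    max_scale: float


def expand_crop_specs(
    crop_sizes: List[int],
    crop_counts: List[int],
    crop_min_scales: List[float],
    crop_max_scales: List[float],
) -> List[CropSpec]:
    """Expand per-category settings into one CropSpec per generated crop."""
    lengths = {len(crop_sizes), len(crop_counts), len(crop_min_scales), len(crop_max_scales)}
    if len(lengths) != 1:
        raise ValueError("All crop lists must have the same length.")
    specs: List[CropSpec] = []
    for size, count, lo, hi in zip(crop_sizes, crop_counts, crop_min_scales, crop_max_scales):
        specs.extend([CropSpec(size, lo, hi)] * count)
    return specs


# Two 224px crops and six 96px crops.
specs = expand_crop_specs([224, 96], [2, 6], [0.14, 0.05], [1.0, 0.14])
```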
- class lightly.data.collate.MultiViewCollateFunction(transforms: List[Compose])
Generates multiple views for each image in the batch.
- transforms
List of transformation functions. Each function is used to generate one view of the batch.
- forward(batch: List[tuple])
Turns a batch of tuples into a tuple of batches.
- Parameters
batch – The input batch.
- Returns
A (views, labels, fnames) tuple where views is a list of tensors with each tensor containing one view of the batch.
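The shape of the returned value can be illustrated with a plain-Python sketch: views holds one entry per transform, and each entry contains that transform applied to every image in the batch. This is an illustration under simplifying assumptions (lists and numbers instead of image tensors, hypothetical function name multi_view_collate), not the library's code.

```python
from typing import Any, Callable, List, Tuple

Sample = Tuple[Any, int, str]  # (image, label, filename)


def multi_view_collate(batch: List[Sample], transforms: List[Callable[[Any], Any]]):
    """Return (views, labels, fnames) with one view of the batch per transform."""
    views = [[transform(img) for img, _, _ in batch] for transform in transforms]
    labels = [label for _, label, _ in batch]
    fnames = [fname for _, _, fname in batch]
    return views, labels, fnames


# Deterministic stand-ins for augmentations, so the result is easy to inspect.
transforms = [lambda x: x, lambda x: x * 2, lambda x: -x]
batch = [(1, 0, "a.png"), (2, 1, "b.png")]
views, labels, fnames = multi_view_collate(batch, transforms)
```

Here len(views) equals the number of transforms and len(views[i]) equals the batch size.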
- class lightly.data.collate.PIRLCollateFunction(input_size: int = 64, cj_prob: float = 0.8, cj_bright: float = 0.4, cj_contrast: float = 0.4, cj_sat: float = 0.4, cj_hue: float = 0.4, min_scale: float = 0.08, random_gray_scale: float = 0.2, hf_prob: float = 0.5, n_grid: int = 3, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for PIRL [0]. The jigsaw augmentation is applied during the forward pass.
[0] PIRL, 2019: https://arxiv.org/abs/1912.01991
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_bright
How much to jitter brightness.
- cj_contrast
How much to jitter contrast.
- cj_sat
How much to jitter saturation.
- cj_hue
How much to jitter hue.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- hf_prob
Probability that horizontal flip is applied.
- n_grid
Sqrt of the number of grids in the jigsaw image.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
Examples
>>> # PIRL for ImageNet
>>> collate_fn = PIRLCollateFunction()
>>>
>>> # PIRL for CIFAR-10
>>> collate_fn = PIRLCollateFunction(
>>>     input_size=32,
>>> )
- forward(batch: List[tuple])
Overrides the forward method of BaseCollateFunction because PIRL needs only one augmented batch, rather than the two that BaseCollateFunction creates.
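As a rough illustration of what n_grid controls, the sketch below cuts a toy image into n_grid x n_grid tiles and shuffles them, so n_grid = 3 gives 9 tiles. How lightly tiles and permutes images internally is not described here, so the tiling and shuffling strategy, the jigsaw_tiles name, and the nested-list image are all assumptions made for this example.

```python
import random
from typing import List

Image = List[List[int]]  # toy image: rows of pixel values


def jigsaw_tiles(image: Image, n_grid: int, rng: random.Random) -> List[Image]:
    """Split the image into n_grid * n_grid tiles and return them shuffled."""
    height, width = len(image), len(image[0])
    if height % n_grid or width % n_grid:
        raise ValueError("Image size must be divisible by n_grid.")
    tile_h, tile_w = height // n_grid, width // n_grid
    tiles = []
    for grid_row in range(n_grid):
        for grid_col in range(n_grid):
            rows = image[grid_row * tile_h:(grid_row + 1) * tile_h]
            tiles.append([row[grid_col * tile_w:(grid_col + 1) * tile_w] for row in rows])
    rng.shuffle(tiles)  # in-place random permutation of the tiles
    return tiles


# 6x6 toy image with unique pixel values 0..35.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
tiles = jigsaw_tiles(image, n_grid=3, rng=random.Random(0))
```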
- class lightly.data.collate.SimCLRCollateFunction(input_size: int = 224, cj_prob: float = 0.8, cj_strength: float = 0.5, min_scale: float = 0.08, random_gray_scale: float = 0.2, gaussian_blur: float = 0.5, kernel_size: float = 0.1, vf_prob: float = 0.0, hf_prob: float = 0.5, rr_prob: float = 0.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the transformations for SimCLR.
- input_size
Size of the input image in pixels.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter.
- min_scale
Minimum size of the randomized crop relative to the input_size.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Sigma of gaussian blur is kernel_size * input_size.
- vf_prob
Probability that vertical flip is applied.
- hf_prob
Probability that horizontal flip is applied.
- rr_prob
Probability that random (+90 degree) rotation is applied.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
Examples
>>> # SimCLR for ImageNet
>>> collate_fn = SimCLRCollateFunction()
>>>
>>> # SimCLR for CIFAR-10
>>> collate_fn = SimCLRCollateFunction(
>>>     input_size=32,
>>>     gaussian_blur=0.,
>>> )
- class lightly.data.collate.SwaVCollateFunction(crop_sizes: List[int] = [224, 96], crop_counts: List[int] = [2, 6], crop_min_scales: List[float] = [0.14, 0.05], crop_max_scales: List[float] = [1.0, 0.14], hf_prob: float = 0.5, vf_prob: float = 0.0, rr_prob: float = 0.0, cj_prob: float = 0.8, cj_strength: float = 0.8, random_gray_scale: float = 0.2, gaussian_blur: float = 0.0, kernel_size: float = 1.0, normalize: dict = {'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225]})
Implements the multi-crop transformations for SwAV.
- crop_sizes
Size of the input image in pixels for each crop category.
- crop_counts
Number of crops for each crop category.
- crop_min_scales
Min scales for each crop category.
- crop_max_scales
Max scales for each crop category.
- hf_prob
Probability that horizontal flip is applied.
- vf_prob
Probability that vertical flip is applied.
- rr_prob
Probability that random (+90 degree) rotation is applied.
- cj_prob
Probability that color jitter is applied.
- cj_strength
Strength of the color jitter.
- random_gray_scale
Probability of conversion to grayscale.
- gaussian_blur
Probability of Gaussian blur.
- kernel_size
Sigma of gaussian blur is kernel_size * input_size.
- normalize
Dictionary with ‘mean’ and ‘std’ for torchvision.transforms.Normalize.
Examples
>>> # SwaV for Imagenet
>>> collate_fn = SwaVCollateFunction()
>>>
>>> # SwaV w/ 2x160 and 4x96 crops
>>> collate_fn = SwaVCollateFunction(
>>>     crop_sizes=[160, 96],
>>>     crop_counts=[2, 4],
>>> )
.dataset
Lightly Dataset
- class lightly.data.dataset.LightlyDataset(input_dir: Optional[str], transform: Optional[Compose] = None, index_to_filename: Optional[Callable[[VisionDataset, int], str]] = None, filenames: Optional[List[str]] = None, tqdm_args: Optional[Dict[str, Any]] = None, num_workers_video_frame_counting: int = 0)
Provides a uniform data interface for the embedding models.
Should be used for all models and functions in the lightly package. Returns a tuple (sample, target, fname) when accessed using __getitem__.
The LightlyDataset supports different input sources. You can use it on a folder of images. You can also use it on a folder with subfolders of images (ImageNet style). If the input_dir has subfolders, each subfolder gets its own target label. You can also work with videos (requires pyav). If there are multiple videos in the input_dir, each video gets a different target label assigned. If input_dir contains both images and videos, only the videos are used.
Can also be used in combination with the from_torch_dataset method to load a dataset offered by torchvision (e.g. cifar10).
- Parameters
input_dir – Path to directory holding the images or videos to load.
transform – Image transforms (as in torchvision).
index_to_filename – Function which takes the dataset and index as input and returns the filename of the file at the index. If None, uses default.
filenames – If not None, it filters the dataset in the input directory by the given filenames.
Examples
>>> # load a dataset consisting of images from a local folder
>>> # mydata/
>>> # `- img1.png
>>> # `- img2.png
>>> # `- ...
>>> import lightly.data as data
>>> dataset = data.LightlyDataset(input_dir='path/to/mydata/')
>>> sample, target, fname = dataset[0]
>>>
>>> # also works with subfolders
>>> # mydata/
>>> # `- subfolder1
>>> #     `- img1.png
>>> # `- subfolder2
>>> # ...
>>>
>>> # also works with videos
>>> # mydata/
>>> # `- video1.mp4
>>> # `- video2.mp4
>>> # `- ...
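The rule that each subfolder gets its own target label can be illustrated with the standard library alone. The sketch below assumes labels are assigned in alphabetical order of the subfolder names (the convention torchvision's ImageFolder uses); whether LightlyDataset follows exactly this order is an assumption, and folder_targets is a hypothetical helper, not a lightly function.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def folder_targets(input_dir: str) -> List[Tuple[str, int]]:
    """Map each file in an ImageNet-style folder to (relative filename, target)."""
    root = Path(input_dir)
    # Assumption: targets follow the alphabetical order of the subfolders.
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    samples = []
    for target, class_dir in enumerate(class_dirs):
        for path in sorted(class_dir.iterdir()):
            if path.is_file():
                samples.append((path.relative_to(root).as_posix(), target))
    return samples


# Build a tiny throwaway folder structure with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("subfolder1/img1.png", "subfolder2/img2.png"):
        file_path = Path(tmp, name)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.touch()
    samples = folder_targets(tmp)
```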
- dump(output_dir: str, filenames: Optional[List[str]] = None, format: Optional[str] = None)
Saves images in the dataset to the output directory.
Will copy the images from the input directory to the output directory if possible. If not (e.g. for VideoDatasets), will load the images and then save them to the output directory with the specified format.
- Parameters
output_dir – Output directory where the image is stored.
filenames – Filenames of the images to store. If None, stores all images.
format – Image format. Can be any pillow image format (png, jpg, …). By default we try to use the same format as the input data. If not possible (e.g. for videos) we dump the image as a png image to prevent compression artifacts.
- classmethod from_torch_dataset(dataset, transform=None, index_to_filename=None)
Builds a LightlyDataset from a PyTorch (or torchvision) dataset.
- Parameters
dataset – PyTorch/torchvision dataset.
transform – Image transforms (as in torchvision).
index_to_filename – Function which takes the dataset and index as input and returns the filename of the file at the index. If None, uses default.
- Returns
A LightlyDataset object.
Examples
>>> # load cifar10 from torchvision
>>> import torchvision
>>> import lightly.data as data
>>> base = torchvision.datasets.CIFAR10(root='./')
>>> dataset = data.LightlyDataset.from_torch_dataset(base)
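The index_to_filename argument is a callable that receives the dataset and an index and returns a filename. A minimal sketch of such a callable follows; the naming scheme (target label as folder, zero-padded index as file name) is purely hypothetical, and a plain list of (image, target) pairs stands in for a torchvision dataset.

```python
from typing import Any, List, Sequence, Tuple


def index_to_filename(dataset: Sequence[Tuple[Any, int]], index: int) -> str:
    """Hypothetical naming scheme: '<target>/<zero-padded index>.png'."""
    _, target = dataset[index]
    return f"{target}/{index:05d}.png"


# Stand-in for a torchvision dataset: indexable, yields (image, target) pairs.
toy_dataset: List[Tuple[Any, int]] = [("image-0", 3), ("image-1", 7)]
filenames = [index_to_filename(toy_dataset, i) for i in range(len(toy_dataset))]
```

A function with this signature could then be passed as index_to_filename to from_torch_dataset, as described in the parameters above.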
- get_filenames() List[str]
Returns all filenames in the dataset.
- get_filepath_from_filename(filename: str, image: PIL.Image = None)
Returns the filepath given the filename of the image.
- There are three cases:
1. The dataset is a regular dataset with the images in the input dir.
2. The dataset is a video dataset, thus the images have to be saved in a temporary folder.
3. The dataset is a torch dataset, thus the images have to be saved in a temporary folder.
- Parameters
filename – The filename of the image
image – The image corresponding to the filename
- Returns
The filepath of the image, either the existing one (case 1) or a newly created jpg (cases 2 and 3).
- property transform
Getter for the transform of the dataset.