lightly.utils
.io
I/O operations to save and load embeddings.
- lightly.utils.io.check_embeddings(path: str, remove_additional_columns: bool = False) None
- Raises an error if the embeddings csv file does not have the correct format. - Use this check whenever you want to upload an embedding to the Lightly Platform. This method only checks whether the header row matches the specification: https://docs.lightly.ai/self-supervised-learning/getting_started/command_line_tool.html#id1 - Parameters
- path – Path to the embedding csv file 
- remove_additional_columns – If True, all additional columns which are not in {filenames, embeddings_x, labels} are removed. If False, they are kept unchanged. 
 
- Raises
- RuntimeError – If the header row of the csv file does not match the specification. 
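To show what a header-only check of this kind involves, here is a minimal, hypothetical sketch (not lightly's implementation). It assumes the header is `filenames`, then one or more `embedding_<i>` columns, then `labels`; the exact embedding column prefix is an assumption, so adjust `EMBEDDING_COLUMN` if your files differ. Like `check_embeddings`, it raises `RuntimeError` on a mismatch.

```python
import csv
import io
import re
from typing import List

# Assumed naming of embedding columns: embedding_0, embedding_1, ...
EMBEDDING_COLUMN = re.compile(r"embedding_\d+")


def check_embeddings_header(header: List[str]) -> None:
    """Raise RuntimeError if `header` does not follow the assumed layout."""
    if not header or header[0] != "filenames":
        raise RuntimeError("First column must be 'filenames'.")
    if "labels" not in header:
        raise RuntimeError("Missing 'labels' column.")
    labels_idx = header.index("labels")
    embedding_cols = header[1:labels_idx]
    if not embedding_cols:
        raise RuntimeError("Expected at least one embedding column.")
    for name in embedding_cols:
        if not EMBEDDING_COLUMN.fullmatch(name):
            raise RuntimeError(f"Unexpected embedding column: {name!r}")


def read_header(csv_text: str) -> List[str]:
    """Return the first row of a csv document given as a string."""
    return next(csv.reader(io.StringIO(csv_text)))
```

Only the header is inspected, mirroring the note above that the check does not validate the data rows.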
 
- lightly.utils.io.load_embeddings(path: str) Tuple[NDArray[np.float64], List[int], List[str]]
- Loads embeddings from a csv file in a Lightly compatible format. - Parameters
- path – Path to the csv file. 
- Returns
- The embeddings as a numpy array, labels as a list of integers, and filenames as a list of strings in the order they were saved. - The embeddings will always be of the Float32 datatype. 
- Examples
>>> import lightly.utils.io as io
>>> embeddings, labels, filenames = io.load_embeddings(
...     'path/to/my/embeddings.csv')
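The sketch below is a hypothetical, standard-library-only reader that illustrates how such a file maps to the three return values. It is not lightly's code: it assumes the column layout `filenames, embedding_0, ..., embedding_{d-1}, labels` and returns plain lists, whereas `load_embeddings` returns the embeddings as a numpy array.

```python
import csv
import io
from typing import List, Tuple


def read_embeddings_csv(csv_text: str) -> Tuple[List[List[float]], List[int], List[str]]:
    """Parse csv text into (embeddings, labels, filenames), keeping row order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    labels_idx = header.index("labels")
    embeddings: List[List[float]] = []
    labels: List[int] = []
    filenames: List[str] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        filenames.append(row[0])
        # Columns between "filenames" and "labels" hold the embedding values.
        embeddings.append([float(v) for v in row[1:labels_idx]])
        labels.append(int(row[labels_idx]))
    return embeddings, labels, filenames
```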
- lightly.utils.io.load_embeddings_as_dict(path: str, embedding_name: str = 'default', return_all: bool = False) Union[Any, Tuple[Any, NDArray[np.float64], List[int], List[str]]]
- Loads embeddings from a csv file and stores them in a dictionary for transfer. - Loads embeddings into a dictionary that can be serialized and sent to the Lightly servers. It is recommended to always specify embedding_name because the Lightly web app does not allow two embeddings with the same name. - Parameters
- path – Path to the csv file. 
- embedding_name – Name of the embedding for the platform. 
- return_all – If True, also return the embeddings, labels, and filenames. 
 
- Returns
- A dictionary containing the embedding information (see load_embeddings) 
- Examples
>>> import lightly.utils.io as io
>>> embedding_dict = io.load_embeddings_as_dict(
...     'path/to/my/embeddings.csv',
...     embedding_name='MyEmbeddings')
>>>
>>> result = io.load_embeddings_as_dict(
...     'path/to/my/embeddings.csv',
...     embedding_name='MyEmbeddings',
...     return_all=True)
>>> embedding_dict, embeddings, labels, filenames = result
- lightly.utils.io.save_embeddings(path: str, embeddings: NDArray[np.float64], labels: List[int], filenames: List[str]) None
- Saves embeddings in a csv file in a Lightly compatible format. - Creates a csv file at the location specified by path and saves embeddings, labels, and filenames. - Parameters
- path – Path to the csv file. 
- embeddings – Embeddings of the images as a numpy array (n x d). 
- labels – List of integer labels. 
- filenames – List of filenames. 
 
- Raises
- ValueError – If embeddings, labels, and filenames have different lengths. 
- Examples
>>> import lightly.utils.io as io
>>> io.save_embeddings(
...     'path/to/my/embeddings.csv',
...     embeddings,
...     labels,
...     filenames)
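As an illustration of the documented length check and row layout, here is a hypothetical writer (not lightly's implementation). It assumes the header `filenames, embedding_0, ..., embedding_{d-1}, labels` and returns the csv text instead of writing to `path` so the result is easy to inspect; to write a file, pass the output to `open(path, "w", newline="")`.

```python
import csv
import io
from typing import List, Sequence


def write_embeddings_csv(
    embeddings: Sequence[Sequence[float]],
    labels: List[int],
    filenames: List[str],
) -> str:
    """Serialize embeddings, labels, and filenames to csv text (sketch)."""
    # Same failure mode as documented above: mismatched lengths are rejected.
    if not (len(embeddings) == len(labels) == len(filenames)):
        raise ValueError(
            "embeddings, labels, and filenames must have the same length, "
            f"got {len(embeddings)}, {len(labels)}, and {len(filenames)}."
        )
    dim = len(embeddings[0]) if embeddings else 0
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["filenames"] + [f"embedding_{i}" for i in range(dim)] + ["labels"])
    for filename, embedding, label in zip(filenames, embeddings, labels):
        writer.writerow([filename, *embedding, label])
    return buffer.getvalue()
```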
- lightly.utils.io.save_schema(path: str, task_type: str, ids: List[int], names: List[str]) None
- Saves a prediction schema in the right format. - Parameters
- path – Where to store the schema. 
- task_type – Task type (e.g. classification, object-detection). 
- ids – List of category ids. 
- names – List of category names. 
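The following sketch shows one plausible way to pair `ids` with `names` into a schema. The JSON layout (`{"task_type": ..., "categories": [{"id": ..., "name": ...}]}`) is an assumption, not something stated on this page, and `schema_to_json` is a hypothetical helper; the `ValueError` on mismatched lengths is a design choice of the sketch.

```python
import json
from typing import List


def schema_to_json(task_type: str, ids: List[int], names: List[str]) -> str:
    """Build a prediction schema as JSON text (assumed layout)."""
    if len(ids) != len(names):
        raise ValueError("ids and names must have the same length.")
    schema = {
        "task_type": task_type,
        # One entry per category, pairing each id with its name.
        "categories": [{"id": i, "name": n} for i, n in zip(ids, names)],
    }
    return json.dumps(schema)
```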
 
 
- lightly.utils.io.save_tasks(path: str, tasks: List[str]) None
- Saves a list of prediction task names in the right format. - Parameters
- path – Where to store the task names. 
- tasks – List of task names. 
 
 
.embeddings_2d
Transforms embeddings to two-dimensional space for visualization.
- class lightly.utils.embeddings_2d.PCA(n_components: int = 2, eps: float = 1e-10)
- Hand-written PCA implementation that avoids a dependency on scikit-learn. - n_components
- Number of principal components to keep. 
 - eps
- Epsilon for numerical stability. 
 - mean
- Mean of the data. 
 - w
- Eigenvectors of the covariance matrix. 
 - fit(X: NDArray[np.float32]) PCA
- Fits PCA to data in X. - Parameters
- X – Data points stored in a numpy array of shape n x d. 
- Returns
- PCA – The fitted PCA object to transform data points. 
 
 - transform(X: NDArray[np.float32]) NDArray[np.float32]
- Uses PCA to transform data in X. - Parameters
- X – Data points stored in a numpy array of shape n x d. 
- Returns
- Numpy array of n x p data points, where p <= d. 
- Raises
- ValueError – If PCA is not fitted before calling this method. 
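To make the fit/transform contract concrete, here is a minimal PCA sketch in plain NumPy: `fit` stores the data mean and the top eigenvectors of the covariance matrix (the `mean` and `w` attributes described above), and `transform` projects centered data onto them. `SimplePCA` is a hypothetical name; it omits the `eps` stabilizer and is not lightly's code.

```python
from typing import Optional

import numpy as np


class SimplePCA:
    """Minimal PCA via eigendecomposition of the covariance matrix (sketch)."""

    def __init__(self, n_components: int = 2) -> None:
        self.n_components = n_components
        self.mean: Optional[np.ndarray] = None
        self.w: Optional[np.ndarray] = None

    def fit(self, X: np.ndarray) -> "SimplePCA":
        X = np.asarray(X, dtype=np.float64)
        self.mean = X.mean(axis=0)
        centered = X - self.mean
        cov = centered.T @ centered / max(X.shape[0] - 1, 1)
        # eigh returns eigenvalues in ascending order for symmetric matrices,
        # so reorder to keep the directions with the largest variance first.
        eig_vals, eig_vecs = np.linalg.eigh(cov)
        order = np.argsort(eig_vals)[::-1]
        self.w = eig_vecs[:, order[: self.n_components]]
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        if self.mean is None or self.w is None:
            raise ValueError("PCA is not fitted yet; call fit() first.")
        X = np.asarray(X, dtype=np.float64)
        return ((X - self.mean) @ self.w).astype(np.float32)
```

For example, points lying on the line t * (1, 1, 0) project onto a first component of magnitude sqrt(2) * |t| and a second component of zero.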
 
 
- lightly.utils.embeddings_2d.fit_pca(embeddings: NDArray[np.float32], n_components: int = 2, fraction: Optional[float] = None) PCA
- Fits PCA to a randomly selected subset of embeddings. - For large datasets, it can be infeasible to perform PCA on the whole dataset. This method can fit a PCA on a fraction of the embeddings in order to save computational resources. - Parameters
- embeddings – Data points stored in a numpy array of shape n x d. 
- n_components – Number of principal components to keep. 
- fraction – Fraction of the dataset to fit PCA on. 
 
- Returns
- A transformer which can be used to transform embeddings to lower dimensions. 
- Raises
- An error if fraction ≤ 0 or fraction > 1.
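The subsampling idea can be sketched as follows: validate `fraction` against the range stated above, draw a random subset without replacement, and fit principal directions on that subset only. This is a hypothetical NumPy sketch (`fit_pca_on_fraction` and its `seed` parameter are not part of lightly); it uses an SVD of the centered subset instead of an explicit covariance matrix and returns `(mean, components)` rather than a `PCA` object.

```python
from typing import Optional, Tuple

import numpy as np


def fit_pca_on_fraction(
    embeddings: np.ndarray,
    n_components: int = 2,
    fraction: Optional[float] = None,
    seed: int = 0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (mean, components) fitted on a random subset (sketch)."""
    n = embeddings.shape[0]
    if fraction is not None:
        if fraction <= 0.0 or fraction > 1.0:
            raise ValueError(f"fraction must be in (0, 1], got {fraction}.")
        n_subset = max(1, int(n * fraction))
        rng = np.random.default_rng(seed)
        idx = rng.choice(n, size=n_subset, replace=False)
        embeddings = embeddings[idx]
    mean = embeddings.mean(axis=0)
    # Rows of vt are the principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(embeddings - mean, full_matrices=False)
    return mean, vt[:n_components].T
```

Data can then be projected to lower dimensions with `(X - mean) @ components`.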
 
.benchmarking
.debug
- lightly.utils.debug.apply_transform_without_normalize(image: Image, transform) Image
- Applies the transform to the image but skips ToTensor and Normalize. - Parameters
- image – The input PIL image. 
- transform – The transformation to apply, excluding ToTensor and Normalize. 
 
- Returns
- The transformed image. 
 
- lightly.utils.debug.generate_grid_of_augmented_images(input_images: List[Image], collate_function: Union[BaseCollateFunction, MultiViewCollateFunction]) List[List[Image]]
- Returns a grid of augmented images. Images in a column belong together. - This function ignores the ToTensor and Normalize transforms for visualization purposes. - Parameters
- input_images – List of PIL images for which the augmentations should be plotted. 
- collate_function – The collate function of the self-supervised learning algorithm. Must be of type BaseCollateFunction or MultiViewCollateFunction. 
 
- Returns
- A grid of augmented images. Images in a column belong together. 
 
- lightly.utils.debug.plot_augmented_images(input_images: List[Image], collate_function: Union[BaseCollateFunction, MultiViewCollateFunction])
- Plots original images and augmented images in a figure. - This function ignores the ToTensor and Normalize transforms for visualization purposes. - Parameters
- input_images – List of PIL images for which the augmentations should be plotted. 
- collate_function – The collate function of the self-supervised learning algorithm. Must be of type BaseCollateFunction or MultiViewCollateFunction. 
 
- Returns
- A figure showing the original images in the left column and the augmented images to their right. If the collate_function is an instance of BaseCollateFunction, two example augmentations are shown. For a MultiViewCollateFunction, all generated views are shown. 
 
- lightly.utils.debug.std_of_l2_normalized(z: Tensor) Tensor
- Calculates the mean of the standard deviations of the l2-normalized z along each dimension. - This measure was used by [0] to determine the level of collapse of the learned representations. If the returned value is 0, the outputs z have collapsed to a constant vector. If the output z has a zero-mean isotropic Gaussian distribution [0], the returned value should be close to 1/sqrt(d), where d is the dimensionality of the output. - [0]: https://arxiv.org/abs/2011.10566 - Parameters
- z – A torch tensor of shape batch_size x dimension. 
- Returns
- The mean of the standard deviation of the l2 normalized tensor z along each dimension. 
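The two reference values above (0 for collapse, about 1/sqrt(d) for an isotropic Gaussian) can be checked with a small NumPy sketch of the same measure. `std_of_l2_normalized_np` is a hypothetical helper, not lightly's PyTorch function; the sample standard deviation (`ddof=1`) and the small norm floor that guards against all-zero rows are assumptions of the sketch.

```python
import numpy as np


def std_of_l2_normalized_np(z: np.ndarray) -> float:
    """Mean over dimensions of the per-dimension std of the l2-normalized rows of z."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    z_norm = z / np.maximum(norms, 1e-12)  # avoid division by zero
    return float(z_norm.std(axis=0, ddof=1).mean())
```

With a collapsed batch (every row identical) the result is 0 up to rounding, and for 4096 standard Gaussian samples of dimension 64 it lands close to 1/sqrt(64) = 0.125.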
 
.dist
- class lightly.utils.dist.GatherLayer(*args, **kwargs)
- Gather tensors from all processes, supporting backward propagation. - Adapted from the Solo-Learn project: https://github.com/vturrisi/solo-learn/blob/b69b4bd27472593919956d9ac58902a301537a4d/solo/utils/misc.py#L187 - static backward(ctx: FunctionCtx, *grads: Tensor) Tensor
- Defines a formula for differentiating the operation with backward mode automatic differentiation (alias to the vjp function). - This function is to be overridden by all subclasses. - It must accept a context ctx as the first argument, followed by as many outputs as forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input. - The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
 - static forward(ctx: FunctionCtx, input: Tensor) Tuple[Tensor, ...]
- Performs the operation. - This function is to be overridden by all subclasses. - It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types). - The context can be used to store arbitrary data that can then be retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.
 
- lightly.utils.dist.eye_rank(n: int, device: Optional[device] = None) Tensor
- Returns an (n, n * world_size) zero matrix with the diagonal for the rank of this process set to 1.
- Example output where n=3, the current process has rank 1, and there are 4 processes in total:

      rank0   rank1   rank2   rank3
      0 0 0 | 1 0 0 | 0 0 0 | 0 0 0
      0 0 0 | 0 1 0 | 0 0 0 | 0 0 0
      0 0 0 | 0 0 1 | 0 0 0 | 0 0 0

- Equivalent to torch.eye for undistributed settings or if world size == 1.
- Parameters
- n – Size of the square matrix on a single process. 
- device – Device on which the matrix should be created. 
 
- Returns
- A tensor with the appropriate diagonal filled for this rank. 
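The block structure in the example output can be reproduced with a short NumPy sketch. `eye_rank_np` is hypothetical: unlike `eye_rank`, it takes `rank` and `world_size` as explicit arguments instead of querying the distributed environment, and returns a numpy array rather than a torch tensor.

```python
import numpy as np


def eye_rank_np(n: int, rank: int, world_size: int) -> np.ndarray:
    """(n, n * world_size) zeros with an identity block in this rank's columns."""
    out = np.zeros((n, n * world_size), dtype=np.float32)
    rows = np.arange(n)
    # Row i gets a 1 in column rank * n + i, i.e. the diagonal of block `rank`.
    out[rows, rank * n + rows] = 1.0
    return out
```

With `world_size=1` and `rank=0` this reduces to the n x n identity matrix, matching the torch.eye equivalence noted above.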
 
- lightly.utils.dist.gather(input: Tensor) Tuple[Tensor]
- Gathers a tensor from all processes and supports backpropagation. 
- lightly.utils.dist.rank() int
- Returns the rank of the current process. 
- lightly.utils.dist.rank_zero_only(fn: Callable[[...], R]) Callable[[...], Optional[R]]
- Decorator to ensure the function only runs on the process with rank 0.
- Example
>>> @rank_zero_only
... def print_rank_zero(message: str):
...     print(message)
>>>
>>> print_rank_zero("Hello from rank 0!")
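The decorator pattern behind this can be sketched without any distributed setup. `make_rank_zero_only` is a hypothetical factory that takes a rank getter as an argument (an assumption, so the sketch runs in a single process); the wrapped function returns its result on rank 0 and None elsewhere, matching the `Optional[R]` return type in the signature above.

```python
import functools
from typing import Any, Callable, Optional, TypeVar

R = TypeVar("R")


def make_rank_zero_only(
    get_rank: Callable[[], int],
) -> Callable[[Callable[..., R]], Callable[..., Optional[R]]]:
    """Build a rank_zero_only-style decorator around a rank getter (sketch)."""

    def decorator(fn: Callable[..., R]) -> Callable[..., Optional[R]]:
        @functools.wraps(fn)
        def wrapped(*args: Any, **kwargs: Any) -> Optional[R]:
            # Only the process with rank 0 executes the wrapped function.
            if get_rank() == 0:
                return fn(*args, **kwargs)
            return None

        return wrapped

    return decorator
```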
- lightly.utils.dist.world_size() int
- Returns the current world size (number of distributed processes). 
.reordering
- lightly.utils.reordering.sort_items_by_keys(keys: Sequence[_K], items: Sequence[_V], sorted_keys: Sequence[_K]) List[_V]
- Sorts the items in the same order as the sorted keys. - Parameters
- keys – Keys by which items can be identified. 
- items – Items to sort. 
- sorted_keys – Keys in sorted order. 
 
- Returns
- The list of sorted items. 
- Examples
>>> keys = [3, 2, 1]
>>> items = ['!', 'world', 'hello']
>>> sorted_keys = [1, 2, 3]
>>> sorted_items = sort_items_by_keys(
...     keys,
...     items,
...     sorted_keys,
... )
>>> print(sorted_items)
['hello', 'world', '!']
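One straightforward way to implement this reordering is a key-to-item lookup table, shown below as a hypothetical sketch (not lightly's code). It assumes keys are hashable and unique; with duplicate keys, later items would silently overwrite earlier ones.

```python
from typing import Dict, Hashable, List, Sequence, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def sort_items_by_keys_sketch(
    keys: Sequence[K], items: Sequence[V], sorted_keys: Sequence[K]
) -> List[V]:
    """Return items reordered to follow sorted_keys (assumes unique keys)."""
    if len(keys) != len(items):
        raise ValueError("keys and items must have the same length.")
    # Map each key to its item, then read the items back in the target order.
    lookup: Dict[K, V] = dict(zip(keys, items))
    return [lookup[key] for key in sorted_keys]
```

The dictionary makes the reordering linear in the number of items instead of searching `keys` once per sorted key.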
.version_compare
Utility method for comparing versions of libraries
- lightly.utils.version_compare.version_compare(v0: str, v1: str) int
- Returns 1 if version v0 is larger than version v1 and -1 otherwise. - Use this method to compare Python package versions and see which one is newer. - Examples
>>> # compare two versions
>>> version_compare('1.2.0', '1.1.2')
1
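The main pitfall this kind of helper avoids is comparing version strings lexicographically, where '1.9.0' > '1.10.0'. The hypothetical sketch below compares dotted versions numerically as integer tuples and follows the 1 / -1 contract described above. It is not lightly's implementation and only handles purely numeric parts (no pre-release tags such as "rc1").

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Split a dotted version like '1.10.2' into (1, 10, 2)."""
    return tuple(int(part) for part in v.split("."))


def version_compare_sketch(v0: str, v1: str) -> int:
    """Return 1 if v0 is newer than v1 and -1 otherwise."""
    # Tuples compare element by element, so 10 > 9 numerically.
    return 1 if parse_version(v0) > parse_version(v1) else -1
```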