lightly.utils

The lightly.utils package provides global utility methods.

The io module contains utilities to save and load embeddings in a format that the Lightly library understands. With the embeddings_2d module, embeddings can be transformed to a two-dimensional space for easier visualization.

.io

I/O operations to save and load embeddings.

lightly.utils.io.load_embeddings(path: str)

Loads embeddings from a csv file in a Lightly compatible format.

Args:
path:

Path to the csv file.

Returns:

The embeddings as a numpy array, labels as a list of integers, and filenames as a list of strings in the order they were saved.

The embeddings will always be of the Float32 datatype.

Examples:
>>> import lightly.utils.io as io
>>> embeddings, labels, filenames = io.load_embeddings(
...     'path/to/my/embeddings.csv')

lightly.utils.io.load_embeddings_as_dict(path: str, embedding_name: str = 'default', return_all: bool = False)

Loads embeddings from a csv file and stores them in a dictionary for transfer.

Loads embeddings to a dictionary which can be serialized and sent to the Lightly servers. It is recommended that the embedding_name is always specified because the Lightly web-app does not allow two embeddings with the same name.

Args:
path:

Path to the csv file.

embedding_name:

Name of the embedding for the platform.

return_all:

If True, also returns the embeddings, labels, and filenames.

Returns:

A dictionary containing the embedding information (see load_embeddings). If return_all is True, the embeddings, labels, and filenames are returned as well.

Examples:
>>> import lightly.utils.io as io
>>> embedding_dict = io.load_embeddings_as_dict(
...     'path/to/my/embeddings.csv',
...     embedding_name='MyEmbeddings')
>>>
>>> result = io.load_embeddings_as_dict(
...     'path/to/my/embeddings.csv',
...     embedding_name='MyEmbeddings',
...     return_all=True)
>>> embedding_dict, embeddings, labels, filenames = result

lightly.utils.io.save_embeddings(path: str, embeddings: numpy.ndarray, labels: List[int], filenames: List[str])

Saves embeddings in a csv file in a Lightly compatible format.

Creates a csv file at the location specified by path and saves embeddings, labels, and filenames.

Args:
path:

Path to the csv file.

embeddings:

Embeddings of the images as a numpy array (n x d).

labels:

List of integer labels.

filenames:

List of filenames.

Raises:

ValueError if embeddings, labels, and filenames have different lengths.

Examples:
>>> import lightly.utils.io as io
>>> io.save_embeddings(
...     'path/to/my/embeddings.csv',
...     embeddings,
...     labels,
...     filenames)
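
The save and load behaviour described above can be illustrated with a small round trip that does not depend on lightly itself. This is only a sketch: the helper names write_rows and read_rows are hypothetical, and the column layout (a filenames column, one embedding_i column per dimension, then a labels column) is an assumption made for illustration, not a statement of the exact file format. It shows the length check that leads to a ValueError and the conversion to float32 on load.

```python
import csv
import io

import numpy as np


def write_rows(handle, embeddings, labels, filenames):
    """Write one csv row per sample (hypothetical helper, assumed layout)."""
    if not (len(embeddings) == len(labels) == len(filenames)):
        raise ValueError(
            "embeddings, labels, and filenames must have the same length."
        )
    n_dims = embeddings.shape[1]
    writer = csv.writer(handle)
    # Assumed header: filenames, embedding_0 ... embedding_{d-1}, labels.
    header = ["filenames"] + [f"embedding_{i}" for i in range(n_dims)] + ["labels"]
    writer.writerow(header)
    for name, vector, label in zip(filenames, embeddings, labels):
        writer.writerow([name] + [float(v) for v in vector] + [int(label)])


def read_rows(handle):
    """Read the rows back in the order they were written."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    filenames, vectors, labels = [], [], []
    for row in reader:
        filenames.append(row[0])
        vectors.append([float(v) for v in row[1:-1]])
        labels.append(int(row[-1]))
    # Embeddings are returned as float32, mirroring the note in load_embeddings.
    return np.array(vectors, dtype=np.float32), labels, filenames


# Round trip through an in-memory buffer instead of a file on disk.
example = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)
buffer = io.StringIO(newline="")
write_rows(buffer, example, [0, 1], ["a.jpg", "b.jpg"])
buffer.seek(0)
loaded, loaded_labels, loaded_filenames = read_rows(buffer)
```

With real files, the same round trip would go through save_embeddings and load_embeddings using a path instead of an in-memory buffer.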

.embeddings_2d

Transform embeddings to two-dimensional space for visualization.

class lightly.utils.embeddings_2d.PCA(n_components: int = 2, eps: float = 1e-10)

Handwritten PCA implementation that avoids a dependency on scikit-learn.

Attributes:
n_components:

Number of principal components to keep.

eps:

Epsilon for numerical stability.

fit(X: numpy.ndarray)

Fits PCA to data in X.

Args:
X:

Datapoints stored in numpy array of size n x d.

Returns:

PCA object to transform datapoints.

transform(X: numpy.ndarray)

Uses PCA to transform data in X.

Args:
X:

Datapoints stored in numpy array of size n x d.

Returns:

Numpy array of n x p datapoints where p <= d.
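
To make the fit and transform contract concrete, here is a minimal sketch of PCA written with NumPy only. It is not the library's implementation: the class name SketchPCA is hypothetical, it uses an eigendecomposition of the covariance matrix as one common way to compute principal components, and it leaves out the eps term listed in the attributes above.

```python
import numpy as np


class SketchPCA:
    """Illustrative PCA via eigendecomposition (hypothetical, not lightly's class)."""

    def __init__(self, n_components: int = 2):
        self.n_components = n_components
        self.mean = None
        self.components = None

    def fit(self, X: np.ndarray) -> "SketchPCA":
        X = np.asarray(X, dtype=np.float64)
        # Center the data so the covariance is computed around the mean.
        self.mean = X.mean(axis=0)
        cov = np.cov(X - self.mean, rowvar=False)  # d x d covariance matrix
        # eigh is suited to symmetric matrices; eigenvalues come back ascending.
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][: self.n_components]
        self.components = eigvecs[:, order]  # d x p projection matrix
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        X = np.asarray(X, dtype=np.float64)
        # Project centered data onto the kept components: (n x d) @ (d x p).
        return (X - self.mean) @ self.components


rng = np.random.default_rng(0)
points = rng.normal(size=(50, 5))  # n = 50 datapoints, d = 5 dimensions
projected = SketchPCA(n_components=2).fit(points).transform(points)
```

As in the class documented above, fit returns the fitted object, so fit and transform can be chained, and the output has shape n x p with p <= d.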

lightly.utils.embeddings_2d.fit_pca(embeddings: numpy.ndarray, n_components: int = 2, fraction: float = None)

Fits PCA to randomly selected subset of embeddings.

For large datasets, it can be infeasible to perform PCA on the whole dataset. This function can fit a PCA on a fraction of the embeddings to save computational resources.

Args:
embeddings:

Datapoints stored in numpy array of size n x d.

n_components:

Number of principal components to keep.

fraction:

Fraction of the dataset to fit PCA on.

Returns:

A transformer which can be used to transform embeddings to lower dimensions.

Raises:

ValueError if fraction < 0 or fraction > 1.
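
The subsampling step can be sketched as follows. This is an illustration under assumptions rather than the library's code: sample_rows and its seed parameter are hypothetical names, and using all rows when fraction is None is an assumption based on the default value in the signature.

```python
import numpy as np


def sample_rows(embeddings: np.ndarray, fraction=None, seed=None) -> np.ndarray:
    """Return a random subset of rows (hypothetical helper for illustration)."""
    if fraction is None:
        # Assumption: no fraction means the full dataset is used.
        return embeddings
    if fraction < 0.0 or fraction > 1.0:
        raise ValueError(f"fraction must be in [0, 1] but was {fraction}.")
    n_total = embeddings.shape[0]
    n_keep = int(n_total * fraction)
    # Shuffle the row indices and keep the first n_keep of them.
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_total)[:n_keep]
    return embeddings[indices]


data = np.arange(200, dtype=np.float32).reshape(100, 2)
subset = sample_rows(data, fraction=0.25, seed=0)  # 25 of the 100 rows
```

A PCA would then be fitted on the subset only, while the resulting transformer can still be applied to every embedding.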