Embed¶

The embed command is used to embed images with a training checkpoint. An example command looks like this:

Python

import lightly_train

if __name__ == "__main__":
    lightly_train.pretrain(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
    )

    lightly_train.embed(
        out="my_embeddings.pth",                            
        data="my_data_dir",                                 
        checkpoint="out/my_experiment/checkpoints/last.ckpt",
        format="torch",
    )

Command Line

lightly-train pretrain out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50"
lightly-train embed out=my_embeddings.pth data=my_data_dir checkpoint=out/my_experiment/checkpoints/last.ckpt format=torch

The above code example pretrains a model and uses the last training checkpoint to generate image embeddings from the images in "my_data_dir".

Tip

See lightly_train.embed() for a complete list of arguments.

Out¶

The out argument specifies the output file where the embeddings and corresponding image filenames are saved. Image filenames are always relative to the data directory. See format for details on how the embeddings are saved.

The embedding dimension is determined by the model used for training. If the embed_dim argument was set during training, the embeddings will have this dimension, otherwise the default embedding dimension of the model is used.

Data¶

The data argument specifies the directory with images to embed. The same image formats as in the training command are supported.

Checkpoint¶

The checkpoint argument specifies the LightlyTrain checkpoint to use for embedding. This is the checkpoint saved to out/my_experiment/checkpoints/last.ckpt after training.

Format¶

The format argument specifies the format in which the embeddings are saved. The following formats are supported:

csv: Embeddings are saved as a CSV file with one row per image. The first column contains the filename and the remaining columns contain the embedding values. An example CSV file looks like this:
```
  filename,embedding_0,embedding_1,embedding_2
  image1.jpg,0.1,0.2,0.3
  image2.jpg,0.4,0.5,0.6
```
lightly_csv: Embeddings are saved as a CSV file with one row per image. The CSV file is compatible with the Lightly Worker. The labels column is always set to 0. An example CSV file looks like this:
```
  filenames,embedding_0,embedding_1,embedding_2,labels
  image1.jpg,0.1,0.2,0.3,0
  image2.jpg,0.4,0.5,0.6,0
```

torch: The embeddings are saved as dictionary in a torch file with the following structure:

from torch import Tensor

{
    "embeddings": Tensor,    # Embeddings with shape (N, D) where N is the number
                                   # of images and D is the embedding dimension.
    "filenames": list[str],        # List of N filenames corresponding to the embeddings.
}

The embeddings can be loaded with:

  import torch

  embeddings = torch.load("my_embeddings.pth")
  embeddings["embeddings"]    # Embeddings
  embeddings["filenames"]     # Image filenames