(embed)= # Embed The `embed` command is used to embed images with a [training checkpoint](#train-output). An example command looks like this: ````{tab} Python ```python import lightly_train if __name__ == "__main__": lightly_train.train( out="out/my_experiment", data="my_data_dir", model="torchvision/resnet50", ) lightly_train.embed( out="my_embeddings.pth", data="my_data_dir", checkpoint="out/my_experiment/checkpoints/last.ckpt", format="torch", ) ```` ````{tab} Command Line ```bash lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50" lightly-train embed out=my_embeddings.pth data=my_data_dir checkpoint=out/my_experiment/checkpoints/last.ckpt format=torch ```` The above code example trains a model and uses the last training checkpoint to generate image embeddings from the images in `"my_data_dir"`. ```{tip} See {meth}`lightly_train.embed` for a complete list of arguments. ``` ## Out The `out` argument specifies the output file where the embeddings and corresponding image filenames are saved. Image filenames are always relative to the `data` directory. See [format](#embedding-format) for details on how the embeddings are saved. The embedding dimension is determined by the model used for training. If the `embed_dim` argument was set during training, the embeddings will have this dimension, otherwise the default embedding dimension of the model is used. ## Data The `data` argument specifies the directory with images to embed. The same image formats as in the [training command](#train-data) are supported. ## Checkpoint The `checkpoint` argument specifies the Lightly**Train** checkpoint to use for embedding. This is the checkpoint saved to `out/my_experiment/checkpoints/last.ckpt` after training. (embedding-format)= ## Format The `format` argument specifies the format in which the embeddings are saved. The following formats are supported: - `csv`: Embeddings are saved as a CSV file with one row per image. The first column contains the filename and the remaining columns contain the embedding values. An example CSV file looks like this: ```text filename,embedding_0,embedding_1,embedding_2 image1.jpg,0.1,0.2,0.3 image2.jpg,0.4,0.5,0.6 ``` - `lightly_csv`: Embeddings are saved as a CSV file with one row per image. The CSV file is compatible with the [Lightly Worker](https://docs.lightly.ai/docs/custom-embeddings). The labels column is always set to 0. An example CSV file looks like this: ```text filenames,embedding_0,embedding_1,embedding_2,labels image1.jpg,0.1,0.2,0.3,0 image2.jpg,0.4,0.5,0.6,0 ``` - `torch`: The embeddings are saved as dictionary in a torch file with the following structure: ```python from torch import Tensor { "embeddings": Tensor, # Embeddings with shape (N, D) where N is the number # of images and D is the embedding dimension. "filenames": list[str], # List of N filenames corresponding to the embeddings. } ``` The embeddings can be loaded with: ```python import torch embeddings = torch.load("my_embeddings.pth", weights_only=True) embeddings["embeddings"] # Embeddings embeddings["filenames"] # Image filenames ```