Custom Embeddings

Lightly allows you to fully customize the embeddings of your images. This can be useful if you have a special image type that requires a different embedding model than the one Lightly provides in the Lightly Worker.


Lightly Worker Version Compatibility

This feature requires a Lightly Worker of version 2.3.15 (released Dec 20 2022) or newer.

Let's assume the following structure for the Input datasource:

s3://bucket/input/
├── image_1.png
└── subdir/
    ├── image_2.png
    └── image_3.png
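The filenames in the embeddings CSV are paths relative to the root of the Input datasource, written with forward slashes (for example subdir/image_2.png). As a minimal sketch, assuming you have a local copy of the datasource, the hypothetical helper below lists image paths in that form. The helper name and the set of image extensions are assumptions for illustration, not part of the Lightly package.

```python
from pathlib import Path

# Image file extensions considered by this sketch (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_relative_filenames(root):
    """Return sorted image paths relative to `root`, using forward slashes."""
    root_path = Path(root)
    return sorted(
        path.relative_to(root_path).as_posix()
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

For a local copy of the structure above, this returns ["image_1.png", "subdir/image_2.png", "subdir/image_3.png"].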

To provide the embeddings to the Lightly Worker, store them as a CSV file in the .lightly/embeddings/ directory of the Lightly datasource:

s3://bucket/lightly/
└── .lightly/
    └── embeddings/
        └── custom_embeddings.csv

The embedding CSV file must have the following format:

filenames,embedding_0,embedding_1,...,embedding_31,labels
image_1.png,-0.86,0.49,...,0
subdir/image_2.png,0.86,0.78,...,0
subdir/image_3.png,-1.09,-0.93,...,0

The entries in the filenames column must match the image filenames in the Input datasource, relative to its root. Every embedding dimension is stored as a separate column (embedding_0, embedding_1, ..., embedding_31). The last column of the file must be named labels and contain 0 in every row.
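A minimal sketch of producing this file with Python's standard csv module, assuming you already have one embedding vector per image from your own model. The helper name write_embeddings_csv is hypothetical and not part of the Lightly package; it writes the header, one row per image, and a labels value of 0 for every row.

```python
import csv


def write_embeddings_csv(path, embeddings):
    """Write {filename: [float, ...]} to a CSV in the format shown above.

    Returns the number of embedding dimensions, which is the value the
    num_ftrs option must be set to.
    """
    if not embeddings:
        raise ValueError("no embeddings to write")
    dims = {len(vector) for vector in embeddings.values()}
    if len(dims) != 1:
        raise ValueError("all embeddings must have the same number of dimensions")
    num_dims = dims.pop()

    header = ["filenames"] + [f"embedding_{i}" for i in range(num_dims)] + ["labels"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for filename, vector in embeddings.items():
            # The labels column must contain 0 for every row.
            writer.writerow([filename, *vector, 0])
    return num_dims
```

The returned dimension count is convenient to reuse when filling in num_ftrs for the run.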


Number of Embedding Dimensions

The number of embedding dimensions can be customized by adding or removing embedding columns in the CSV file. The number of dimensions must match the num_ftrs option in the lightly_config (see the example below).
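Since num_ftrs must equal the number of embedding columns, it can help to derive it from the file instead of hard-coding it. A minimal sketch using only the standard library; read_num_ftrs is a hypothetical helper, and its checks simply mirror the format described above (filenames first, embedding_0 onward in order, labels last and all 0).

```python
import csv


def read_num_ftrs(path):
    """Return the number of embedding columns in an embeddings CSV.

    Raises ValueError if the file does not follow the layout described above.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header or header[0] != "filenames" or header[-1] != "labels":
            raise ValueError("first column must be 'filenames' and last column 'labels'")
        num_dims = len(header) - 2
        if header[1:-1] != [f"embedding_{i}" for i in range(num_dims)]:
            raise ValueError("embedding columns must be named embedding_0, embedding_1, ...")
        for row in reader:
            if not row:
                continue  # skip blank lines
            if len(row) != len(header):
                raise ValueError(f"row {row[0]!r} has {len(row)} columns, expected {len(header)}")
            if float(row[-1]) != 0:
                raise ValueError(f"labels entry for {row[0]!r} must be 0")
    return num_dims
```

The result can then be placed in lightly_config["model"]["num_ftrs"] when scheduling the run.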

To schedule a run with custom embeddings, pass the location of the embedding file with the embeddings key of the worker config. The path must be relative to the .lightly/embeddings/ directory in the Lightly datasource.
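As a minimal sketch of the relative-path rule, the hypothetical helper below converts a file's location inside the Lightly datasource (for example .lightly/embeddings/custom_embeddings.csv) into the value for the embeddings key. The helper name is an assumption for illustration; it fails loudly if the file is not stored under .lightly/embeddings/.

```python
from pathlib import PurePosixPath

# Directory inside the Lightly datasource where embedding files are stored.
EMBEDDINGS_DIR = PurePosixPath(".lightly/embeddings")


def embeddings_config_path(key):
    """Convert a path relative to the datasource root into the worker_config value.

    relative_to raises ValueError if `key` is not inside .lightly/embeddings/.
    """
    return PurePosixPath(key).relative_to(EMBEDDINGS_DIR).as_posix()
```

For the layout above, embeddings_config_path(".lightly/embeddings/custom_embeddings.csv") gives "custom_embeddings.csv", the value used in the example below.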

from lightly.api import ApiWorkflowClient

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

client.schedule_compute_worker_run(
    worker_config={
        "embeddings": "custom_embeddings.csv",
    },
    selection_config={
        "n_samples": 50,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY"
                }
            }
        ]
    },
    lightly_config={
        "model": {
            "num_ftrs": 32,  # Must match the number of embedding dimensions.
        }
    },
)