Custom Embeddings
Lightly allows you to fully customize the embeddings of your images. This can be useful if you have a special image type that requires a different embedding model than the one Lightly provides in the Lightly Worker.
Lightly Worker Version Compatibility
This feature requires Lightly Worker version 2.3.15 (released Dec 20, 2022) or newer.
Let's assume the following structure for the Input datasource:
s3://bucket/input/
├── image_1.png
└── subdir/
    ├── image_2.png
    └── image_3.png
To provide the embeddings to the Lightly Worker, store them as a CSV file in the .lightly/embeddings/ directory of the Lightly datasource:
s3://bucket/lightly/
└── .lightly/
    └── embeddings/
        └── custom_embeddings.csv
The embedding CSV file must have the following format:
filenames,embedding_0,embedding_1,...,embedding_31,labels
image_1.png,-0.86,0.49,...,0
subdir/image_2.png,0.86,0.78,...,0
subdir/image_3.png,-1.09,-0.93,...,0
The entries in the filenames column must match the image filenames in the Input datasource, including any subdirectory prefix and the file extension. Every embedding dimension is stored as a separate column (embedding_0, embedding_1, ..., embedding_31). The last column of the embedding file must be named labels and must contain 0 in every row.
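A file in this layout can be written with Python's standard csv module. The following is a minimal sketch, assuming the embeddings are already available as equal-length lists of floats. The helper name write_embeddings_csv is hypothetical and is not part of the Lightly package.

```python
import csv


def write_embeddings_csv(path, filenames, embeddings):
    """Write embeddings in the CSV layout described above.

    Hypothetical helper for illustration; not part of the Lightly API.
    """
    if not embeddings:
        raise ValueError("No embeddings given.")
    if len(filenames) != len(embeddings):
        raise ValueError("Need exactly one embedding per filename.")
    num_dims = len(embeddings[0])
    if any(len(embedding) != num_dims for embedding in embeddings):
        raise ValueError("All embeddings must have the same number of dimensions.")

    # Header: filenames, embedding_0 ... embedding_{n-1}, labels
    header = ["filenames"]
    header += [f"embedding_{i}" for i in range(num_dims)]
    header.append("labels")

    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for filename, embedding in zip(filenames, embeddings):
            # The labels column is 0 for every row.
            writer.writerow([filename, *embedding, 0])
```

For the example datasource, the filenames argument would be ["image_1.png", "subdir/image_2.png", "subdir/image_3.png"], paired with one 32-dimensional embedding each.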
Number of Embedding Dimensions
The number of embedding dimensions can be changed by adding or removing embedding columns in the CSV file. The number of dimensions must match the num_ftrs option in the lightly config (see below).
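Before scheduling a run, it can help to check that the CSV header really has the number of dimensions you plan to pass as num_ftrs. The following is a minimal sketch of such a check, assuming the file is available locally. The function name count_embedding_dims is hypothetical and is not part of the Lightly package.

```python
import csv


def count_embedding_dims(csv_path):
    """Return the number of embedding_* columns in the CSV header.

    Hypothetical helper for illustration; not part of the Lightly API.
    """
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))

    if len(header) < 3 or header[0] != "filenames" or header[-1] != "labels":
        raise ValueError("Header must start with 'filenames' and end with 'labels'.")

    # Everything between the first and last column is an embedding dimension.
    dims = header[1:-1]
    expected = [f"embedding_{i}" for i in range(len(dims))]
    if dims != expected:
        raise ValueError("Embedding columns must be named embedding_0, embedding_1, ...")
    return len(dims)
```

The returned value is the number to use for num_ftrs. For the 32-column example above, it would be 32.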
To schedule a run with custom embeddings, pass the location of the embedding file in the worker config. The path must be relative to the .lightly/embeddings/ directory in the Lightly datasource.
from lightly.api import ApiWorkflowClient

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

client.schedule_compute_worker_run(
    worker_config={
        # Path relative to .lightly/embeddings/ in the Lightly datasource.
        "embeddings": "custom_embeddings.csv",
    },
    selection_config={
        "n_samples": 50,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY"
                }
            }
        ]
    },
    lightly_config={
        "model": {
            "num_ftrs": 32,  # Must match number of embedding dimensions.
        }
    },
)