Meta Information

Warning

The Docker Archive documentation is deprecated

The old workflow described in these docs will not be supported with new Lightly Worker versions above 2.6. Please switch to our new documentation page instead.

Depending on your current setup one of the following topics might interest you:

  • You have a dataset but want Lightly to “ignore” certain samples.
  • You have an existing dataset and want to add only relevant new data.
  • You have your own (weak) labels. Can Lightly use this information to improve the selection?

Mask Samples

You can also add masking information to the .csv file to prevent certain samples from being used.

The following example shows a dataset in which the column “masked” is used to prevent Lightly Docker from using a specific sample. In this example, img-1.jpg is ignored and not considered for selection. That is, the sample is never selected, and it does not affect the selection of any other sample.

masked_embeddings.csv

filenames   embedding_0   embedding_1   masked   labels
img-1.jpg   0.1           0.5           1        0
img-2.jpg   0.2           0.2           0        0
img-3.jpg   0.1           0.9           0        0
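As a rough illustration of how such a file could be produced, the sketch below adds a “masked” column to an existing embeddings .csv using only the Python standard library. The helper name add_masked_column and the inline example data are hypothetical and not part of Lightly; the column order follows the masked_embeddings.csv example above.

```python
import csv
import io


def add_masked_column(csv_text, masked_filenames):
    """Return CSV text with a 'masked' column (1 = ignore, 0 = use).

    Hypothetical helper, not part of Lightly. The column is placed
    right before 'labels' to match the example file above.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames)
    if "masked" not in fieldnames:
        position = fieldnames.index("labels") if "labels" in fieldnames else len(fieldnames)
        fieldnames.insert(position, "masked")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Flag every sample whose filename is in the mask set.
        row["masked"] = 1 if row["filenames"] in masked_filenames else 0
        writer.writerow(row)
    return out.getvalue()


# Example data mirroring the table above (without the 'masked' column).
embeddings = """filenames,embedding_0,embedding_1,labels
img-1.jpg,0.1,0.5,0
img-2.jpg,0.2,0.2,0
img-3.jpg,0.1,0.9,0
"""

masked_csv = add_masked_column(embeddings, {"img-1.jpg"})
print(masked_csv)
```

In practice the text would be read from and written back to a file on disk; the in-memory strings only keep the sketch self-contained.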

Use Pre-Selected Samples

Similar to masking samples, you can also pre-select specific samples. This can be useful for semi-automated data selection processes: a human annotator pre-selects some of the relevant samples and lets Lightly Docker add only those additional samples that enrich the existing selection.

selected_embeddings.csv

filenames   embedding_0   embedding_1   selected   labels
img-1.jpg   0.1           0.5           0          0
img-2.jpg   0.2           0.2           0          0
img-3.jpg   0.1           0.9           1          0
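Before handing such a file to the selection step, it can help to check which samples are actually flagged. The following is a minimal sketch using the Python standard library; the helper name read_preselected is hypothetical, and the strict check that the flag is either 0 or 1 is an assumption based on the example above, not a documented Lightly rule.

```python
import csv
import io


def read_preselected(csv_text, column="selected"):
    """Return the filenames whose flag column is set to 1.

    Hypothetical helper, not part of Lightly. Assumes the flag
    column only contains 0 or 1, as in the example above.
    """
    preselected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        flag = row[column].strip()
        if flag not in ("0", "1"):
            raise ValueError(f"{row['filenames']}: expected 0 or 1, got {flag!r}")
        if flag == "1":
            preselected.append(row["filenames"])
    return preselected


# Example data mirroring selected_embeddings.csv above.
selected_csv = """filenames,embedding_0,embedding_1,selected,labels
img-1.jpg,0.1,0.5,0,0
img-2.jpg,0.2,0.2,0,0
img-3.jpg,0.1,0.9,1,0
"""

preselected = read_preselected(selected_csv)
print(preselected)  # ['img-3.jpg']
```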

Note

Pre-selected samples also count toward the target number of samples. For example, suppose you have a dataset with 100 samples. If you have pre-selected 60 samples and want to select another 10, you have to set the target number of samples to 70.
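The arithmetic in this note can be sketched as a small helper. The function name target_number_of_samples is hypothetical, and the check against the dataset size is an added sanity check rather than documented Lightly behavior.

```python
def target_number_of_samples(n_preselected, n_additional, n_total):
    """Return the target to configure: pre-selected plus additional samples.

    Hypothetical helper, not part of Lightly.
    """
    if n_preselected < 0 or n_additional < 0:
        raise ValueError("sample counts must be non-negative")
    target = n_preselected + n_additional
    # Sanity check (an assumption): the target cannot exceed the dataset size.
    if target > n_total:
        raise ValueError(f"target {target} exceeds dataset size {n_total}")
    return target


# The example from the note: 100 samples, 60 pre-selected, 10 more wanted.
target = target_number_of_samples(n_preselected=60, n_additional=10, n_total=100)
print(target)  # 70
```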

Custom Weak Labels

You can always add custom embeddings to the dataset by following the guide here: lightly-custom-labels