Meta Information
Warning
The Docker Archive documentation is deprecated
The old workflow described in these docs will not be supported with new Lightly Worker versions above 2.6. Please switch to our new documentation page instead.
Depending on your current setup one of the following topics might interest you:
- You have a dataset but want Lightly to “ignore” certain samples. -> Mask Samples
- You have an existing dataset and want to add only relevant new data. -> Use Pre-Selected Samples
- You have your own (weak) labels. Can Lightly use this information to improve the selection? -> Custom Weak Labels
Mask Samples
You can add masking information to the .csv file to prevent certain samples from being used.
The following example shows a dataset in which the column “masked” prevents Lightly Docker from using a specific sample. Here, img-1.jpg is ignored and not considered for selection. That is, the sample is never selected, and it does not affect the selection of any other sample.
filenames | embedding_0 | embedding_1 | masked | labels
---|---|---|---|---
img-1.jpg | 0.1 | 0.5 | 1 | 0
img-2.jpg | 0.2 | 0.2 | 0 | 0
img-3.jpg | 0.1 | 0.9 | 0 | 0
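As a rough illustration, the “masked” column could be added to an existing embeddings file with a few lines of Python. This is only a sketch under stated assumptions: it assumes the .csv file is comma-separated and uses the column names from the table above, and the helper name `add_mask_column` is hypothetical, not part of Lightly.

```python
import csv
import io


def add_mask_column(csv_text, masked_filenames):
    """Return CSV text with a 'masked' column: 1 for masked files, 0 otherwise.

    Hypothetical helper for illustration; assumes a comma-separated file
    with a 'filenames' column, as in the table above.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames)
    if "masked" not in fieldnames:
        # Place the new column before 'labels' to mirror the table layout.
        idx = fieldnames.index("labels") if "labels" in fieldnames else len(fieldnames)
        fieldnames.insert(idx, "masked")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["masked"] = "1" if row["filenames"] in masked_filenames else "0"
        writer.writerow(row)
    return out.getvalue()


# Example data matching the table above (before masking is added).
embeddings_csv = (
    "filenames,embedding_0,embedding_1,labels\n"
    "img-1.jpg,0.1,0.5,0\n"
    "img-2.jpg,0.2,0.2,0\n"
    "img-3.jpg,0.1,0.9,0\n"
)

masked_csv = add_mask_column(embeddings_csv, {"img-1.jpg"})
print(masked_csv)
```

In practice you would read the text from your embeddings file and write the result back to disk; the in-memory strings here only keep the example self-contained.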
Use Pre-Selected Samples
Similar to masking, you can pre-select specific samples. This is useful for semi-automated data selection: a human annotator pre-selects some of the relevant samples, and Lightly Docker adds only samples that enrich the existing selection. In the following example, the column “selected” marks img-3.jpg as pre-selected.
filenames | embedding_0 | embedding_1 | selected | labels
---|---|---|---|---
img-1.jpg | 0.1 | 0.5 | 0 | 0
img-2.jpg | 0.2 | 0.2 | 0 | 0
img-3.jpg | 0.1 | 0.9 | 1 | 0
Note
Pre-selected samples also count toward the target number of samples. For example, suppose your dataset has 100 samples. If you have pre-selected 60 samples and want to select another 10, set the target number of samples to 70.
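The arithmetic in this note can be sketched in Python. This is an illustrative sketch only: it assumes a comma-separated file with a “selected” column as in the table above, and the helper names `count_preselected` and `target_sample_count` are hypothetical, not part of Lightly.

```python
import csv
import io


def count_preselected(csv_text):
    """Count rows whose 'selected' column is 1 (assumes comma-separated text)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if (row.get("selected") or "0").strip() == "1")


def target_sample_count(n_preselected, n_additional, n_total):
    """Pre-selected samples count toward the target, so add them in."""
    target = n_preselected + n_additional
    if target > n_total:
        raise ValueError("target number of samples exceeds the dataset size")
    return target


# Example data matching the table above: only img-3.jpg is pre-selected.
selection_csv = (
    "filenames,embedding_0,embedding_1,selected,labels\n"
    "img-1.jpg,0.1,0.5,0,0\n"
    "img-2.jpg,0.2,0.2,0,0\n"
    "img-3.jpg,0.1,0.9,1,0\n"
)

n_preselected = count_preselected(selection_csv)
print(n_preselected)                     # 1
print(target_sample_count(60, 10, 100))  # 70, as in the note above
```

Counting the pre-selected rows from the file rather than by hand helps avoid setting a target that is too low, in which case fewer new samples than intended would be added.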
Custom Weak Labels
You can always add custom embeddings to the dataset by following the guide here: lightly-custom-labels