Selection
With the power of LightlyOne, you can select a subset of your unlabeled data stored within your datasource. This allows you to mine your data efficiently based on several objectives you define.
For example, you can specify that the images in the subset should be visually diverse, be ones the model struggles with (active learning), be sharp, or follow a certain class distribution, e.g. 50% from sunny, 30% from cloudy, and 20% from rainy weather. See further examples and use cases.
Each of these objectives is defined by a pair of settings, the input and the strategy:
- The input defines which data the objective is defined on. This data is either a scalar number or a vector for each sample in the dataset. See selection input for more information.
- The strategy defines the objective to apply on the input data. See selection strategies for more information.
LightlyOne allows you to specify several objectives at the same time. The algorithms try to fulfill all objectives simultaneously.
For details on how the different selection strategies are combined, see selection combination.
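As a rough sketch of what several simultaneous objectives might look like, the configuration below pairs three of the examples mentioned earlier: visual diversity, sharp images only, and a 50/30/20 weather distribution. Only the EMBEDDINGS input and the DIVERSITY strategy appear on this page. The METADATA input type, the THRESHOLD and BALANCE strategy types, their fields (key, threshold, operation, target), the threshold value, and the metadata keys lightly.sharpness and weather are assumptions that should be checked against the selection input and selection strategies pages.

```python
# Sketch of a selection configuration with three objectives.
# Everything except "n_samples", "strategies", "input", "strategy", "type",
# "EMBEDDINGS", and "DIVERSITY" is an assumption made for illustration.
selection_config = {
    "n_samples": 50,
    "strategies": [
        {
            # Objective 1: the selected images should be visually diverse.
            "input": {"type": "EMBEDDINGS"},
            "strategy": {"type": "DIVERSITY"},
        },
        {
            # Objective 2: keep only sharp images (assumed key and threshold).
            "input": {"type": "METADATA", "key": "lightly.sharpness"},
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 20,
                "operation": "BIGGER",
            },
        },
        {
            # Objective 3: target weather distribution (assumed custom metadata key).
            "input": {"type": "METADATA", "key": "weather"},
            "strategy": {
                "type": "BALANCE",
                "target": {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2},
            },
        },
    ],
}

# Sanity check on the sketch itself: the target proportions should add up to 1.
weather_target = selection_config["strategies"][2]["strategy"]["target"]
target_sum = sum(weather_target.values())
```

A dictionary like this would be passed as the selection_config argument when scheduling a run.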
LightlyOne data selection algorithms support different input types:
- Embeddings computed using our Lightly Framework for self-supervised learning.
- Lightly metadata, such as image sharpness, which LightlyOne computes from the images themselves.
- (Optional) Model predictions such as classifications, object detections, or segmentations.
- (Optional) Custom Metadata can be any additional key-value information you can encode in a JSON file (from numbers to categorical strings) such as weather conditions, temperature, timestamp, location, etc.
Prerequisites
In order to use the selection feature, you need to:
- Start the LightlyOne Worker in worker mode.
- Set up a dataset in the LightlyOne Platform with cloud storage as datasource. See Create a Dataset.
Scheduling a Run
To schedule a LightlyOne Worker run with a custom selection, you can use the Lightly Python Client and its schedule_compute_worker_run method. You specify the selection with the selection_config argument. See Run Your First Selection for reference.
Here is an example of scheduling a LightlyOne Worker run with a selection configuration:
from lightly.api import ApiWorkflowClient

# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# Schedule the compute run using a custom config.
# You can edit the values according to your needs.
scheduled_run_id = client.schedule_compute_worker_run(
    selection_config={
        "n_samples": 50,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY"
                }
            }
        ]
    },
)
Selection Configuration
The configuration of a selection needs to specify both the maximum number of samples to select (as either n_samples or proportion_samples, never both) and the strategies:
{
    "n_samples": 50,
    "proportion_samples": 0.1,
    "strategies": [
        {
            "input": {
                "type": ...
            },
            "strategy": {
                "type": ...
            }
        },
        ... more strategies
    ]
}
The variable n_samples must be a positive integer specifying the absolute number of samples that should be selected. As an alternative to n_samples, you can set proportion_samples to specify the number of samples to select relative to the input dataset size. E.g. set it to 0.1 to select 10% of all samples. Set exactly one of the two: setting both or neither will cause an error.
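The either-or rule can be checked on the client side before a run is scheduled. The helper below is a minimal sketch, not LightlyOne's own validation: the function name resolve_n_samples is hypothetical, and both the accepted range for proportion_samples (greater than 0, at most 1) and the rounding to the nearest integer are assumptions, since this page does not state how the proportion is converted to a count.

```python
def resolve_n_samples(selection_config: dict, dataset_size: int) -> int:
    """Return the absolute sample count requested by a selection config.

    Hypothetical client-side check: exactly one of "n_samples" and
    "proportion_samples" must be present.
    """
    has_n = "n_samples" in selection_config
    has_proportion = "proportion_samples" in selection_config

    # Both set or neither set is rejected, mirroring the documented rule.
    if has_n == has_proportion:
        raise ValueError(
            "Set exactly one of 'n_samples' or 'proportion_samples'."
        )

    if has_n:
        n = selection_config["n_samples"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(n, bool) or not isinstance(n, int) or n <= 0:
            raise ValueError("'n_samples' must be a positive integer.")
        return n

    proportion = selection_config["proportion_samples"]
    # Assumed valid range: (0, 1].
    if not 0 < proportion <= 1:
        raise ValueError("'proportion_samples' must be in the range (0, 1].")
    # Assumed rounding: nearest integer.
    return round(proportion * dataset_size)


absolute = resolve_n_samples({"n_samples": 50}, dataset_size=1000)
relative = resolve_n_samples({"proportion_samples": 0.1}, dataset_size=1000)
```

Failing early like this avoids scheduling a run that would be rejected because both keys, or neither, were set.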
Each strategy is specified by a dictionary, which is always made up of an input and the actual strategy:
{
    "input": {
        "type": ...
    },
    "strategy": {
        "type": ...
    }
}
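Because every entry has the same two-part shape, it can be convenient to build entries with a small helper. The sketch below is one possible way to do so; make_strategy and check_strategy are hypothetical names, not part of the Lightly Python Client, and only the EMBEDDINGS and DIVERSITY type values are taken from this page.

```python
from typing import Optional


def make_strategy(
    input_type: str,
    strategy_type: str,
    input_options: Optional[dict] = None,
    strategy_options: Optional[dict] = None,
) -> dict:
    """Build one strategy entry made up of an input and a strategy."""
    return {
        "input": {"type": input_type, **(input_options or {})},
        "strategy": {"type": strategy_type, **(strategy_options or {})},
    }


def check_strategy(entry: dict) -> None:
    """Raise ValueError if an entry lacks an input or strategy with a type."""
    for part in ("input", "strategy"):
        if part not in entry:
            raise ValueError(f"Strategy entry is missing '{part}'.")
        if "type" not in entry[part]:
            raise ValueError(f"'{part}' is missing its 'type'.")


# Rebuild the configuration used in the scheduling example.
diversity = make_strategy("EMBEDDINGS", "DIVERSITY")
check_strategy(diversity)
selection_config = {"n_samples": 50, "strategies": [diversity]}
```

The resulting selection_config is the same dictionary as the one in the scheduling example.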