Selection Strategies
Lightly supports several selection strategies that turn the different selection inputs into selection objectives. When you combine multiple objectives, each pairing a selection input with a strategy, you can specify a strength for each strategy to control how much it influences the selection. A positive strength pushes the selection toward the strategy's objective, and a negative strength has the inverse effect.
Lightly offers the following selection strategies to help you achieve your objective: Diversity, Weights, Threshold, Balance, and Similarity.
Not all strategies can be combined with every selection input. Please see input and strategy combinations for detailed information.
Diversity
Use this strategy to select samples such that they are as different as possible from each other.
Can be used with Embeddings. Samples with a high distance between their embeddings are considered to be more different from each other than samples with a low distance. The strategy is specified like this:
```python
"strategy": {
    "type": "DIVERSITY",
    "strength": 0.6  # optional
}
```
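To make "as different as possible" concrete, here is a minimal sketch of greedy farthest-point sampling, a common technique for diversity selection: at each step it picks the sample that is farthest from its nearest already-selected sample. This illustrates the idea only and is not necessarily how Lightly implements the strategy; the function names `normalize` and `select_diverse` are hypothetical, and Euclidean distance on unit-length embeddings is assumed.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def select_diverse(embeddings: Sequence[Sequence[float]], n_samples: int) -> List[int]:
    """Hypothetical helper: greedy farthest-point sampling on normalized embeddings."""
    points = [normalize(e) for e in embeddings]
    if n_samples <= 0 or not points:
        return []
    selected = [0]  # start from an arbitrary sample (here: the first one)
    chosen = {0}
    # distance from every sample to its closest already-selected sample
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < min(n_samples, len(points)):
        candidates = [i for i in range(len(points)) if i not in chosen]
        # the candidate farthest from the current selection is the most "different"
        best = max(candidates, key=lambda i: min_dist[i])
        selected.append(best)
        chosen.add(best)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[best]))
    return selected


embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(select_diverse(embeddings, 3))  # [0, 3, 2]: the near-duplicate [0.9, 0.1] is skipped
```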
If you want to preserve a minimum distance between chosen samples, you can specify it as an additional stopping condition. The selection process will stop as soon as one of the stopping criteria has been reached.
```python
"strategy": {
    "type": "DIVERSITY",
    "stopping_condition_minimum_distance": 0.2,
    "strength": 0.6  # optional
}
```
Setting "stopping_condition_minimum_distance": 0.2 removes all samples that are closer to each other than 0.2, which lets you specify the minimum allowed distance between two images in the output dataset. If you use embeddings as input, this value should be between 0 and 2.0, because the embeddings are normalized to unit length. This is often a convenient method when working with different data sources and trying to combine them in a balanced way. If you want to use this stopping condition to stop the selection early, make sure the selection is allowed to pick enough samples by setting n_samples or proportion_samples high enough in the selection configuration.
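The 0 to 2.0 range follows directly from the unit-length normalization. Assuming the distance is Euclidean (which is consistent with the stated range), for two unit vectors $u$ and $v$ separated by angle $\theta$:

```latex
\lVert u - v \rVert^2
  = \lVert u \rVert^2 + \lVert v \rVert^2 - 2\, u^\top v
  = 1 + 1 - 2\cos\theta
  = 2 - 2\cos\theta,
\qquad
d(u, v) = \sqrt{2 - 2\cos\theta} \in [0, 2].
```

The distance is 0 for embeddings pointing in the same direction and reaches 2 only for embeddings pointing in exactly opposite directions.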
A higher minimum distance in the embedding space results in a more diverse selection, but also in fewer selected samples.
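The trade-off between minimum distance and sample count can be seen with a simplified sketch. The hypothetical `thin_by_min_distance` keeps a sample only if it is at least `min_distance` away from every sample kept so far, and stops once `n_samples` samples are kept, whichever condition is hit first. Unlike a farthest-point search it is order-dependent, so treat it as an illustration of the two stopping conditions rather than Lightly's algorithm.

```python
import math
from typing import List, Sequence


def thin_by_min_distance(
    embeddings: Sequence[Sequence[float]],
    min_distance: float,
    n_samples: int,
) -> List[int]:
    """Hypothetical helper: keep samples that respect a minimum pairwise distance."""
    kept: List[int] = []
    for i, e in enumerate(embeddings):
        if len(kept) >= n_samples:
            break  # the n_samples stopping condition was reached first
        # reject samples that are closer than min_distance to any kept sample
        if all(math.dist(e, embeddings[j]) >= min_distance for j in kept):
            kept.append(i)
    return kept


# 36 unit-length embeddings spaced 10 degrees apart on a circle
circle = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in range(0, 360, 10)]
print(len(thin_by_min_distance(circle, 0.2, n_samples=100)))  # 18
print(len(thin_by_min_distance(circle, 0.5, n_samples=100)))  # 12: larger distance, fewer samples
print(len(thin_by_min_distance(circle, 0.2, n_samples=5)))    # 5: n_samples stops it first
```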
Weights
The objective of this strategy is to select samples that have a high numerical value.
Can be used with Scores, numerical Metadata and Random inputs. It can be specified with:
```python
"strategy": {
    "type": "WEIGHTS",
    "strength": 0.6  # optional
}
```
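Taken on its own, preferring high numerical values reduces to ranking samples by value. The sketch below uses a hypothetical `select_by_weight` helper that returns the indices of the `n_samples` highest values. When several strategies are combined, Lightly weighs this objective against the others, so the real selection is not a plain top-n cut. Treating a negative strength as "prefer low values" is an assumption based on the inverse effect described at the top of this page.

```python
from typing import List, Sequence


def select_by_weight(values: Sequence[float], n_samples: int, strength: float = 1.0) -> List[int]:
    """Hypothetical helper: indices of the highest values (lowest if strength < 0)."""
    sign = 1.0 if strength >= 0 else -1.0  # assumed meaning of a negative strength
    order = sorted(range(len(values)), key=lambda i: sign * values[i], reverse=True)
    return order[:n_samples]


scores = [0.1, 0.9, 0.5, 0.7]
print(select_by_weight(scores, 2))                 # [1, 3]
print(select_by_weight(scores, 2, strength=-1.0))  # [0, 2]
```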
Threshold
The objective of this strategy is to only select samples whose numerical value fulfills a threshold criterion, for example values that are bigger or smaller than a certain threshold.
Can be used with Scores and numerical Metadata inputs. It is specified as follows:
```python
"strategy": {
    "type": "THRESHOLD",
    "threshold": 20,
    "operation": "BIGGER_EQUAL"
}
```
This keeps all samples whose value (specified by the input) is >= 20 and removes all others. The allowed operations are SMALLER, SMALLER_EQUAL, BIGGER, and BIGGER_EQUAL.
Threshold does not support strength, as it is a hard filter that is applied as a first step. See Selection Algorithm for more information.
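Because Threshold is a pure keep-or-remove filter, its behavior is easy to state in code. In this sketch, a hypothetical lookup table `OPERATIONS` maps each allowed operation name to a Python comparison operator, and the hypothetical `apply_threshold` returns the indices of the samples that pass. Only those indices would remain as candidates for any other strategies.

```python
import operator
from typing import Callable, Dict, List, Sequence

# Map the allowed operation names to comparisons of (value, threshold).
OPERATIONS: Dict[str, Callable[[float, float], bool]] = {
    "SMALLER": operator.lt,
    "SMALLER_EQUAL": operator.le,
    "BIGGER": operator.gt,
    "BIGGER_EQUAL": operator.ge,
}


def apply_threshold(values: Sequence[float], threshold: float, operation: str) -> List[int]:
    """Hypothetical helper: indices of samples whose value passes the threshold test."""
    compare = OPERATIONS[operation]  # raises KeyError for an unknown operation
    return [i for i, v in enumerate(values) if compare(v, threshold)]


values = [5, 20, 35, 19.9]
print(apply_threshold(values, 20, "BIGGER_EQUAL"))  # [1, 2]
print(apply_threshold(values, 20, "SMALLER"))       # [0, 3]
```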
Balance
The objective of this strategy is to select samples such that the distribution of classes in them is as close to a target distribution as possible.
For example, the chosen samples should show 50% sunny and 50% rainy weather, or the objects in the chosen samples should be 40% ambulances and 60% buses.
Can be used with Predictions and categorical string Metadata. Categorical int and categorical boolean metadata cannot be used for selection at the moment.
```python
"strategy": {
    "type": "BALANCE",
    "target": {
        "Ambulance": 0.4,  # `Ambulance` should be a valid class in your `schema.json`
        "Bus": 0.6
    },
    "strength": 0.6  # optional
}
```
If the values of the target do not sum up to 1, the remainder is assumed to be the target for the other classes. For example, if we set the target to 20% ambulance and 40% bus, the implicit assumption is that the remaining 40% should come from any other class, e.g. from cars, bicycles, or pedestrians.
Note that classes not specified in the target do not influence the selection process!
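One way to picture balancing is a greedy loop that always picks a sample from the class lagging furthest behind its target share. The sketch below does that with a hypothetical `select_balanced` helper. It assumes one categorical label per sample (as with weather metadata) and does not cover predictions with several objects per sample. Classes missing from the target are lumped into a single hypothetical `OTHER` bucket that receives the remainder, mirroring the 20% / 40% / 40% example above. This illustrates the objective and is not Lightly's algorithm.

```python
from collections import Counter
from typing import Dict, List, Sequence

OTHER = "__other__"  # hypothetical bucket for all classes not listed in the target


def select_balanced(labels: Sequence[str], target: Dict[str, float], n_samples: int) -> List[int]:
    """Hypothetical helper: greedily pick samples so class counts track `target`."""
    full_target = dict(target)
    remainder = 1.0 - sum(target.values())
    if remainder > 1e-9:
        full_target[OTHER] = remainder  # e.g. 1 - 0.2 - 0.4 = 0.4 for "any other class"
    buckets: Dict[str, List[int]] = {}
    for i, label in enumerate(labels):
        key = label if label in target else OTHER
        buckets.setdefault(key, []).append(i)
    counts: Counter = Counter()
    selected: List[int] = []
    while len(selected) < n_samples:
        # only classes with a target share and unselected samples left are eligible
        available = [c for c in full_target if buckets.get(c)]
        if not available:
            break
        k = len(selected) + 1
        # pick the class whose count lags furthest behind its target share
        cls = max(available, key=lambda c: full_target[c] * k - counts[c])
        selected.append(buckets[cls].pop(0))
        counts[cls] += 1
    return selected


labels = ["Ambulance"] * 5 + ["Bus"] * 5 + ["Car"] * 5
picked = select_balanced(labels, {"Ambulance": 0.2, "Bus": 0.4}, n_samples=5)
print(Counter(labels[i] for i in picked))  # 1 Ambulance, 2 Bus, 2 Car
```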
Similarity
With this strategy, you can use the input embeddings from another dataset to select similar images. This can be useful if you are looking for more examples of certain edge cases.
Can be used with Embeddings.
```python
"strategy": {
    "type": "SIMILARITY",
    "strength": 0.6  # optional
}
```
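A simple way to picture similarity selection is to score each candidate by how close its embedding is to any of the query embeddings (for example, embeddings of known edge cases from the other dataset) and keep the best matches. The sketch assumes cosine similarity and a "closest query wins" rule; both are illustrative choices and may differ from how Lightly scores similarity. The helpers `cosine` and `select_similar` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def select_similar(
    candidates: Sequence[Sequence[float]],
    queries: Sequence[Sequence[float]],
    n_samples: int,
) -> List[int]:
    """Hypothetical helper: rank candidates by their best similarity to any query."""
    scores = [max(cosine(c, q) for q in queries) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:n_samples]


candidates = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (-1.0, 0.0)]
edge_cases = [(1.0, 0.05)]  # embeddings from another dataset
print(select_similar(candidates, edge_cases, 2))  # [0, 2]
```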