Examples and Use Cases
Below, we demonstrate examples and common use cases that show how you can combine different inputs with selection strategies to achieve your objectives.
To run these examples and use cases on your own data, please follow the first steps on how to customize a selection.
Here are working examples of full configurations for common use cases:
- Visual Diversity
- Selection of Typical Images
- Active Learning
- Visual Diversity and Active Learning
- Metadata Thresholding
- Class Balancing with Target
- Metadata Balancing with Target
- Uniform Video Metadata Balancing with Strength
- Class Balancing According to Input Distribution
- Similarity Search
- Object Diversity
- Object Typicality
- Random Selection
Visual Diversity
Choosing 100 visually diverse samples amounts to diversifying the samples based on their embeddings:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS"
            },
            "strategy": {
                "type": "DIVERSITY"
            }
        }
    ]
}
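To build intuition for what diversifying on embeddings means, here is a minimal sketch of greedy farthest-point selection in plain Python. It is an illustration only and not necessarily the algorithm behind the DIVERSITY strategy; the function name `select_diverse` and the toy 2D embeddings are assumptions made for this example.

```python
import math


def select_diverse(embeddings, n_samples):
    """Greedily pick indices so that each new pick is far from all earlier picks."""
    if not embeddings or n_samples <= 0:
        return []
    selected = [0]  # arbitrary starting point (an assumption of this sketch)
    # Distance from each sample to its nearest already-selected sample.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(n_samples, len(embeddings)):
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        next_idx = max(candidates, key=lambda i: min_dist[i])
        selected.append(next_idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[next_idx]))
    return selected


# Toy 2D "embeddings": three near-duplicates around the origin and two distinct points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (5.0, 0.0)]
chosen = select_diverse(points, 3)
print(chosen)
```

In this toy data the near-duplicates at indices 1 and 3 are skipped in favor of points that cover new regions of the embedding space.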
Selection of Typical Images
Choosing 100 images that are typical of the distribution corresponds to performing typicality selection on the image embeddings.
We strongly recommend always combining typicality with diversity, because typicality alone can result in selecting images from only a single high-density cluster:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {"type": "EMBEDDINGS"},
            "strategy": {"type": "DIVERSITY", "strength": 1.0}
        },
        {
            "input": {"type": "EMBEDDINGS"},
            "strategy": {"type": "TYPICALITY", "strength": 1.0}
        }
    ]
}
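The recommendation to pair typicality with diversity can be illustrated with a small sketch. It assumes one common definition of typicality, the inverse of the mean distance to the k nearest neighbors; the definition actually used by the TYPICALITY strategy is not stated here, and `typicality_scores` is a hypothetical helper.

```python
import math


def typicality_scores(embeddings, k):
    """Score each sample by the inverse of its mean distance to its k nearest neighbors."""
    scores = []
    for i, e in enumerate(embeddings):
        dists = sorted(math.dist(e, o) for j, o in enumerate(embeddings) if j != i)
        nearest = dists[:k]
        mean_dist = sum(nearest) / len(nearest) if nearest else 0.0
        scores.append(1.0 / (1.0 + mean_dist))
    return scores


# Three points in a dense cluster and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
scores = typicality_scores(points, k=2)
ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

Under this assumed definition, the highest-scoring samples all sit in the dense cluster and the isolated point ranks last, which is why typicality on its own tends to concentrate the selection in one region.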
Active Learning
Active learning amounts to weighting samples by their active learning scores:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "SCORES",
                "task": "my_object_detection_task",  # change to your task
                "score": "uncertainty_entropy"  # change to your preferred score
            },
            "strategy": {
                "type": "WEIGHTS"
            }
        }
    ]
}
This works as well for image classification or segmentation! Just change the input task to a classification or segmentation task.
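As background on what an entropy-based uncertainty score expresses, the following sketch computes the normalized Shannon entropy of a predicted class distribution. It shows the general idea only; how the `uncertainty_entropy` score is computed internally is not specified here, and `entropy_score` is a hypothetical helper.

```python
import math


def entropy_score(probabilities):
    """Shannon entropy of a class distribution, normalized to the range [0, 1]."""
    n_classes = len(probabilities)
    if n_classes < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return entropy / math.log(n_classes)


confident = entropy_score([0.98, 0.01, 0.01])  # model is sure about one class
uncertain = entropy_score([0.34, 0.33, 0.33])  # model cannot decide
print(confident < uncertain)
```

Samples with a higher score are the ones the model is less sure about, so weighting by such a score favors them during selection.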
Visual Diversity and Active Learning
To combine two strategies, specify both of them:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS"
            },
            "strategy": {
                "type": "DIVERSITY"
            }
        },
        {
            "input": {
                "type": "SCORES",
                "task": "my_object_detection_task",  # change to your task
                "score": "uncertainty_entropy"  # change to your preferred score
            },
            "strategy": {
                "type": "WEIGHTS"
            }
        }
    ]
}
Metadata Thresholding
Metadata thresholding can be used, for example, to remove blurry images by selecting only samples whose sharpness is above a threshold:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.sharpness"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 20,
                "operation": "BIGGER"
            }
        }
    ]
}
Another use case is removing images with many uniform rows, which filters out images with decoding artifacts. The following configuration keeps images in which fewer than 2.5% of the rows are uniform:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.uniformRowRatio"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 0.025,
                "operation": "SMALLER"
            }
        }
    ]
}
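The effect of the two threshold configurations above can be sketched as a plain filter over metadata values. This is a rough illustration: `apply_threshold` and the sample records are hypothetical, only the two operations shown above are mapped, and treating them as strict comparisons is an assumption.

```python
import operator

# Assumed mapping from the "operation" values used above to Python comparisons.
OPERATIONS = {"BIGGER": operator.gt, "SMALLER": operator.lt}


def apply_threshold(samples, key, threshold, operation):
    """Keep only the samples whose metadata value passes the comparison."""
    compare = OPERATIONS[operation]
    return [s for s in samples if compare(s[key], threshold)]


# Hypothetical metadata records for three images.
samples = [
    {"file": "a.jpg", "lightly.sharpness": 35.2},
    {"file": "b.jpg", "lightly.sharpness": 12.7},
    {"file": "c.jpg", "lightly.sharpness": 20.0},
]
kept = apply_threshold(samples, "lightly.sharpness", 20, "BIGGER")
print([s["file"] for s in kept])
```

With a strict comparison, the image whose sharpness equals the threshold exactly is dropped along with the blurry one.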
Class Balancing with Target
Use Lightly Pretagging to get the predictions, then specify a target distribution of classes:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "PREDICTIONS",
                "task": "lightly_pretagging",  # (optional) change to your task
                "name": "CLASS_DISTRIBUTION"
            },
            "strategy": {
                "type": "BALANCE",
                "distribution": "TARGET",  # only needed for LightlyOne Worker version >= 2.12
                "target": {
                    "car": 0.1,
                    "bicycle": 0.5,
                    "bus": 0.1,
                    "motorcycle": 0.1,
                    "person": 0.1,
                    "train": 0.05,
                    "truck": 0.05
                }
            }
        }
    ]
}
To use the lightly_pretagging task you need to enable it by setting pretagging to True in the worker config. See Lightly Pretagging for details.
Metadata Balancing with Target
Let's assume you have specified metadata with the path weather.description and want your selected subset to contain 20% sunny images, 40% cloudy images, and other images for the rest:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "weather.description"
            },
            "strategy": {
                "type": "BALANCE",
                "distribution": "TARGET",  # only needed for LightlyOne Worker version >= 2.12
                "target": {
                    "sunny": 0.2,
                    "cloudy": 0.4
                }
            }
        }
    ]
}
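To see what such a target means in absolute numbers, the sketch below converts target ratios into approximate sample counts and reports the share left for all other values. `expected_counts` is a hypothetical helper, and the real selection only approximates the target, so treat the counts as rough expectations.

```python
def expected_counts(target, n_samples):
    """Translate target ratios into approximate sample counts."""
    assigned = sum(target.values())
    if assigned > 1.0 + 1e-9:
        raise ValueError("target ratios must not sum to more than 1.0")
    counts = {value: round(ratio * n_samples) for value, ratio in target.items()}
    # Whatever the target leaves unassigned is available for all other values.
    counts["<other>"] = n_samples - sum(counts.values())
    return counts


counts = expected_counts({"sunny": 0.2, "cloudy": 0.4}, n_samples=100)
print(counts)
```

For the configuration above this gives roughly 20 sunny images, 40 cloudy images, and 40 images with any other weather description.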
Uniform Video Metadata Balancing with Strength
Using the optional strength parameter for a strategy, we can enforce balancing across video metadata:
{
    "n_samples": 100,  # put your number here
    "strategies": [
        # Select the same number of frames from every video by setting a high strength.
        {
            "input": {
                "type": "METADATA",
                "key": "video_name",
            },
            "strategy": {
                "type": "BALANCE",
                "distribution": "UNIFORM",  # only supported with LightlyOne Worker version >= 2.12
                "strength": float(1e9),
            }
        },
        # Within the same video, select the most diverse frames by setting a low strength
        # to give this strategy less importance than the balancing strategy.
        {
            "input": {
                "type": "EMBEDDINGS",
            },
            "strategy": {
                "type": "DIVERSITY",
                "strength": 1.0,
            }
        }
    ]
}
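As a rough guide to what uniform balancing across videos implies, the sketch below splits n_samples as evenly as possible across video names. `uniform_quota` and the video names are hypothetical, and how any remainder is distributed is an assumption of this sketch; a video with fewer frames than its quota cannot fill it.

```python
def uniform_quota(n_samples, group_names):
    """Split n_samples as evenly as possible across the distinct groups."""
    groups = sorted(set(group_names))
    if not groups:
        return {}
    base, extra = divmod(n_samples, len(groups))
    # The first `extra` groups (in sorted order) receive one additional sample.
    return {g: base + (1 if i < extra else 0) for i, g in enumerate(groups)}


quota = uniform_quota(100, ["drive_a.mp4", "drive_b.mp4", "drive_c.mp4"])
print(quota)
```

With 100 samples and three videos, each video contributes about 33 frames, and the diversity strategy then decides which frames within each video are picked.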
Class Balancing According to Input Distribution
Let us assume that you have configured a task named "object-detection" whose predictions consist of 20% cats and 80% dogs. To select 100 samples while preserving these class percentages, you can use the BALANCE strategy as follows:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "PREDICTIONS",
                "task": "object-detection",
                "name": "CLASS_DISTRIBUTION"
            },
            "strategy": {
                "type": "BALANCE",
                "distribution": "INPUT"  # only supported with LightlyOne Worker version >= 2.12
            }
        }
    ]
}
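Conceptually, balancing according to the input distribution is similar to measuring the class ratios of the input predictions and using them as the target. The sketch below shows that measurement on made-up labels; `class_distribution` is a hypothetical helper, and the equivalence to an explicit target is an assumption rather than a documented guarantee.

```python
from collections import Counter


def class_distribution(predicted_labels):
    """Return the share of each class among the predicted labels."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up predictions: 200 cats and 800 dogs.
labels = ["cat"] * 200 + ["dog"] * 800
distribution = class_distribution(labels)
print(distribution)
```

For 100 selected samples, these ratios correspond to roughly 20 cats and 80 dogs.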
Similarity Search
To perform similarity search, you need a dataset and a tag containing the query images.
You can then use the following configuration to find similar images in the input dataset. This example selects the 100 images from the input dataset that are most similar to the images in the tag from the query dataset.
{
    "n_samples": 100,  # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "dataset_id": "DATASET_ID_OF_THE_QUERY_IMAGES",
                "tag_name": "TAG_NAME_OF_THE_QUERY_IMAGES"  # e.g. "initial-tag"
            },
            "strategy": {
                "type": "SIMILARITY",
            }
        }
    ]
}
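The idea behind similarity search can be sketched as ranking input embeddings by how close they are to any query embedding. The example assumes cosine similarity as the measure, which is not confirmed here for the SIMILARITY strategy; `most_similar` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(input_embeddings, query_embeddings, n_samples):
    """Return indices of the inputs closest to any of the query embeddings."""
    scores = [
        max(cosine_similarity(e, q) for q in query_embeddings)
        for e in input_embeddings
    ]
    ranked = sorted(range(len(input_embeddings)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_samples]


inputs = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (-1.0, 0.0)]
queries = [(1.0, 0.05)]
top = most_similar(inputs, queries, 2)
print(top)
```

The two inputs pointing in nearly the same direction as the query are returned, while the orthogonal and opposite vectors are left out.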
For a more in-depth example, see our Tutorial: Use Similarity Search to Find Similar Samples.
Object Diversity
To select images containing diverse objects, you can use a diversity strategy with object embeddings. With this setup, the objects can be inspected in the LightlyOne Platform after selection.
{
    "n_samples": 100,  # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "task": "my_object_detections",  # or "lightly_pretagging"
            },
            "strategy": {
                "type": "DIVERSITY",
            }
        }
    ]
}
Object Typicality
To select images with objects that are typical samples of a distribution, you can use a typicality strategy with object embeddings.
{
    "n_samples": 100,  # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "task": "my_object_detections",  # or "lightly_pretagging"
            },
            "strategy": {
                "type": "TYPICALITY",
            }
        }
    ]
}
Random Selection
You can combine a random input with the WEIGHTS strategy. Used as the only strategy, this chooses random samples and can be used, for example, for benchmarking. Combining it with other strategies can soften their decision boundaries and lead to more inliers and common cases being chosen.
{
    "n_samples": 100,  # put your number here
    "strategies": [
        {
            "input": {
                "type": "RANDOM",
                "random_seed": 42,  # optional, for reproducibility
            },
            "strategy": {
                "type": "WEIGHTS",
            }
        }
    ]
}
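The role of the seed can be illustrated with a small sketch that assigns each sample a random weight and keeps the highest-weighted ones. Picking the top weights is a simplification assumed for this example, and `random_weights` and `select_by_weight` are hypothetical helpers.

```python
import random


def random_weights(sample_ids, random_seed=None):
    """Assign every sample a random weight in [0, 1) from a seeded generator."""
    rng = random.Random(random_seed)
    return {sample_id: rng.random() for sample_id in sample_ids}


def select_by_weight(weights, n_samples):
    """Keep the n_samples ids with the highest weights."""
    return sorted(weights, key=weights.get, reverse=True)[:n_samples]


sample_ids = [f"img_{i:03d}.jpg" for i in range(10)]
first = select_by_weight(random_weights(sample_ids, random_seed=42), 3)
second = select_by_weight(random_weights(sample_ids, random_seed=42), 3)
print(first == second)
```

Using the same seed reproduces the same random subset, which is useful when a random baseline has to be compared across several runs.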