Examples and Use Cases
In the following, we demonstrate examples and common use cases that show how you can combine different inputs with selection strategies to achieve your objectives.
For more information on how to run these examples and use cases on your own data, please follow the first steps on how to customize a selection.
Here are working examples of full configurations for common use cases:
- Visual Diversity
- Active Learning
- Visual Diversity and Active Learning
- Metadata Thresholding
- Object Balancing
- Metadata Balancing
- Video Metadata Balancing with Strength
- Similarity Search
- Object Diversity
- Random Selection
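Each of these configurations is passed to the Lightly Worker as the selection_config of a scheduled run. Below is a minimal sketch of how that could look; the token and dataset ID are placeholders, and we assume a dataset with a configured datasource has already been set up:

from lightly.api import ApiWorkflowClient

# Placeholders: replace with your own API token and dataset ID.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# Any configuration from this page can be passed as selection_config.
client.schedule_compute_worker_run(
    selection_config={
        "n_samples": 100,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}},
        ],
    },
)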
Visual Diversity
Choosing 100 samples that are visually diverse corresponds to diversifying samples based on their embeddings:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS"
            },
            "strategy": {
                "type": "DIVERSITY"
            }
        }
    ]
}
Active Learning
Active Learning corresponds to weighting samples based on active learning scores:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "SCORES",
                "task": "my_object_detection_task",  # change to your task
                "score": "uncertainty_entropy"  # change to your preferred score
            },
            "strategy": {
                "type": "WEIGHTS"
            }
        }
    ]
}
This also works for image classification or segmentation! Just change the input task to a classification or segmentation task, as sketched below.
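For example, a minimal sketch of the same weighting strategy applied to a classification task; the task name my_classification_task is a placeholder for the name of your own task:

selection_config = {
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "SCORES",
                "task": "my_classification_task",  # placeholder: your classification task
                "score": "uncertainty_entropy"  # change to your preferred score
            },
            "strategy": {
                "type": "WEIGHTS"
            }
        }
    ]
}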
Visual Diversity and Active Learning
To combine two strategies, simply specify both of them:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS"
            },
            "strategy": {
                "type": "DIVERSITY"
            }
        },
        {
            "input": {
                "type": "SCORES",
                "task": "my_object_detection_task",  # change to your task
                "score": "uncertainty_entropy"  # change to your preferred score
            },
            "strategy": {
                "type": "WEIGHTS"
            }
        }
    ]
}
Metadata Thresholding
This can be used, for example, to remove blurry images, which corresponds to selecting samples whose sharpness is above a threshold:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.sharpness"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 20,
                "operation": "BIGGER"
            }
        }
    ]
}
Another use case is to remove images with many uniform rows, which helps filter out images with decoding artifacts. The following configuration keeps images with less than 2.5% uniform rows.
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.uniformRowRatio"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 0.025,
                "operation": "SMALLER"
            }
        }
    ]
}
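Both thresholds can also be applied within the same selection by listing the two strategies together, just as in the Visual Diversity and Active Learning example. A sketch using the same threshold values as above:

{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.sharpness"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 20,
                "operation": "BIGGER"
            }
        },
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.uniformRowRatio"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 0.025,
                "operation": "SMALLER"
            }
        }
    ]
}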
Object Balancing
Use Lightly pretagging to get the objects, then specify a target distribution of classes:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "PREDICTIONS",
                "task": "lightly_pretagging",  # (optional) change to your task
                "name": "CLASS_DISTRIBUTION"
            },
            "strategy": {
                "type": "BALANCE",
                "target": {
                    "car": 0.1,
                    "bicycle": 0.5,
                    "bus": 0.1,
                    "motorcycle": 0.1,
                    "person": 0.1,
                    "train": 0.05,
                    "truck": 0.05
                }
            }
        }
    ]
}
To use the lightly_pretagging task, you need to enable it by setting pretagging to True in the worker config. See Lightly Pretagging for details.
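A minimal sketch of scheduling a run with pretagging enabled; the token and dataset ID are placeholders, and object_balancing_config stands for the configuration shown above:

from lightly.api import ApiWorkflowClient

# Placeholders: replace with your own API token and dataset ID.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# The object balancing configuration from above.
object_balancing_config = {
    "n_samples": 100,
    "strategies": [
        {
            "input": {"type": "PREDICTIONS", "task": "lightly_pretagging", "name": "CLASS_DISTRIBUTION"},
            "strategy": {
                "type": "BALANCE",
                "target": {
                    "car": 0.1, "bicycle": 0.5, "bus": 0.1, "motorcycle": 0.1,
                    "person": 0.1, "train": 0.05, "truck": 0.05,
                },
            },
        }
    ],
}

client.schedule_compute_worker_run(
    worker_config={"pretagging": True},  # enable Lightly pretagging for this run
    selection_config=object_balancing_config,
)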
Metadata Balancing
Let's assume you have specified metadata under the path weather.description and want your selected subset to consist of 20% sunny images, 40% cloudy images, and the rest other images:
{
    "n_samples": 100,  # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "weather.description"
            },
            "strategy": {
                "type": "BALANCE",
                "target": {
                    "sunny": 0.2,
                    "cloudy": 0.4
                }
            }
        }
    ]
}
Video Metadata Balancing with Strength
Using the optional strength parameter for a strategy, we can enforce balancing across video metadata:
{
    "n_samples": 100,  # put your number here
    "strategies": [
        # Select the same number of frames from every video by setting a high strength.
        {
            "input": {
                "type": "METADATA",
                "key": "video_name",
            },
            "strategy": {
                "type": "BALANCE",
                "target": {
                    video_name: 1 / len(videos) for video_name in videos
                },
                "strength": float(1e9),
            }
        },
        # Within the same video, select the most diverse frames by setting a low strength
        # to give this strategy less importance than the balancing strategy.
        {
            "input": {
                "type": "EMBEDDINGS",
            },
            "strategy": {
                "type": "DIVERSITY",
                "strength": 1.0,
            }
        }
    ]
}
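The target above is built with a Python dict comprehension, so this configuration has to be assembled in Python before scheduling the run. A minimal sketch of constructing the target; the variable videos and the file names are illustrative and should be replaced with the video names in your dataset:

# Illustrative list of video names; replace with the videos in your dataset.
videos = ["street_01.mp4", "street_02.mp4", "highway_01.mp4"]

# Equal share per video, e.g. {"street_01.mp4": 1/3, "street_02.mp4": 1/3, ...}.
balance_target = {video_name: 1 / len(videos) for video_name in videos}

The resulting balance_target dict is what the BALANCE strategy above expects as its "target".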
Similarity Search
To perform similarity search, you need a dataset and a tag containing the query images.
We can then use the following configuration to find similar images in the input dataset. This example selects the 100 images from the input dataset that are most similar to the images in the specified tag of the query dataset.
{
    "n_samples": 100,  # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "dataset_id": "DATASET_ID_OF_THE_QUERY_IMAGES",
                "tag_name": "TAG_NAME_OF_THE_QUERY_IMAGES"  # e.g. "initial-tag"
            },
            "strategy": {
                "type": "SIMILARITY",
            }
        }
    ]
}
For a more in-depth example, see our Tutorial: Use Similarity Search to Find Similar Samples.
Object Diversity
To select images containing diverse objects, you can use a diversity strategy with object embeddings. With this setup, the objects can be inspected in the Lightly Platform after selection.
{
    "n_samples": 100,  # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "task": "my_object_detections",  # or "lightly_pretagging"
            },
            "strategy": {
                "type": "DIVERSITY",
            }
        }
    ]
}
Random Selection
You can combine a random input with the WEIGHTS strategy. Used as the only strategy, this selects random samples and can be used, e.g., for benchmarking. Combining it with other strategies can soften their decision boundaries and lead to more inliers / common cases being chosen; see the sketch after the configuration below.
{
    "n_samples": 100,  # put your number here
    "strategies": [
        {
            "input": {
                "type": "RANDOM",
                "random_seed": 42,  # optional, for reproducibility
            },
            "strategy": {
                "type": "WEIGHTS",
            }
        }
    ]
}
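As a sketch of mixing the random component into another strategy, the following combines it with embedding diversity; the values are illustrative:

{
    "n_samples": 100,  # put your number here
    "strategies": [
        # Diversify on embeddings...
        {
            "input": {
                "type": "EMBEDDINGS"
            },
            "strategy": {
                "type": "DIVERSITY"
            }
        },
        # ...while mixing in random weights to soften the decision boundary.
        {
            "input": {
                "type": "RANDOM",
                "random_seed": 42,  # optional, for reproducibility
            },
            "strategy": {
                "type": "WEIGHTS",
            }
        }
    ]
}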