Examples and Use Cases

Below, we show examples and common use cases that combine different inputs with selection strategies to achieve your objectives.

For more information on how to run these examples on your own data, follow the first steps on how to customize a selection.

Here are working examples of full configurations for common use cases:

Visual Diversity
Selection of Typical Images
Active Learning
Visual Diversity and Active Learning
Metadata Thresholding
Class Balancing with Target
Metadata Balancing with Target
Uniform Video Metadata Balancing with Strength
Class Balancing According to Input Distribution
Similarity Search
Object Diversity
Object Typicality
Random Selection

Visual Diversity

Choosing 100 visually diverse samples amounts to diversifying samples based on their embeddings:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS"
            },
            "strategy": {
                "type": "DIVERSITY"
            }
        }
    ]
}
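
To build intuition for what diversifying on embeddings means, here is a minimal sketch of greedy farthest-point sampling. It is an illustration under assumptions, not the algorithm the LightlyOne Worker actually runs; the function name `greedy_diverse_subset` and the toy 2D embeddings are hypothetical.

```python
import math

def greedy_diverse_subset(embeddings, n_samples):
    """Pick up to n_samples indices by greedy farthest-point sampling."""
    if not embeddings or n_samples <= 0:
        return []
    selected = [0]  # start from the first sample (an arbitrary choice)
    # Distance from every sample to its nearest already-selected sample.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    target_size = min(n_samples, len(embeddings))
    while len(selected) < target_size:
        remaining = [i for i in range(len(embeddings)) if i not in selected]
        # Take the unselected sample farthest from everything selected so far.
        next_idx = max(remaining, key=lambda i: min_dist[i])
        selected.append(next_idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[next_idx]))
    return selected

# Toy embeddings: three points near the origin and two far away.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (5.0, 0.0)]
print(greedy_diverse_subset(embeddings, 3))
```

In this toy example the near-duplicates around the origin are skipped in favor of the two distant points, which is the kind of behavior a diversity strategy aims for.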

Selection of Typical Images

When the machine learning model only has a few samples to learn from and/or is already struggling with the easy or common cases, typicality selection helps. The example below chooses 100 typical images by using the Typicality strategy on image embeddings.

🚧
You should always combine typicality with diversity, as typicality alone can result in selecting images only from a single high-density cluster. Furthermore, we strongly discourage using typicality for datasets with more than 100,000 input samples. For large datasets, it does not improve the selection and leads to long worker runtimes.

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {"type": "EMBEDDINGS"},
            "strategy": {"type": "DIVERSITY", "strength": 1.0}
        },
        {
            "input": {"type": "EMBEDDINGS"},
            "strategy": {"type": "TYPICALITY", "strength": 1.0}
        },
    ]
}
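
As a rough intuition for why typicality favors dense regions, the sketch below scores each sample by the inverse of its mean distance to its nearest neighbors. This definition is an assumption made for illustration; the exact formula used by the LightlyOne Worker is not stated here, and `typicality_scores` and `num_neighbors` are hypothetical names.

```python
import math

def typicality_scores(embeddings, num_neighbors=2):
    """Score each sample as the inverse of its mean distance to its nearest neighbors."""
    if len(embeddings) < 2:
        raise ValueError("Need at least two embeddings to compute neighbor distances.")
    scores = []
    for i, e in enumerate(embeddings):
        # Distances to all other samples, closest first.
        dists = sorted(math.dist(e, o) for j, o in enumerate(embeddings) if j != i)
        nearest = dists[:num_neighbors]
        mean_dist = sum(nearest) / len(nearest)
        # Small epsilon avoids division by zero for exact duplicates.
        scores.append(1.0 / (mean_dist + 1e-12))
    return scores

# Toy embeddings: a tight cluster of three points and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0)]
scores = typicality_scores(points)
print(scores)
```

Samples inside the cluster receive high scores while the outlier receives the lowest one, which also shows why typicality alone tends to pick from a single dense cluster.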

Active Learning

Active learning amounts to weighting samples by their active learning scores:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "SCORES",
                "task": "my_object_detection_task", # change to your task
                "score": "uncertainty_entropy" # change to your preferred score
            },
            "strategy": {
                "type": "WEIGHTS"
            }
        }
    ]
}
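
For intuition, here is one plausible way an entropy-based uncertainty score could be computed from a model's class probabilities. This is a sketch under assumptions: the exact definition behind the `uncertainty_entropy` score is not given here, and the function name and normalization are hypothetical.

```python
import math

def entropy_uncertainty(probabilities):
    """Normalized Shannon entropy of a class-probability vector, in [0, 1]."""
    if len(probabilities) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    # Divide by the maximum possible entropy so that 1.0 means "fully uncertain".
    return entropy / math.log(len(probabilities))

confident = entropy_uncertainty([0.98, 0.01, 0.01])
uncertain = entropy_uncertainty([0.34, 0.33, 0.33])
print(confident, uncertain)
```

With a WEIGHTS strategy, samples with higher scores such as `uncertain` would be favored over samples the model is already confident about.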

📘
This also works for image classification or segmentation! Just change the input task to a classification or segmentation task.

Visual Diversity and Active Learning

To combine two strategies, specify both of them:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS"
            },
            "strategy": {
                "type": "DIVERSITY"
            }
        },
        {
            "input": {
                "type": "SCORES",
                "task": "my_object_detection_task", # change to your task
                "score": "uncertainty_entropy" # change to your preferred score
            },
            "strategy": {
                "type": "WEIGHTS"
            }
        }
    ]
}

Metadata Thresholding

Thresholding can be used, for example, to remove blurry images by selecting only samples whose sharpness is above a threshold:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.sharpness"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 20,
                "operation": "BIGGER"
            }
        }
    ]
}

Another use case is to remove images with many uniform rows, which filters out images with decoding artifacts. The following configuration keeps images in which less than 2.5% of the rows are uniform.

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.uniformRowRatio"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 0.025,
                "operation": "SMALLER"
            }
        }
    ]
}
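
The effect of the two thresholds above can be sketched in plain Python. The metadata values below are made up for illustration, and `passes_threshold` is a hypothetical helper that assumes strict comparisons for BIGGER and SMALLER.

```python
def passes_threshold(value, threshold, operation):
    """Return True if a metadata value satisfies a THRESHOLD strategy (sketch)."""
    if operation == "BIGGER":
        return value > threshold
    if operation == "SMALLER":
        return value < threshold
    raise ValueError(f"Operation not covered by this sketch: {operation}")

# Hypothetical per-image metadata values, made up for illustration.
metadata = {
    "sharp.jpg": {"sharpness": 35.0, "uniformRowRatio": 0.0},
    "blurry.jpg": {"sharpness": 8.5, "uniformRowRatio": 0.01},
    "corrupt.jpg": {"sharpness": 40.0, "uniformRowRatio": 0.3},
}

# Keep images that are sharp enough and have few uniform rows.
kept = [
    name
    for name, values in metadata.items()
    if passes_threshold(values["sharpness"], 20, "BIGGER")
    and passes_threshold(values["uniformRowRatio"], 0.025, "SMALLER")
]
print(kept)
```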

Class Balancing with Target

Use Lightly Pretagging to get predictions, then specify a target distribution of classes:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "PREDICTIONS",
                "task": "lightly_pretagging", # (optional) change to your task
                "name": "CLASS_DISTRIBUTION"
            },
            "strategy": {
                "type": "BALANCE",
                "distribution": "TARGET", # only needed for LightlyOne Worker version >= 2.12
                "target": {
                    "car": 0.1,
                    "bicycle": 0.5,
                    "bus": 0.1,
                    "motorcycle": 0.1,
                    "person": 0.1,
                    "train": 0.05,
                    "truck": 0.05
                }
            }
        }
    ]
}

📘
To use the lightly_pretagging task you need to enable it by setting pretagging to True in the worker config. See Lightly Pretagging for details.
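
Before scheduling a run, it can help to check what a target distribution implies in absolute numbers. The helper below is a hypothetical sketch, not part of the LightlyOne API; it assumes the target shares should sum to at most 1 and that the selection aims for roughly `share * n_samples` per class.

```python
def expected_counts(target, n_samples):
    """Translate a target distribution into approximate per-class sample counts."""
    total = sum(target.values())
    if total > 1.0 + 1e-9:
        raise ValueError(f"Target shares sum to {total}, which exceeds 1.0.")
    return {name: round(share * n_samples) for name, share in target.items()}

target = {
    "car": 0.1,
    "bicycle": 0.5,
    "bus": 0.1,
    "motorcycle": 0.1,
    "person": 0.1,
    "train": 0.05,
    "truck": 0.05,
}
counts = expected_counts(target, n_samples=100)
print(counts)
```

The actual selection may deviate from these counts, for example when the input dataset does not contain enough samples of a class.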

Metadata Balancing with Target

Let's assume you have specified metadata with the path weather.description and want your selected subset to contain 20% sunny images, 40% cloudy images, and images with other descriptions for the remaining 40%:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "weather.description"
            },
            "strategy": {
                "type": "BALANCE",
                "distribution": "TARGET", # only needed for LightlyOne Worker version >= 2.12
                "target": {
                    "sunny": 0.2,
                    "cloudy": 0.4
                }
            }
        }
    ]
}

Uniform Video Metadata Balancing with Strength

Using the optional strength parameter of a strategy, we can enforce balancing across video metadata:

{
    "n_samples": 100, # put your number here
    "strategies": [
        # Select the same number of frames from every video by setting a high strength.
        { 
            "input": {
                "type": "METADATA",
                "key": "video_name",
            },
            "strategy": {
                "type": "BALANCE",
                "distribution": "UNIFORM", # only supported with LightlyOne Worker version >= 2.12
                "strength": float(1e9),
            }
        },
        # Within the same video, select the most diverse frames by setting a low strength
        # to give this strategy less importance than the balancing strategy.
        {
            "input": {
                "type": "EMBEDDINGS",
            },
            "strategy": {
                "type": "DIVERSITY",
                "strength": 1.0,
            }
        }
    ]
}

Class Balancing According to Input Distribution

Let us assume that you have configured a task named "object-detection" whose predictions consist of 20% cats and 80% dogs. To select 100 samples while preserving these class percentages, use the Balance strategy as follows:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "PREDICTIONS",
                "task": "object-detection", 
                "name": "CLASS_DISTRIBUTION"
            },
            "strategy": {
                "type": "BALANCE",
                "distribution": "INPUT" # only supported with LightlyOne Worker version >= 2.12
            }
        }
    ]
}
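
To see what distribution the INPUT setting would try to preserve, you can compute the class shares of your predictions yourself. This is a minimal sketch with made-up labels; `input_distribution` is a hypothetical helper, not a LightlyOne function.

```python
from collections import Counter

def input_distribution(predicted_labels):
    """Return the share of each class among the predicted labels."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}

# Made-up predictions matching the 20% cats / 80% dogs example above.
labels = ["cat"] * 20 + ["dog"] * 80
distribution = input_distribution(labels)
print(distribution)
```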

Similarity Search

To perform a similarity search, you need a dataset and a tag containing the query images.

We can then use the following configuration to find similar images in the input dataset. This example selects the 100 images from the input dataset that are most similar to the images in the tag from the query dataset.

{
    "n_samples": 100, # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "dataset_id": "DATASET_ID_OF_THE_QUERY_IMAGES", 
                "tag_name": "TAG_NAME_OF_THE_QUERY_IMAGES" # e.g. "initial-tag"
            },
            "strategy": {
                "type": "SIMILARITY",
            }
        }
    ]
}
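
Conceptually, similarity search ranks input samples by how close their embeddings are to the query embeddings. The sketch below uses cosine similarity and the best match over all queries; both choices are assumptions for illustration, and the function names and toy embeddings are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

def most_similar(candidates, queries, n_samples):
    """Rank candidates by their best cosine similarity to any query embedding."""
    scored = [
        (max(cosine_similarity(c, q) for q in queries), idx)
        for idx, c in enumerate(candidates)
    ]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:n_samples]]

queries = [(1.0, 0.0)]
candidates = [(0.9, 0.1), (0.0, 1.0), (1.0, 0.05), (-1.0, 0.0)]
top = most_similar(candidates, queries, n_samples=2)
print(top)
```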

For a more in-depth example, see our Tutorial: Use Similarity Search to Find Similar Samples.

Object Diversity

To select images that contain diverse objects, you can use a diversity strategy with object embeddings. With this setup, the objects can be inspected in the LightlyOne Platform after selection.

{
    "n_samples": 100, # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "task": "my_object_detections", # or "lightly_pretagging"
            },
            "strategy": {
                "type": "DIVERSITY",
            }
        }
    ]
}

Object Typicality

To select images with objects that are typical samples of a distribution, you can use a typicality strategy with object embeddings.

🚧
You should always combine typicality with diversity, as typicality alone can result in selecting images only from a single high-density cluster. Furthermore, we strongly discourage using typicality for datasets with more than 100,000 input samples. For large datasets, it does not improve the selection and leads to long worker runtimes.

{
    "n_samples": 100, # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "task": "my_object_detections", # or "lightly_pretagging"
            },
            "strategy": {
                "type": "TYPICALITY",
            }
        }
    ]
}

Random Selection

You can combine a random input with the WEIGHTS strategy. As the only strategy, this chooses random samples and can be used, e.g., for benchmarking. Combining it with other strategies can soften their decision boundary and lead to more inliers / common cases being chosen.

{
    "n_samples": 100, # put your number here
    "strategies": [
        {
            "input": {
                "type": "RANDOM",
                "random_seed": 42, # optional, for reproducibility
            },
            "strategy": {
                "type": "WEIGHTS",
            }
        }
    ]
}
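
The role of the seed can be illustrated with Python's standard library: fixing it makes a random draw repeatable. This sketch only mirrors the idea behind random_seed; it does not reproduce the samples the LightlyOne Worker would pick, and `random_selection` is a hypothetical helper.

```python
import random

def random_selection(sample_ids, n_samples, random_seed=None):
    """Draw up to n_samples ids at random; a fixed seed makes the draw repeatable."""
    rng = random.Random(random_seed)
    return rng.sample(sample_ids, min(n_samples, len(sample_ids)))

sample_ids = [f"image_{i:04d}.jpg" for i in range(1000)]
first = random_selection(sample_ids, 100, random_seed=42)
second = random_selection(sample_ids, 100, random_seed=42)
print(first == second)
```

Such a reproducible random subset is a convenient baseline when benchmarking other selection strategies.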