Examples and Use Cases

The examples below show how to combine different inputs with selection strategies to achieve your objectives.

For more information on how to run these examples and use cases on your own data, please follow the first steps on how to customize a selection.

Here are working examples of full configurations for common use cases:

Visual Diversity

Choosing 100 visually diverse samples corresponds to diversifying the samples based on their embeddings:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS"
            },
            "strategy": {
                "type": "DIVERSITY"
            }
        }
    ]
}
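
To build intuition for what diversifying on embeddings means, here is a minimal sketch of greedy farthest-point sampling on toy 2D embeddings. It is an illustration only: the function name `greedy_diverse_subset` and the toy data are hypothetical, and this is not necessarily the algorithm the DIVERSITY strategy runs internally.

```python
import math

def greedy_diverse_subset(embeddings, n_samples):
    """Greedy farthest-point sampling: repeatedly pick the sample that is
    farthest from everything selected so far (illustrative sketch only)."""
    if not embeddings or n_samples <= 0:
        return []
    selected = [0]  # arbitrary starting sample
    # min_dist[i] = distance from sample i to its nearest selected sample
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    target_size = min(n_samples, len(embeddings))
    while len(selected) < target_size:
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        next_idx = max(candidates, key=lambda i: min_dist[i])
        selected.append(next_idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[next_idx]))
    return selected

# Two tight clusters plus one isolated point (hypothetical toy embeddings).
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
diverse = greedy_diverse_subset(embeddings, 3)
print(diverse)  # one index from each region instead of near-duplicates
```

The selection skips the near-duplicates inside each cluster, which is the behavior you would expect from a diversity-driven subset.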

Selection of Typical Images

Choosing 100 images that are typical of the distribution corresponds to performing typicality selection on the embeddings of the images:

🚧

We strongly recommend always combining typicality with diversity, as typicality alone can result in selecting images from only a single high-density cluster.

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {"type": "EMBEDDINGS"},
            "strategy": {"type": "DIVERSITY", "strength": 1.0}
        },
        {
            "input": {"type": "EMBEDDINGS"},
            "strategy": {"type": "TYPICALITY", "strength": 1.0}
        },
    ]
}
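
The following sketch illustrates why typicality alone tends to favor a single dense cluster. It assumes a simple definition of typicality, the inverse of the mean distance to the nearest neighbors; the function `typicality_scores`, its `num_nearest_neighbors` argument, and the toy data are hypothetical and may differ from the score Lightly computes.

```python
import math

def typicality_scores(embeddings, num_nearest_neighbors=2):
    """Assumed definition: 1 / mean distance to the k nearest neighbors."""
    scores = []
    for i, e in enumerate(embeddings):
        distances = sorted(
            math.dist(e, other) for j, other in enumerate(embeddings) if j != i
        )
        nearest = distances[:num_nearest_neighbors]
        mean_distance = sum(nearest) / len(nearest)
        scores.append(1.0 / (mean_distance + 1e-12))  # avoid division by zero
    return scores

# Three samples in a dense cluster and one outlier (hypothetical toy data).
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
scores = typicality_scores(embeddings)
for embedding, score in zip(embeddings, scores):
    print(embedding, round(score, 2))
```

All samples in the dense cluster score far higher than the outlier, so ranking by this score alone would pick only cluster members. Adding a diversity strategy counteracts that.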

Active Learning

Active learning corresponds to weighting samples by their active learning scores:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "SCORES",
                "task": "my_object_detection_task", # change to your task
                "score": "uncertainty_entropy" # change to your preferred score
            },
            "strategy": {
                "type": "WEIGHTS"
            }
        }
    ]
}
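
As a rough illustration of score-based weighting, the sketch below assumes that an entropy-style uncertainty score is the Shannon entropy of the predicted class probabilities. The exact definition of `uncertainty_entropy` and the way the WEIGHTS strategy uses the scores are not specified here, so `entropy_score`, the file names, and the normalization step are all hypothetical.

```python
import math

def entropy_score(probabilities):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

# Hypothetical model outputs for two images.
predictions = {
    "confident.jpg": [0.98, 0.01, 0.01],
    "uncertain.jpg": [0.40, 0.35, 0.25],
}
scores = {name: entropy_score(p) for name, p in predictions.items()}

# Normalize so the scores can be read as relative selection weights.
total = sum(scores.values())
weights = {name: score / total for name, score in scores.items()}
print(weights)
```

The image with the flatter prediction receives the larger weight, so a weighted selection would prefer it.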

📘

This also works for image classification or segmentation. Just change the input task to a classification or segmentation task.

Visual Diversity and Active Learning

To combine two strategies, specify both of them:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS"
            },
            "strategy": {
                "type": "DIVERSITY"
            }
        },
        {
            "input": {
                "type": "SCORES",
                "task": "my_object_detection_task", # change to your task
                "score": "uncertainty_entropy" # change to your preferred score
            },
            "strategy": {
                "type": "WEIGHTS"
            }
        }
    ]
}

Metadata Thresholding

Metadata thresholding can be used, for example, to remove blurry images by selecting only samples whose sharpness is above a threshold:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.sharpness"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 20,
                "operation": "BIGGER"
            }
        }
    ]
}

Another use case is removing images with many uniform rows, which filters out images with decoding artifacts. The following configuration keeps only images in which fewer than 2.5% of the rows are uniform.

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "lightly.uniformRowRatio"
            },
            "strategy": {
                "type": "THRESHOLD",
                "threshold": 0.025,
                "operation": "SMALLER"
            }
        }
    ]
}
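
The effect of the two THRESHOLD configurations above can be sketched in plain Python. The sketch assumes that "BIGGER" and "SMALLER" are strict comparisons; the helper `passes_threshold` and the sample values are hypothetical.

```python
import operator

# Assumption: BIGGER and SMALLER are strict comparisons (> and <).
OPERATIONS = {"BIGGER": operator.gt, "SMALLER": operator.lt}

def passes_threshold(value, threshold, operation):
    """Return True if the metadata value satisfies the threshold rule."""
    return OPERATIONS[operation](value, threshold)

# Hypothetical sharpness values, keyed by file name.
sharpness = {"a.jpg": 35.2, "b.jpg": 12.0, "c.jpg": 20.0}
kept = [
    name for name, value in sharpness.items()
    if passes_threshold(value, 20, "BIGGER")
]
print(kept)
```

Under the strict-comparison assumption, an image whose sharpness is exactly 20 is not kept.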

Object Balancing

Use Lightly pretagging to detect the objects, then specify a target distribution of classes:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "PREDICTIONS",
                "task": "lightly_pretagging", # (optional) change to your task
                "name": "CLASS_DISTRIBUTION"
            },
            "strategy": {
                "type": "BALANCE",
                "target": {
                    "car": 0.1,
                    "bicycle": 0.5,
                    "bus": 0.1,
                    "motorcycle": 0.1,
                    "person": 0.1,
                    "train": 0.05,
                    "truck": 0.05
                }
            }
        }
    ]
}

📘

To use the lightly_pretagging task you need to enable it by setting pretagging to True in the worker config. See Lightly Pretagging for details.

Metadata Balancing

Suppose you have specified metadata with the path weather.description and want your selected subset to contain 20% sunny images, 40% cloudy images, and images with other descriptions for the remainder:

{
    "n_samples": 100, # set to the number of samples you want to select
    "strategies": [
        {
            "input": {
                "type": "METADATA",
                "key": "weather.description"
            },
            "strategy": {
                "type": "BALANCE",
                "target": {
                    "sunny": 0.2,
                    "cloudy": 0.4
                }
            }
        }
    ]
}
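
To sanity-check a target distribution before using it, you can compute the sample counts it implies. This is a small sketch, not part of any Lightly API: the helper `expected_counts` and the label "other" for the unspecified remainder are hypothetical.

```python
def expected_counts(target, n_samples, rest_label="other"):
    """Translate target ratios into approximate sample counts.

    Whatever share is not covered by the target goes to `rest_label`.
    """
    total = sum(target.values())
    if total > 1.0 + 1e-9:
        raise ValueError(f"target ratios sum to {total}, which exceeds 1.0")
    counts = {label: round(share * n_samples) for label, share in target.items()}
    counts[rest_label] = n_samples - sum(counts.values())
    return counts

counts = expected_counts({"sunny": 0.2, "cloudy": 0.4}, n_samples=100)
print(counts)
```

For the configuration above, this gives 20 sunny images, 40 cloudy images, and 40 images with other descriptions. Balancing is a target rather than a guarantee, so the actual counts may deviate if the dataset does not contain enough samples of a class.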

Video Metadata Balancing with Strength

Using the optional strength parameter for a strategy, we can enforce balancing across video metadata:

{
    "n_samples": 100, # put your number here
    "strategies": [
        # Select the same number of frames from every video by setting a high strength.
        {
            "input": {
                "type": "METADATA",
                "key": "video_name",
            },
            "strategy": {
                "type": "BALANCE",
                # `videos` is the list of all video names in your dataset
                "target": {
                    video_name: 1 / len(videos) for video_name in videos
                },
                "strength": float(1e9),
            }
        },
        # Within the same video, select the most diverse frames by setting a low strength
        # to give this strategy less importance than the balancing strategy.
        {
            "input": {
                "type": "EMBEDDINGS",
            },
            "strategy": {
                "type": "DIVERSITY",
                "strength": 1.0,
            }
        }
    ]
}
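
The configuration above depends on a list of video names. A minimal sketch of building it programmatically could look as follows; the function `build_video_balance_config` and the video names are hypothetical, and the keys simply mirror the configuration shown above.

```python
def build_video_balance_config(videos, n_samples):
    """Build a selection config that balances frames evenly across videos."""
    if not videos:
        raise ValueError("videos must not be empty")
    # Every video gets the same share of the selected frames.
    target = {video_name: 1 / len(videos) for video_name in videos}
    return {
        "n_samples": n_samples,
        "strategies": [
            {
                "input": {"type": "METADATA", "key": "video_name"},
                "strategy": {"type": "BALANCE", "target": target, "strength": 1e9},
            },
            {
                "input": {"type": "EMBEDDINGS"},
                "strategy": {"type": "DIVERSITY", "strength": 1.0},
            },
        ],
    }

videos = ["video_a.mp4", "video_b.mp4", "video_c.mp4", "video_d.mp4"]  # hypothetical
config = build_video_balance_config(videos, n_samples=100)
frames_per_video = config["n_samples"] / len(videos)
print(frames_per_video)
```

With four videos and 100 samples, the balanced target works out to 25 frames per video.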

Similarity Search

To perform a similarity search, you need a dataset and a tag containing the query images.

We can then use the following configuration to find similar images from the input dataset. This example will select 100 images from the input dataset that are the most similar to the images in the tag from the query dataset.

{
    "n_samples": 100, # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "dataset_id": "DATASET_ID_OF_THE_QUERY_IMAGES", 
                "tag_name": "TAG_NAME_OF_THE_QUERY_IMAGES" # e.g. "initial-tag"
            },
            "strategy": {
                "type": "SIMILARITY",
            }
        }
    ]
}
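
Conceptually, similarity search ranks the input samples by how close their embeddings are to the query embeddings. The sketch below assumes cosine similarity as the metric, which may not be what the SIMILARITY strategy uses; `cosine_similarity`, `most_similar`, and the toy embeddings are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def most_similar(query_embeddings, candidates, n_samples):
    """Rank candidates by their best similarity to any query embedding."""
    def best_match(embedding):
        return max(cosine_similarity(embedding, q) for q in query_embeddings)

    ranked = sorted(candidates, key=lambda name: best_match(candidates[name]),
                    reverse=True)
    return ranked[:n_samples]

queries = [(1.0, 0.0)]  # embeddings of the query images (toy data)
candidates = {"img_1": (0.9, 0.1), "img_2": (0.0, 1.0), "img_3": (0.5, 0.5)}
similar = most_similar(queries, candidates, n_samples=2)
print(similar)
```

The candidate pointing in nearly the same direction as the query ranks first, and the orthogonal one is left out.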

For a more in-depth example, see our Tutorial: Use Similarity Search to Find Similar Samples.

Object Diversity

To select images that contain diverse objects, use a diversity strategy with object embeddings. With this setup, the objects can be inspected in the Lightly Platform after selection.

{
    "n_samples": 100, # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "task": "my_object_detections", # or "lightly_pretagging"
            },
            "strategy": {
                "type": "DIVERSITY",
            }
        }
    ]
}

Object Typicality

To select images with objects that are typical samples of a distribution, use a typicality strategy with object embeddings.

{
    "n_samples": 100, # put your number here
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "task": "my_object_detections", # or "lightly_pretagging"
            },
            "strategy": {
                "type": "TYPICALITY",
            }
        }
    ]
}

Random Selection

You can combine a random input with the WEIGHTS strategy. Used as the only strategy, it chooses random samples, which is useful, for example, for benchmarking. Combined with other strategies, it can soften their decision boundary and lead to more inliers and common cases being chosen.

{
    "n_samples": 100, # put your number here
    "strategies": [
        {
            "input": {
                "type": "RANDOM",
                "random_seed": 42, # optional, for reproducibility
            },
            "strategy": {
                "type": "WEIGHTS",
            }
        }
    ]
}
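
For comparison, a random baseline can also be drawn locally with the Python standard library. This sketch only mirrors the idea of a seeded random selection; `random_subset` and the file names are hypothetical, and a local seed of 42 will not reproduce the subset that the configuration above selects.

```python
import random

def random_subset(sample_names, n_samples, random_seed=None):
    """Draw a reproducible random subset without replacement."""
    rng = random.Random(random_seed)  # separate generator, global state untouched
    return rng.sample(sample_names, min(n_samples, len(sample_names)))

names = [f"image_{i:03d}.jpg" for i in range(1000)]  # hypothetical file names
baseline = random_subset(names, 100, random_seed=42)
print(len(baseline))
```

Fixing the seed makes the baseline repeatable, so differences between benchmark runs come from the selection strategy being compared rather than from the random draw.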