Choose Diverse Objects from the Comma10k Dataset Using Detectron2 Predictions

In this tutorial, you will select images from your dataset based on the diversity of the objects they contain. This applies the concepts of the object diversity strategy to a concrete example.

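You will learn how to start the Lightly Worker, set up a dataset linked to a datasource with predictions, schedule a run with the object diversity selection strategy, and monitor the run and download its results.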


To follow this tutorial, you will need the following:

  • The lightly Python package:

    pip install lightly

  • A configured datasource with predictions. The Add Predictions tutorial shows how to set one up; this tutorial is intended as an extension of that one.

Start the Worker

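Start the Lightly Worker in waiting mode. In this mode, the worker polls the Lightly API for new runs to process. Replace MY_LIGHTLY_TOKEN with your Lightly token and MY_WORKER_ID with the ID of your registered worker:

docker run --shm-size="1024m" --gpus all --rm -it \
    lightly/worker:latest \
    token=MY_LIGHTLY_TOKEN \
    worker.worker_id=MY_WORKER_ID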

Set Up the Dataset and Link It to the Datasource

To set up your dataset, you can follow the page Set Up Your First Dataset. If you completed the prerequisites, you already have a datasource with predictions in your preferred cloud storage. This tutorial uses AWS S3. You can create a dataset with the lightly Python client using this script:

from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform. We name it comma10k for continuity
# with the tutorial "Add Predictions".
client.create_dataset(dataset_name="comma10k", dataset_type=DatasetType.IMAGES)
dataset_id = client.dataset_id

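After creating the dataset, you can configure its input and Lightly datasources. Replace the placeholder bucket paths, region, and credentials with your own:

from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Placeholder S3 settings: replace them with your region and delegated-access credentials.
s3_access = dict(region="eu-central-1", role_arn="S3-ROLE-ARN", external_id="S3-EXTERNAL-ID")
# Configure the Input datasource.
client.set_s3_delegated_access_config(resource_path="s3://bucket/input/", purpose=DatasourcePurpose.INPUT, **s3_access)
# Configure the Lightly datasource.
client.set_s3_delegated_access_config(resource_path="s3://bucket/lightly/", purpose=DatasourcePurpose.LIGHTLY, **s3_access)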

Use the Object Diversity Selection Strategy

Now that everything is in place, you can configure and start the object diversity run. The code below trains a self-supervised model on the object crops defined by the predictions, embeds the crops, and then runs the selection algorithm. The selection algorithm selects images based on the object crop embeddings: images with many different objects are preferred over images with similar objects or no objects at all. You can change the number of selected images with the n_samples parameter:

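scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # Train a self-supervised model on the object crops.
        "object_level": {"task_name": "object_detection_comma10k"},  # Crop objects from this prediction task.
    },
    selection_config={
        "n_samples": 100,  # Number of images to select.
        "strategies": [
            {
                "input": {"type": "EMBEDDINGS", "task": "object_detection_comma10k"},
                "strategy": {"type": "DIVERSITY"},
            }
        ],
    },
)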
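If you run several experiments with different prediction tasks or sample counts, it can help to build both configuration dictionaries in one place so the task name cannot drift between the crop settings and the selection input. This is a minimal sketch: build_object_diversity_configs is a hypothetical helper, and it assumes the worker_config keys enable_training and object_level.task_name together with the selection_config keys n_samples and strategies.

```python
from typing import Any, Dict, Tuple


def build_object_diversity_configs(
    task_name: str, n_samples: int, enable_training: bool = True
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (worker_config, selection_config) for an object diversity run.

    Hypothetical helper: the same task name is used for the object-level crops
    and for the EMBEDDINGS input of the DIVERSITY strategy.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be a positive integer")
    worker_config = {
        "enable_training": enable_training,
        "object_level": {"task_name": task_name},
    }
    selection_config = {
        "n_samples": n_samples,
        "strategies": [
            {
                "input": {"type": "EMBEDDINGS", "task": task_name},
                "strategy": {"type": "DIVERSITY"},
            }
        ],
    }
    return worker_config, selection_config


worker_config, selection_config = build_object_diversity_configs(
    task_name="object_detection_comma10k", n_samples=100
)
# Then pass both dictionaries to the client, for example:
# scheduled_run_id = client.schedule_compute_worker_run(
#     worker_config=worker_config, selection_config=selection_config
# )
```

Keeping the task name as a single argument is the main design choice here: a typo in only one of the two places would select on a task that the worker never cropped.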

You can find more information about the different selection strategies in Customize a Selection.
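To build intuition for why images with many different objects are preferred, here is a simplified, self-contained sketch of a greedy diversity selection over object crop embeddings. It is not the Lightly Worker's implementation: the function names, the toy 2-D embeddings, and the distance cap are assumptions for illustration only. An image's gain is the sum of its crops' distances to the nearest crop already selected (capped), so images without objects gain nothing and images whose objects resemble already selected ones gain little.

```python
import math
from typing import Dict, List, Sequence

Embedding = Sequence[float]


def novelty_gain(crops: List[Embedding], reference: List[Embedding], cap: float) -> float:
    """Sum of capped nearest-neighbour distances of an image's crops.

    Each crop is compared with the already selected crops and with the earlier
    crops of the same image, so duplicate objects inside one image add little.
    """
    seen = list(reference)
    gain = 0.0
    for crop in crops:
        nearest = min((math.dist(crop, other) for other in seen), default=cap)
        gain += min(nearest, cap)
        seen.append(crop)
    return gain


def select_diverse_images(
    crops_per_image: Dict[str, List[Embedding]], n_samples: int, cap: float = 1.0
) -> List[str]:
    """Repeatedly pick the image whose crops add the most novelty."""
    remaining = dict(crops_per_image)
    selected_images: List[str] = []
    selected_crops: List[Embedding] = []
    while remaining and len(selected_images) < n_samples:
        best = max(remaining, key=lambda name: novelty_gain(remaining[name], selected_crops, cap))
        selected_images.append(best)
        selected_crops.extend(remaining.pop(best))
    return selected_images


# Toy embeddings, made up for illustration.
toy_crops = {
    "two_different_objects.jpg": [(0.0, 0.0), (1.0, 1.0)],
    "duplicate_objects.jpg": [(0.0, 0.1), (0.0, 0.1)],
    "no_objects.jpg": [],
    "one_new_object.jpg": [(5.0, 5.0)],
}
print(select_diverse_images(toy_crops, n_samples=2))
# ['two_different_objects.jpg', 'one_new_object.jpg']
```

The cap keeps a single far-away crop from outweighing several distinct objects, which mirrors the preference for images containing many different objects.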

Monitor the Run and Download the Results

The Lightly Worker will pick up the run and start processing it within a few seconds. The status of the current run and other scheduled runs can be seen in the runs view of the Lightly Platform. Alternatively, you can also monitor it from Python:

# You can use this code to track and print the state of the Lightly Worker.
# The loop will end once the run has finished, was canceled, or failed.
for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
    print(f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'")

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")
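The loop above prints a line on every poll. If you only care about state transitions, a small wrapper can log a line only when the state changes and return whether the run succeeded. This is a minimal sketch: wait_for_run is a hypothetical helper that assumes each yielded object exposes state, message, and ended_successfully(), as used in the monitoring code above. The demo objects and their state names are made up so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


def wait_for_run(run_infos: Iterable[Any], log: Callable[[str], None] = print) -> bool:
    """Consume run-info objects, log only state changes, and return the outcome."""
    last_state: Optional[Any] = None
    last_info: Optional[Any] = None
    for info in run_infos:
        # Always log the first object, then only when the state changes.
        if last_info is None or info.state != last_state:
            log(f"state={info.state!r} message={info.message!r}")
            last_state = info.state
        last_info = info
    # If nothing was yielded, the run was never observed to finish.
    return last_info is not None and bool(last_info.ended_successfully())


# Offline stand-in for the objects yielded by the Lightly client (assumed shape).
@dataclass
class DemoRunInfo:
    state: str
    message: str
    ok: bool = False

    def ended_successfully(self) -> bool:
        return self.ok


demo_infos = [
    DemoRunInfo("demo-started", "first poll"),
    DemoRunInfo("demo-started", "second poll"),
    DemoRunInfo("demo-finished", "done", ok=True),
]
print(wait_for_run(demo_infos))
```

For a real run, you would call it instead of the loop, since the generator can only be consumed once: succeeded = wait_for_run(client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id)).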

Once the run completes, all selected images are uploaded to the comma10k dataset on the Lightly Platform. You can find the dataset in the datasets view. Additionally, all object crops from the selected images are uploaded to a new dataset named comma10k-crops-object_detection_comma10k that you can also find in the datasets view.
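If you want to access the crops dataset from Python, you first need its name. The sketch below assumes the pattern "<dataset name>-crops-<task name>", which is inferred only from the comma10k example above; crops_dataset_name is a hypothetical helper, so check the name in the datasets view if your setup differs.

```python
def crops_dataset_name(dataset_name: str, task_name: str) -> str:
    """Name of the dataset holding the object crops of a run.

    Assumption: the "<dataset>-crops-<task>" pattern is inferred from the
    comma10k example in this tutorial.
    """
    return f"{dataset_name}-crops-{task_name}"


name = crops_dataset_name("comma10k", "object_detection_comma10k")
print(name)  # comma10k-crops-object_detection_comma10k

# To work with the crops dataset, you could then point the client at it:
# client.set_dataset_id_by_name(name)
```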

If you navigate to the embedding view of the dataset with the object crops and color the embeddings by Category Name, you should see a plot similar to this one:


Embeddings colored by category name.

Lightly puts the essential information about the selection process into an automatically generated PDF report to make it easier for you to understand your dataset before and after the selection. You can download it for all completed worker runs from the runs page in the Lightly Platform, or you can use this script to download it with the Lightly Python client:

# Get the scheduled run given its id.
run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
# Download the report as pdf and json files.
client.download_compute_worker_run_report_pdf(run=run, output_path="my_run/artifacts/report.pdf")
client.download_compute_worker_run_report_json(run=run, output_path="my_run/artifacts/report.json")
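The JSON report is convenient for processing the results programmatically. Its schema is not described in this tutorial, so the sketch below does not assume any particular key: it only loads the file and lists the top-level keys so you can explore them. It also creates the output directory up front as a precaution, because this tutorial does not say whether the download functions create missing directories. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def ensure_parent_dir(output_path: str) -> Path:
    """Create the parent directory of output_path if it does not exist yet."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def load_report(path: str) -> Dict[str, Any]:
    """Load a downloaded report.json file, assuming a JSON object at the top level."""
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    if not isinstance(report, dict):
        raise ValueError("expected a JSON object at the top level of the report")
    return report


def top_level_keys(report: Dict[str, Any]) -> List[str]:
    """Sorted top-level keys, a starting point for exploring the report."""
    return sorted(report)


# Usage with the paths from the script above:
# ensure_parent_dir("my_run/artifacts/report.json")  # before downloading
# print(top_level_keys(load_report("my_run/artifacts/report.json")))
```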

The report shows, for example, the distribution of the objects in your dataset before and after the selection:


Change in object distribution before and after the selection.
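If you want a similar before/after comparison for your own lists of category labels, counting category names is enough. This is a minimal sketch with hypothetical helpers and made-up labels; it does not read the Lightly report and the numbers are not results from comma10k.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def category_shares(categories: Iterable[str]) -> Dict[str, float]:
    """Fraction of object crops per category name (empty input gives {})."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


def distribution_change(
    before: Iterable[str], after: Iterable[str]
) -> List[Tuple[str, float, float]]:
    """(category, share before, share after) for every category seen."""
    shares_before = category_shares(before)
    shares_after = category_shares(after)
    names = sorted(set(shares_before) | set(shares_after))
    return [(n, shares_before.get(n, 0.0), shares_after.get(n, 0.0)) for n in names]


# Made-up category labels for illustration only.
labels_before = ["car"] * 8 + ["truck"] + ["person"]
labels_after = ["car"] * 4 + ["truck"] * 3 + ["person"] * 3
for name, share_before, share_after in distribution_change(labels_before, labels_after):
    print(f"{name}: {share_before:.0%} -> {share_after:.0%}")
```

Comparing shares rather than raw counts makes the two distributions comparable even though the selection contains far fewer objects than the full dataset.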

Congratulations, you have made your first selection based on object diversity!

Source Code

You can download the complete source code here.