Tutorial 4: Active Learning using Detectron2 on Comma10k


Tutorial is outdated

This tutorial uses a deprecated workflow of the Lightly Solution and will be removed in the future. Please use the tutorial to do Active Learning Using YOLOv7 and Comma10k instead.

Active learning is a process of using model predictions to find a new set of images to annotate. The images are chosen to have a maximal impact on the model performance. In this tutorial, we will use a pre-trained object detection model to do active learning on a completely unlabeled set of images.

Detectron2 Faster RCNN prediction on Comma10k


In machine learning, we often don’t train a model from scratch. Instead, we start with an already pre-trained model. For object detection tasks, a common pre-training dataset is MS COCO, which consists of over 100’000 images containing 80 different classes. Our goal is to take an MS COCO pre-trained model and optimize it for an autonomous driving task. We will proceed as follows: First, we use the pre-trained model to make predictions on our task dataset (Comma10k), which was collected for autonomous driving. Then, we combine the predictions, self-supervised learning, and active learning with the lightly framework to find the 100 most informative images on which we can fine-tune our model.

This tutorial is available as a Google Colab Notebook

In this tutorial you will learn:

  • how to use Lightly Active Learning together with the detectron2 framework for object detection

  • how to use the Lightly Platform to inspect the selected samples

  • how to download the selected samples for labeling

The tutorial will be divided into the following steps.

  1. Installation of detectron2 and lightly

  2. Run predictions using a pre-trained model

  3. Use lightly to compute active learning scores for the predictions

  4. Use the Lightly Platform to understand where our model struggles

  5. Select the 100 most valuable images for annotation


  • Make sure you have OpenCV installed to read and preprocess the images. You can install the library using the following command:

pip install opencv-python
  • Make sure you have the detectron2 framework installed on your machine. Check out the detectron2 installation documentation.

  • In this tutorial, we work with the comma10k dataset. The dataset consists of roughly 10’000 images for autonomous driving and is available here on GitHub. We can download the dataset using git clone and save it locally to /datasets/, so that the images end up in /datasets/comma10k/imgs/.

git clone https://github.com/commaai/comma10k /datasets/comma10k
# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger

setup_logger()

# import some common libraries
import gc
import glob
import json
import os
import random

import cv2
import matplotlib.pyplot as plt
import numpy as np
import tqdm

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer

# imports for lightly
from lightly.active_learning.agents import ActiveLearningAgent
from lightly.active_learning.config import SelectionConfig
from lightly.active_learning.scorers import ScorerObjectDetection
from lightly.active_learning.utils.bounding_box import BoundingBox
from lightly.active_learning.utils.object_detection_output import ObjectDetectionOutput
from lightly.api.api_workflow_client import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import SamplingMethod

Upload dataset to Lightly

To work with the Lightly Platform and use the active learning feature, we need to upload the dataset.

First, head over to the Lightly Platform and create a new dataset.

We can now upload the data using the command line interface. Replace yourToken and yourDatasetId with the token and dataset id provided by the web app. Don’t forget to adjust input_dir to the location of your dataset.

lightly-magic token="yourToken" dataset_id="yourDatasetId" \
    input_dir='/datasets/comma10k/imgs/' trainer.max_epochs=20 \
    loader.batch_size=64 loader.num_workers=8


In this tutorial, we use the lightly-magic command, which first trains a self-supervised model on the images, then uses it to embed the images, and finally uploads the dataset together with the embeddings to the Lightly Platform. To skip training, you can set trainer.max_epochs=0.

YOUR_TOKEN = "yourToken"  # your token of the web platform
YOUR_DATASET_ID = "yourDatasetId"  # the id of your dataset on the web platform
DATASET_ROOT = "/datasets/comma10k/imgs/"

# allow setting of token and dataset_id from environment variables
def try_get_token_and_id_from_env():
    token = os.getenv("LIGHTLY_TOKEN", YOUR_TOKEN)
    dataset_id = os.getenv("AL_TUTORIAL_DATASET_ID", YOUR_DATASET_ID)
    return token, dataset_id

YOUR_TOKEN, YOUR_DATASET_ID = try_get_token_and_id_from_env()

Inference on unlabeled data

In active learning, we want to pick the new data for which our model struggles the most. If an image contains a single car and our model is highly confident that there is a car, we don’t gain much by including this example in our training data. However, if the model is unsure whether an object is a car or a building, we want to include the image to refine the decision boundary.
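To make the idea concrete, here is a minimal NumPy sketch of a margin-based uncertainty score. It assumes we have a class probability vector for every predicted box. Each box is scored as 1 minus the gap between its two most likely classes, and the image score is the maximum over its boxes. The function names are hypothetical and the max aggregation is an assumption. This illustrates the idea and is not necessarily how Lightly computes its scores.

```python
import numpy as np


def margin_uncertainty(class_probs: np.ndarray) -> float:
    """Score one box as 1 - (top-1 probability - top-2 probability).

    Needs at least two classes. A value near 1.0 means the two most likely
    classes are almost tied, i.e. the model cannot decide between them.
    """
    top2 = np.sort(class_probs)[-2:]  # the two largest probabilities, ascending
    return float(1.0 - (top2[1] - top2[0]))


def image_uncertainty(boxes_class_probs: list) -> float:
    """Aggregate per-box margins into one image score (assumption: maximum).

    An image is treated as being as uncertain as its most uncertain box;
    images without detections get 0.0.
    """
    if not boxes_class_probs:
        return 0.0
    return max(margin_uncertainty(np.asarray(p)) for p in boxes_class_probs)


# made-up class probabilities for one box each: [car, building, other]
confident_car = [[0.95, 0.03, 0.02]]
car_or_building = [[0.48, 0.47, 0.05]]
print(image_uncertainty(confident_car))    # about 0.08
print(image_uncertainty(car_or_building))  # about 0.99
```

The image whose single box is clearly a car scores low. The image with an almost tied car-versus-building box scores close to 1.0, which is exactly the kind of image we want to label.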

First, we need to create an active learning agent in order to provide lightly with the model predictions. The agent communicates with the Lightly Platform through an ApiWorkflowClient, so make sure you use the right dataset_id and token.

# create Lightly API client
api_client = ApiWorkflowClient(dataset_id=YOUR_DATASET_ID, token=YOUR_TOKEN)
al_agent = ActiveLearningAgent(api_client)
# we can access the images of the dataset we want to use for active learning using
# the `al_agent.query_set` property

# let's print the first 3 entries
print(al_agent.query_set[:3])
['0001_a23b0de0bc12dcba_2018-06-24--00-29-19_17_79.png', '0002_e8e95b54ed6116a6_2018-09-05--22-04-33_2_608.png', '0007_b5e785c1fc446ed0_2018-06-14--08-27-35_78_873.png']

Note that our active learning agent has already synchronized with the Lightly Platform and knows the filenames present in our dataset.

Let’s verify the length of the query_set. The query_set is the set of images from which we want to query. By default, this is the full dataset uploaded to Lightly. You can learn more about the different sets we can access through the active learning agent here: lightly.api.api_workflow_client.ApiWorkflowClient

# The length of the `query_set` should match the number of uploaded
# images
print(len(al_agent.query_set))
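To cross-check that length against what is on disk, we can count the local images. The sketch below assumes the images are PNG files stored directly in DATASET_ROOT, as the filenames above suggest. count_local_images is a hypothetical helper and not part of lightly.

```python
import glob
import os


def count_local_images(root: str, pattern: str = "*.png") -> int:
    """Count files directly inside `root` whose names match `pattern`.

    Hypothetical helper: it does not recurse into subdirectories and does not
    check whether the files are valid images.
    """
    return len(glob.glob(os.path.join(root, pattern)))


# e.g. compare count_local_images("/datasets/comma10k/imgs/") with
# len(al_agent.query_set)
```

If the two numbers differ, some local files were not uploaded or the input_dir passed to lightly-magic points somewhere else.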

Create our Detectron2 model

Next, we create a detectron2 config and a detectron2 DefaultPredictor to run predictions on the new images.

  • We use a pre-trained Faster R-CNN with a ResNet-50 backbone

  • We use an MS COCO pre-trained model from detectron2

cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
# load a COCO Faster R-CNN R-50 config from the model zoo
# (assumption: the FPN 3x variant; other COCO-Detection Faster R-CNN R-50 configs work the same way)
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
)
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
)
predictor = DefaultPredictor(cfg)

We use this little helper method to overlay the model predictions on a given image.

def predict_and_overlay(model, filename):
    # helper method to run the model on an image and overlay the predictions
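    im = cv2.imread(filename)  # OpenCV loads images in BGR channel order
    out = model(im)
    # We can use `Visualizer` to draw the predictions on the image.
    # Visualizer expects RGB input, so we reverse the channel axis.
    v = Visualizer(
        im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2
    )
    out = v.draw_instance_predictions(out["instances"].to("cpu"))
    plt.figure(figsize=(16, 12))
    # get_image() already returns an RGB image, which is what plt.imshow expects
    plt.imshow(out.get_image())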
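The channel order in the helper can be confusing. cv2.imread returns images in BGR order, while the detectron2 Visualizer expects RGB input, so the helper reverses the last axis with [:, :, ::-1] before visualizing. The small NumPy sketch below uses a made-up two-pixel image to show what that slice does.

```python
import numpy as np

# A made-up 1x2 image in OpenCV's BGR channel order:
# a pure blue pixel followed by a pure red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis turns BGR into RGB; this is the same
# `[:, :, ::-1]` slice used on the input image in predict_and_overlay.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0])  # blue pixel, now [0, 0, 255] in RGB order
print(rgb[0, 1])  # red pixel, now [255, 0, 0] in RGB order
```

The slice returns a view and does not copy the pixel data. That makes it cheap to apply to every image before visualization.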

The lightly framework expects a certain bounding box and prediction format. We create another helper method to convert the detectron2 output into the desired format.

def convert_bbox_detectron2lightly(outputs):
    # convert detectron2 predictions into lightly format
    height, width = outputs["instances"].image_size
    boxes = []

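    for bbox_raw, score, class_idx in zip(
        outputs["instances"].pred_boxes.tensor,
        outputs["instances"].scores,
        outputs["instances"].pred_classes,
    ):
        # detectron2 returns absolute pixel coordinates (x0, y0, x1, y1);
        # lightly expects coordinates normalized to [0, 1]
        x0, y0, x1, y1 = bbox_raw.cpu().numpy()
        x0 /= width
        y0 /= height
        x1 /= width
        y1 /= height

        boxes.append(BoundingBox(x0, y0, x1, y1))
    output = ObjectDetectionOutput.from_scores(
        boxes,
        outputs["instances"].scores.cpu().numpy(),
        outputs["instances"].pred_classes.cpu().numpy().tolist(),
    )
    return output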
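The coordinate normalization inside convert_bbox_detectron2lightly can be checked in isolation. The sketch below uses a made-up 1000x500 pixel image and a hypothetical helper normalize_xyxy that applies the same divisions: x coordinates by the width, y coordinates by the height.

```python
import numpy as np


def normalize_xyxy(box_xyxy, width: int, height: int) -> np.ndarray:
    """Convert a pixel box (x0, y0, x1, y1) into coordinates in [0, 1].

    Same arithmetic as in convert_bbox_detectron2lightly: x values are divided
    by the image width and y values by the image height.
    """
    x0, y0, x1, y1 = np.asarray(box_xyxy, dtype=np.float64)
    return np.array([x0 / width, y0 / height, x1 / width, y1 / height])


# made-up box on a made-up 1000x500 pixel image
print(normalize_xyxy([250, 125, 500, 250], width=1000, height=500))
# values: 0.25, 0.25, 0.5, 0.5
```

For boxes inside the image, all four values should lie in [0, 1]. Values outside that range can indicate that width and height were swapped, since detectron2's image_size is (height, width).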

Get Model Predictions

We now iterate over the query_set and run the model on every image. It’s important that the predictions are in the same order as the filenames in the query_set. Otherwise, we could upload a prediction to the wrong sample!

obj_detection_outputs = []
pbar = tqdm.tqdm(al_agent.query_set, miniters=500, mininterval=60, maxinterval=120)
for fname in pbar:
    fname_full = os.path.join(DATASET_ROOT, fname)
    im = cv2.imread(fname_full)
    out = predictor(im)
    obj_detection_output = convert_bbox_detectron2lightly(out)
    obj_detection_outputs.append(obj_detection_output)
100%|##########| 9888/9888 [10:36<00:00, 15.53it/s]
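Our loop preserves the order because it walks the query_set sequentially. If you compute predictions in a different order, for example in parallel or in batches, a safer pattern is to key them by filename and reorder them before scoring. align_to_query_set below is a hypothetical helper sketching this pattern. It fails loudly on missing predictions instead of silently shifting later predictions onto the wrong samples.

```python
def align_to_query_set(query_set, predictions_by_name):
    """Return predictions in exactly the order of `query_set`.

    `predictions_by_name` is a hypothetical dict mapping filename -> prediction.
    Raises KeyError if any filename has no prediction.
    """
    missing = [name for name in query_set if name not in predictions_by_name]
    if missing:
        raise KeyError(f"no prediction for {len(missing)} file(s), e.g. {missing[0]}")
    return [predictions_by_name[name] for name in query_set]


query_set = ["a.png", "b.png", "c.png"]
preds = {"c.png": "pred_c", "a.png": "pred_a", "b.png": "pred_b"}
print(align_to_query_set(query_set, preds))  # ['pred_a', 'pred_b', 'pred_c']
```

In the tutorial itself, a cheap sanity check is that len(obj_detection_outputs) equals len(al_agent.query_set).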

Now, we need to turn the predictions into scores. For each scoring method, the scorer assigns every sample a score between 0.0 and 1.0.

scorer = ScorerObjectDetection(obj_detection_outputs)
scores = scorer.calculate_scores()

Let’s have a look at the sample with the highest uncertainty_margin score.


A high uncertainty margin means that the image contains at least one bounding box for which the model is unsure which class the object belongs to. Read more about how our active learning scores are calculated here: lightly.active_learning.scorers.detection.ScorerObjectDetection

max_score = scores["uncertainty_margin"].max()
idx = scores["uncertainty_margin"].argmax()
print(f"Highest uncertainty_margin score found for idx {idx}: {max_score}")
Highest uncertainty_margin score found for idx 9393: 0.9756525754928589
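Looking at a single image can be misleading. To inspect several high-scoring images, we can sort the scores and map the indices back to filenames. This works because the scores are in the same order as the query_set. top_k_filenames is a hypothetical helper, shown here with made-up scores.

```python
import numpy as np


def top_k_filenames(scores: np.ndarray, filenames: list, k: int) -> list:
    """Return the k filenames with the highest scores, highest first.

    Assumes `scores[i]` belongs to `filenames[i]`, which holds in this
    tutorial because predictions were made in `query_set` order.
    """
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [filenames[i] for i in order]


example_scores = np.array([0.10, 0.97, 0.55, 0.80])
example_files = ["a.png", "b.png", "c.png", "d.png"]
print(top_k_filenames(example_scores, example_files, k=2))  # ['b.png', 'd.png']
```

With the tutorial objects, this would be called as top_k_filenames(scores["uncertainty_margin"], al_agent.query_set, k=5). Selecting purely by score like this ignores diversity, which is why we use CORAL below.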

Let’s have a look at this particular image and show the model prediction for it.

fname = os.path.join(DATASET_ROOT, al_agent.query_set[idx])
predict_and_overlay(predictor, fname)

Query Samples for Labeling

Finally, we can tell our agent to select the top 100 images to annotate and improve our existing model. We pick the selection strategy called CORAL, which combines CORESET and active learning. Whereas CORESET maximizes image diversity based on the embeddings, active learning selects the images where our model struggles the most.
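To build some intuition for why combining the two criteria helps, here is a toy greedy selection. It mixes a coreset-style diversity term (distance to the nearest already selected sample in embedding space) with an active learning score. This is an illustrative sketch under our own assumptions, including the hypothetical weighting parameter alpha. It is not Lightly’s CORAL implementation.

```python
import numpy as np


def greedy_diverse_uncertain(embeddings, scores, n_samples, alpha=0.5):
    """Toy selection mixing coreset-style diversity with uncertainty.

    Each step picks the sample maximizing
        alpha * (distance to nearest selected sample, scaled to [0, 1])
        + (1 - alpha) * score
    Illustration of the idea only, not Lightly's CORAL algorithm.
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    n_samples = min(n_samples, len(scores))  # cannot select more than we have
    selected = [int(np.argmax(scores))]  # start from the most uncertain sample
    # distance from every sample to its nearest already selected sample
    min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    while len(selected) < n_samples:
        max_dist = min_dist.max()
        diversity = min_dist / max_dist if max_dist > 0 else min_dist
        objective = alpha * diversity + (1 - alpha) * scores
        objective[selected] = -np.inf  # never pick a sample twice
        nxt = int(np.argmax(objective))
        selected.append(nxt)
        new_dist = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected


# three near-duplicate, highly uncertain images and one distinct image
emb = [[0.0, 0.0], [0.01, 0.0], [0.0, 0.01], [5.0, 5.0]]
sc = [0.95, 0.94, 0.93, 0.70]
print(greedy_diverse_uncertain(emb, sc, n_samples=2, alpha=0.0))  # [0, 1]
print(greedy_diverse_uncertain(emb, sc, n_samples=2, alpha=0.5))  # [0, 3]
```

The example has three near-duplicate images and one distinct image. A pure uncertainty ranking (alpha=0.0) picks two of the near-duplicates. Mixing in the diversity term picks the distinct image as the second sample.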

config = SelectionConfig(
    n_samples=100, method=SamplingMethod.CORAL, name="active-learning-loop-1"
)
al_agent.query(config, scorer)

We can access the newly added data from the agent through its added_set property.

Let’s have a look at the first 5 entries.

print(al_agent.added_set[:5])
['0012_76c3bc6da8109da7_2018-09-01--13-03-11_31_1101.png', '0067_24d8e3bdd70fc55d_2018-09-01--19-38-32_10_1160.png', '0117_3704c4f2938907df_2018-08-29--17-21-35_21_120.png', '0275_aee00e7a217cbe97_2018-08-05--17-11-18_26_577.png', '0300_da3150e8caf514b6_2018-11-12--21-02-26_42_752.png']

Let’s show model predictions for the first 5 images.

to_label = [os.path.join(DATASET_ROOT, x) for x in al_agent.added_set]
for i in range(5):
    predict_and_overlay(predictor, to_label[i])

Samples selected in the step above were placed in the ‘active-learning-loop-1’ tag, which you can view on the Lightly Platform.

To re-use a dataset without tags from past experiments, we can (optionally!) remove tags other than the initial-tag:

for tag in api_client.get_all_tags():
    # every tag except the initial-tag has a parent tag
    if tag.prev_tag_id is not None:
        api_client.delete_tag_by_id(tag.id)
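Before deleting anything, it can help to do a dry run and print which tags would be removed. The sketch below uses a hypothetical Tag dataclass that mimics the id and prev_tag_id attributes used above. It also assumes the tag objects expose a name attribute. The filter is the same prev_tag_id check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    """Hypothetical stand-in for the tag objects returned by get_all_tags()."""

    id: str
    name: str
    prev_tag_id: Optional[str]


def tags_to_delete(tags: List[Tag]) -> List[Tag]:
    """Return every tag that has a parent, i.e. all tags except the initial-tag."""
    return [tag for tag in tags if tag.prev_tag_id is not None]


tags = [
    Tag(id="t0", name="initial-tag", prev_tag_id=None),
    Tag(id="t1", name="active-learning-loop-1", prev_tag_id="t0"),
]
for tag in tags_to_delete(tags):
    print(f"would delete {tag.name} ({tag.id})")
```

With the real client, the same filter would be applied to api_client.get_all_tags() before calling the delete loop.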

Next Steps

We showed in this tutorial how you can use Lightly Active Learning to discover the images you should label next. You can close the loop by annotating the 100 images and re-training your model. Then start the next iteration by making new model predictions on the query_set.

Using Lightly Active Learning has two advantages:

  • By letting the model choose the next batch of images to label, we reach a higher accuracy faster, because we only label the images that have a large impact.

  • By combining the model predictions with the image embeddings, we make sure we don’t select many similar images. Imagine the model performs very badly on small red cars. If we selected on uncertainty alone, the 100 images might contain only small red cars. The model could then overfit because it suddenly has too many training examples of small red cars.

After re-training our model on the newly labeled 100 images, we can do another active learning iteration by running predictions on the query_set.

Total running time of the script: (11 minutes 37.446 seconds)

Gallery generated by Sphinx-Gallery