Add Predictions to a Datasource

Lightly can use not only the images in your datasource but also the predictions of an ML model on those images. Predictions are used in active learning to select images based on the objects in them. Object detection predictions can additionally be used to run Lightly at the object level. Because you provide the predictions in the datasource, you keep full control over them, and they scale well to millions of samples. When you add new samples to your datasource, you can add their predictions at the same time. If you have labels instead of predictions, you can treat them like predictions and upload them the same way.


Note that working with predictions requires Lightly Worker version 2.2 or later. You can check your installed version of the Lightly Worker by running the Sanity Check.

Predictions Folder Structure

This section outlines the prediction format required by the Lightly Worker. All prediction files live in a subdirectory of your configured output datasource called .lightly/predictions. The general structure of your input and output buckets looks like this (images in the input bucket first, predictions in the output bucket second):

    + image_1.png
    + image_2.png
    + ...
    + image_N.png

    + .lightly/predictions/
        + tasks.json
        + task_1/
             + schema.json
             + image_1.json
             + image_N.json
        + task_2/
             + schema.json
             + image_1.json
             + image_N.json

Each subdirectory corresponds to one prediction task (e.g. a classification task and an object detection task). All of the files are explained in the following sections.

Prediction Tasks

Lightly needs to know the names of the prediction tasks you want to work with. To declare them, add a tasks.json file to your output bucket in the subdirectory .lightly/predictions/.

The tasks.json file must contain a list of your task names. Each name must match the name of the subdirectory where the corresponding prediction schema is located.


Only the task names listed in tasks.json are considered. Make sure each task name matches the location of its prediction schema. This lets you control which subfolders the Lightly Worker considers.

For example, let’s say we are working with the following folder structure:

    + tasks.json
    + classification_weather/
         + schema.json
    + classification_scenery/
         + schema.json
    + object_detection_people/
        + schema.json
    + semantic_segmentation_cars/
        + schema.json
    + some_directory_containing_irrelevant_things/

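We can specify which subfolders contain relevant predictions in the tasks.json:

    [
        "classification_weather",
        "classification_scenery",
        "object_detection_people",
        "semantic_segmentation_cars"
    ]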

If you list a subfolder which doesn’t contain a valid schema.json file, the Lightly Worker will report an error! See below how to create a good schema.json file.
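
A quick local check before uploading can catch a listed task that has no schema. The sketch below assumes the predictions folder is available on the local filesystem. The helper name find_tasks_without_schema is hypothetical and not part of any Lightly API, and it only checks that schema.json exists, not that its content is valid.

```python
import json
from pathlib import Path
from typing import List


def find_tasks_without_schema(predictions_dir: str) -> List[str]:
    """Return the task names listed in tasks.json that lack a schema.json.

    Hypothetical helper for illustration; predictions_dir is the local copy
    of the .lightly/predictions directory.
    """
    root = Path(predictions_dir)
    with open(root / "tasks.json") as f:
        task_names = json.load(f)
    return [
        name for name in task_names
        if not (root / name / "schema.json").is_file()
    ]
```

An empty result means every listed task has a schema.json file next to its predictions.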

Prediction Schema

Lightly requires a prediction schema for every task. The schema defines the format of the predictions and helps the Lightly Platform to correctly identify and display classes. It also helps to prevent errors, because all loaded predictions are validated against this schema.

Every schema must include the type of the predictions for this task. For classification and object detection the prediction schema must also include all the categories and their corresponding ids. For other tasks, such as keypoint detection, it can be useful to store additional information like which keypoints are connected with each other by an edge.

You can provide all this information to Lightly by adding a schema.json to the directory of the respective task. The schema.json file must have a key categories with a corresponding list of categories following the COCO annotation format. It must also have a key task_type indicating the type of the predictions. The task_type must be one of:

  • classification

  • object-detection

  • semantic-segmentation
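
The requirements above can be checked locally before uploading. The following is a minimal sketch, assuming each category carries at least an id and a name; check_schema is a hypothetical helper and does not reproduce the validation the Lightly Worker performs.

```python
from typing import Dict, List

ALLOWED_TASK_TYPES = ("classification", "object-detection", "semantic-segmentation")


def check_schema(schema: Dict) -> List[str]:
    """Return a list of problems found in a schema dict (empty if none)."""
    problems = []
    if schema.get("task_type") not in ALLOWED_TASK_TYPES:
        problems.append("task_type must be one of " + ", ".join(ALLOWED_TASK_TYPES))
    categories = schema.get("categories")
    if not isinstance(categories, list):
        problems.append("categories must be a list")
        return problems
    for category in categories:
        if not isinstance(category, dict) or "id" not in category or "name" not in category:
            problems.append(f"category {category!r} needs an id and a name")
    # Duplicate ids would make category_id ambiguous in the predictions.
    ids = [c["id"] for c in categories if isinstance(c, dict) and "id" in c]
    if len(ids) != len(set(ids)):
        problems.append("category ids must be unique")
    return problems


# Usage with an illustrative object detection schema.
example_schema = {
    "task_type": "object-detection",
    "categories": [{"id": 0, "name": "person"}, {"id": 1, "name": "car"}],
}
problems = check_schema(example_schema)
```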

For example, let’s say we are working with a classification model predicting the weather on an image. The three classes are sunny, clouded, and rainy.

    {
        "task_type": "classification",
        "categories": [
            {"id": 0, "name": "sunny"},
            {"id": 1, "name": "clouded"},
            {"id": 2, "name": "rainy"}
        ]
    }

Prediction Files

Lightly requires a single prediction file per image. The file should be a .json following the format defined under Prediction Format and stored in the subdirectory .lightly/predictions/${TASK_NAME} in the storage bucket the dataset was configured with. In order to make sure Lightly can match the predictions to the correct source image, it’s necessary to follow the naming convention:

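# filename of the prediction for image FILENAME.EXT
.lightly/predictions/${TASK_NAME}/${FILENAME}.json

# example: my_image.png, classification
.lightly/predictions/${TASK_NAME}/my_image.json

# example: my_subdir/my_image.png, classification
.lightly/predictions/${TASK_NAME}/my_subdir/my_image.json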
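
The naming convention can be expressed as a small path helper. This is a sketch that assumes the image extension is replaced by .json, as the COCO conversion script later in this document does; prediction_path is a hypothetical name.

```python
from pathlib import PurePosixPath


def prediction_path(task_name: str, image_filename: str) -> str:
    """Return the prediction file path for an image, relative to the bucket root."""
    # Keep subdirectories of the image and swap the extension for .json.
    relative_json = PurePosixPath(image_filename).with_suffix(".json")
    return str(PurePosixPath(".lightly/predictions") / task_name / relative_json)


# Usage with an illustrative task name.
path = prediction_path("classification_weather", "my_subdir/my_image.png")
```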

Prediction Files for Videos

When working with videos, Lightly requires one prediction file per frame. Lightly uses a naming convention to identify frames: the filename of a frame consists of the video filename (without extension), the frame number (zero-padded to the number of digits in the video's frame count), and the video format, separated by hyphens. For example, for a video with 200 frames, the frame number is padded to length three (99 becomes 099). For a video with 1000 frames, the frame number is padded to length four (99 becomes 0099).

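# filename of the predictions of the Xth frame of video FILENAME.EXT
# with 200 frames (padding: len(str(200)) = 3)
.lightly/predictions/${TASK_NAME}/${FILENAME}-${X:03d}-${EXT}.json

# example: my_video.mp4, frame 99/200
.lightly/predictions/${TASK_NAME}/my_video-099-mp4.json

# example: my_subdir/my_video.mp4, frame 99/200
.lightly/predictions/${TASK_NAME}/my_subdir/my_video-099-mp4.json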
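
The frame naming rule can be sketched the same way. The helper below mirrors the naming used by the video script later in this document (video name, zero-padded frame index, format); frame_prediction_path is a hypothetical name.

```python
from pathlib import PurePosixPath


def frame_prediction_path(task_name: str, video_filename: str,
                          frame_index: int, num_frames: int) -> str:
    """Return the prediction file path for one frame of a video."""
    video = PurePosixPath(video_filename)
    padding = len(str(num_frames))   # 200 frames -> 3 digits
    extension = video.suffix[1:]     # "mp4" without the leading dot
    frame_name = f"{video.with_suffix('')}-{frame_index:0{padding}d}-{extension}.json"
    return str(PurePosixPath(".lightly/predictions") / task_name / frame_name)


# Usage with an illustrative task name.
path = frame_prediction_path("my_task", "my_subdir/my_video.mp4", 99, 200)
```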

See Creating Prediction Files for Videos for how to extract video frames and create predictions using ffmpeg or Python.

Prediction Format

Predictions for an image must have a file_name and predictions. Here, file_name is a unique identifier used to retrieve the image for which the predictions are made, and predictions is a list of Prediction Singletons for the corresponding task.

A prediction singleton can carry two kinds of confidence values:

  • probabilities are the per class probabilities of the prediction

  • score is the final prediction score/confidence


Some frameworks only provide the score as the model output. The score is typically calculated during Non-Max Suppression (NMS) by multiplying the objectness probability with the highest class probability.

Having the class probabilities in addition to a single score can be valuable for active learning. For example, an object detection model could output a score of 0.6 with the predicted class tree. Without class probabilities, we cannot know the prediction margin or entropy. With the class probabilities, we would additionally know whether the model assigned 0.5 to tree, 0.4 to person and 0.1 to car, or 0.5 to tree, 0.25 to person and 0.25 to car.
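
To make the difference concrete, the sketch below computes margin and entropy for the two distributions from the example. It uses the standard definitions (margin is the gap between the two highest probabilities, entropy is the Shannon entropy in nats); how Lightly computes its own active learning scores is not shown here.

```python
import math
from typing import List


def prediction_margin(probabilities: List[float]) -> float:
    """Difference between the highest and second-highest class probability."""
    top, second = sorted(probabilities, reverse=True)[:2]
    return top - second


def prediction_entropy(probabilities: List[float]) -> float:
    """Shannon entropy in nats; zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


# Classes: [tree, person, car]
first = [0.5, 0.4, 0.1]
second = [0.5, 0.25, 0.25]

margin_first, margin_second = prediction_margin(first), prediction_margin(second)
entropy_first, entropy_second = prediction_entropy(first), prediction_entropy(second)
```

Both predictions share the same top probability of 0.5, yet the margins are 0.1 and 0.25 and the entropies are roughly 0.94 and 1.04 nats, so the two cases would rank differently depending on the uncertainty measure used.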

Example classification:

    {
        "file_name": "my_image.png",
        "predictions": [ // classes: [sunny, clouded, rainy]
            {"category_id": 0, "probabilities": [0.8, 0.1, 0.1]}
        ]
    }

Example object detection:

    {
        "file_name": "my_image.png",
        "predictions": [ // classes: [person, car]
            {
                "category_id": 0,
                "bbox": [140, 100, 80, 90], // x, y, w, h coordinates in pixels
                "score": 0.8,
                "probabilities": [0.8, 0.2] // optional, sum up to 1.0
            },
            {
                "category_id": 1,
                "bbox": [...],
                "score": 0.9,
                "probabilities": [0.1, 0.9] // optional, sum up to 1.0
            },
            {
                "category_id": 0,
                "bbox": [...],
                "score": 0.5,
                "probabilities": [0.6, 0.4] // optional, sum up to 1.0
            }
        ]
    }

Example semantic segmentation:

    {
        "file_name": "my_image.png",
        "predictions": [ // classes: [background, car, tree]
            {
                "category_id": 0,
                "segmentation": [100, 80, 90, 85, ...], // run length encoded binary segmentation mask
                "score": 0.8,
                "probabilities": [0.8, 0.15, 0.05] // optional, sum up to 1.0
            },
            {
                "category_id": 1,
                "segmentation": [...],
                "score": 0.9,
                "probabilities": [0.08, 0.9, 0.02] // optional, sum up to 1.0
            }
        ]
    }

Note: The file_name should always be the full path from the root directory of the datasource.

Prediction Singletons

The prediction singletons closely follow the COCO results format while dropping the image_id. Note that the category_id must be the same as the one defined in the schema and that the probabilities (if provided) must follow the order of the category ids.
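
These two constraints are easy to verify before writing a file. The following is a minimal sketch, assuming the schema layout described under Prediction Schema; check_singleton is a hypothetical helper and not the validation performed by the Lightly Worker.

```python
import math
from typing import Dict, List


def check_singleton(singleton: Dict, schema: Dict) -> List[str]:
    """Return a list of problems with one prediction singleton (empty if none)."""
    problems = []
    category_ids = [category["id"] for category in schema["categories"]]
    if singleton.get("category_id") not in category_ids:
        problems.append("category_id is not defined in the schema")
    probabilities = singleton.get("probabilities")
    if probabilities is not None:  # probabilities are optional
        if len(probabilities) != len(category_ids):
            problems.append("probabilities need one entry per category")
        elif not math.isclose(sum(probabilities), 1.0, abs_tol=1e-6):
            problems.append("probabilities must sum up to 1.0")
    return problems


weather_schema = {
    "task_type": "classification",
    "categories": [
        {"id": 0, "name": "sunny"},
        {"id": 1, "name": "clouded"},
        {"id": 2, "name": "rainy"},
    ],
}
ok = check_singleton({"category_id": 0, "probabilities": [0.8, 0.1, 0.1]}, weather_schema)
```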


Classification:

For classification, please use the following format:

    {
        "category_id"       : int,
        "probabilities"     : [p0, p1, ..., pN]    // optional, sum up to 1.0
    }

Object Detection:

For detection with bounding boxes, please use the following format:

    {
        "category_id"       : int,
        "bbox"              : [x, y, width, height], // coordinates in pixels
        "score"             : float,
        "probabilities"     : [p0, p1, ..., pN]     // optional, sum up to 1.0
    }

The bounding box format follows the COCO results documentation.


Bounding Box coordinates are pixels measured from the top left image corner.
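
Many detection models output boxes as corner coordinates rather than x, y, width, height. A small conversion sketch, assuming the model returns (x_min, y_min, x_max, y_max) in pixels measured from the top left corner; corners_to_coco_bbox is a hypothetical helper name.

```python
from typing import List


def corners_to_coco_bbox(x_min: float, y_min: float,
                         x_max: float, y_max: float) -> List[float]:
    """Convert corner coordinates to the [x, y, width, height] format."""
    return [x_min, y_min, x_max - x_min, y_max - y_min]


# A box spanning from (140, 100) to (220, 190).
bbox = corners_to_coco_bbox(140, 100, 220, 190)
```

For the corners above, the result is the box [140, 100, 80, 90] used in the object detection example.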

Semantic Segmentation:

For semantic segmentation, please use the following format:

    {
        "category_id"       : int,
        "segmentation"      : [int, int, ...],  // run length encoded binary segmentation mask
        "score"             : float,
        "probabilities"     : [p0, p1, ..., pN] // optional, sum up to 1.0
    }

Each segmentation prediction contains the binary mask for one category and a corresponding score. The score indicates the likelihood of the segmentation belonging to that category. Optionally, a list of probabilities can be provided containing one probability per category, indicating the likelihood that the segment belongs to that category.

Segmentations are defined with binary masks where each pixel is either set to 0 or 1 if it belongs to the background or the object, respectively. The segmentation masks are compressed using run length encoding to reduce file size. Binary segmentation masks can be converted to the required format using the following function:

import numpy as np

def encode(binary_mask):
    """Encodes a (H, W) binary segmentation mask with run length encoding.

    The run length encoding is an array with counts of subsequent 0s and 1s
    in the binary mask. The first value in the array is always the count of
    initial 0s.

    Example:

        >>> binary_mask = [
        ...     [0, 0, 1, 1],
        ...     [0, 1, 1, 1],
        ...     [0, 0, 0, 1],
        ... ]
        >>> encode(binary_mask)
        [2, 2, 1, 3, 3, 1]
    """
    flat = np.concatenate(([-1], np.ravel(binary_mask), [-1]))
    borders = np.nonzero(np.diff(flat))[0]
    rle = np.diff(borders)
    if flat[1]:
        rle = np.concatenate(([0], rle))
    return rle.tolist()
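
To check an encoding, it can help to reverse it. The sketch below is a pure Python inverse of the encoding described above, assuming row-major order (the default of np.ravel) and that the first count refers to 0s; decode is a hypothetical helper and not part of Lightly.

```python
from typing import List


def decode(rle: List[int], height: int, width: int) -> List[List[int]]:
    """Expand a run length encoding back into a (height, width) binary mask."""
    if sum(rle) != height * width:
        raise ValueError("run lengths do not add up to height * width")
    flat = []
    value = 0  # the first run always counts 0s
    for count in rle:
        flat.extend([value] * count)
        value = 1 - value  # runs alternate between 0s and 1s
    # Cut the flat list into rows of equal width.
    return [flat[row * width:(row + 1) * width] for row in range(height)]


mask = decode([2, 2, 1, 3, 3, 1], height=3, width=4)
```

Decoding the example encoding from the docstring recovers the original 3 by 4 mask.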

Segmentation models often output a probability for each pixel and category. Storing such probabilities can quickly result in large files if the input images have a high resolution. To reduce storage requirements, Lightly expects only a single score or probability per segmentation. If you have scores or probabilities for each pixel in the image, you first have to aggregate them into a single score/probability. We recommend taking either the median or the mean score/probability over all pixels within the segmentation mask. The example below shows how pixelwise segmentation predictions can be converted to the format required by Lightly.

# Make prediction for a single image. The output is assumed to be a tensor
# with shape (categories, height, width).
segmentation = model(image)

# Most probable object category per pixel.
category = segmentation.argmax(dim=0)

# Convert to lightly predictions.
predictions = []
for category_id in category.unique():
    binary_mask = category == category_id
    median_score = segmentation[category_id, binary_mask].median()
    predictions.append({
        'category_id': int(category_id),
        'segmentation': encode(binary_mask),
        'score': float(median_score),
    })

prediction = {
    'file_name': 'image_name.png',
    'predictions': predictions,
}


Support for keypoint detection is coming soon!

Creating the predictions folder from COCO

For creating the predictions folder, we recommend writing a script that takes your predictions and saves them in the format just outlined. You can either save the predictions first on your local machine and then upload them to your datasource or save them directly to your datasource.

As an example, the following script takes an object detection COCO predictions file. It needs the path to the predictions file and the output directory where the .lightly folder should be created as input. Don’t forget to change these 2 parameters at the top of the script.

output_filepath = "/path/to/create/.lightly/dir"
annotation_filepath = "/path/to/_annotations.coco.json"

### Optionally change these parameters
task_name = "my_object_detection_task"
task_type = "object-detection"

import json
import os
from pathlib import Path

# create prediction directory
path_predictions = os.path.join(output_filepath, '.lightly/predictions')
Path(path_predictions).mkdir(exist_ok=True, parents=True)

# Create tasks.json
path_task_json = os.path.join(path_predictions, 'tasks.json')
tasks = [task_name]
with open(path_task_json, 'w') as f:
    json.dump(tasks, f)

# read coco annotations
with open(annotation_filepath, 'r') as f:
    coco_dict = json.load(f)

# Create schema.json for task
path_predictions_task = os.path.join(path_predictions, tasks[0])
# the task directory must exist before schema.json can be written
Path(path_predictions_task).mkdir(exist_ok=True, parents=True)
schema = {
    "task_type": task_type,
    "categories": coco_dict['categories']
}
path_schema_json = os.path.join(path_predictions_task, 'schema.json')
with open(path_schema_json, 'w') as f:
    json.dump(schema, f)

# Create predictions themselves
image_id_to_prediction = dict()
for image in coco_dict['images']:
    prediction = {
        'file_name': image['file_name'],
        'predictions': [],
    }
    image_id_to_prediction[image['id']] = prediction
for ann in coco_dict['annotations']:
    pred = {
        'category_id': ann['category_id'],
        'bbox': ann['bbox'],
        'score': ann.get('score', 0)
    }
    # attach the prediction to the image it belongs to
    image_id_to_prediction[ann['image_id']]['predictions'].append(pred)

# Write one prediction file per image
for prediction in image_id_to_prediction.values():
    filename_prediction = os.path.splitext(prediction['file_name'])[0] + '.json'
    path_to_prediction = os.path.join(path_predictions_task, filename_prediction)
    # file_name may contain subdirectories, so create them first
    os.makedirs(os.path.dirname(path_to_prediction), exist_ok=True)
    with open(path_to_prediction, 'w') as f:
        json.dump(prediction, f)

Creating Prediction Files for Videos

Lightly expects one prediction file per frame in a video. Predictions can be created following the Python example code below. Make sure that PyAV is installed on your system for it to work correctly.

import av
import json
from pathlib import Path
from typing import List, Dict

dataset_dir = Path('/datasets/my_dataset')
predictions_dir = dataset_dir / '.lightly' / 'predictions' / 'my_prediction_task'

def model_predict(frame) -> List[Dict]:
    # This function must be overwritten to generate predictions for a frame using
    # a prediction model of your choice. Here we just return an example prediction.
    # See the Prediction Format section for possible prediction formats.
    return [{'category_id': 0, 'bbox': [0, 10, 100, 30], 'score': 0.8}]

for video_path in dataset_dir.glob('**/*.mp4'):
    # get predictions for frames
    predictions = []
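    with av.open(str(video_path)) as container:
        stream = container.streams.video[0]
        for frame in container.decode(stream):
            # frame is an av.VideoFrame; convert it if your model needs it,
            # e.g. with frame.to_image() or frame.to_ndarray()
            predictions.append(model_predict(frame))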

    # save predictions
    num_frames = len(predictions)
    zero_padding = len(str(num_frames))
    for frame_index, frame_predictions in enumerate(predictions):
        video_name = video_path.relative_to(dataset_dir).with_suffix('')
        frame_name = Path(f'{video_name}-{frame_index:0{zero_padding}}-{video_path.suffix[1:]}.png')
        prediction = {
            'file_name': str(frame_name),
            'predictions': frame_predictions,
        }
        out_path = predictions_dir / frame_name.with_suffix('.json')
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, 'w') as file:
            json.dump(prediction, file)

# example directory structure before
# .
# ├── test
# │   └── video_0.mp4
# └── train
#     ├── video_1.mp4
#     └── video_2.mp4
# example directory structure after
# .
# ├── .lightly
# │   └── predictions
# │       └── my_prediction_task
# │           ├── test
# │           │   ├── video_0-000-mp4.json
# │           │   ├── video_0-001-mp4.json
# │           │   ├── video_0-002-mp4.json
# │           │   └── ...
# │           └── train
# │               ├── video_1-000-mp4.json
# │               ├── video_1-001-mp4.json
# │               ├── video_1-002-mp4.json
# │               ├── ...
# │               ├── video_2-000-mp4.json
# │               ├── video_2-001-mp4.json
# │               ├── video_2-002-mp4.json
# │               └── ...
# ├── test
# │   └── video_0.mp4
# └── train
#     ├── video_1.mp4
#     └── video_2.mp4


We discourage using a library other than PyAV to load videos in Python, as the order and number of loaded frames might differ.

Extracting Frames with FFMPEG

Alternatively, frames can also first be extracted as images with ffmpeg and then further processed by any prediction model that supports images. The example command below shows how to extract frames and save them with the filename expected by Lightly. Make sure that ffmpeg is installed on your system before running the command.

VIDEO=video.mp4
NUM_FRAMES=$(ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of csv=p=0 "${VIDEO}")
ffmpeg -r 1 -i "${VIDEO}" -start_number 0 "${VIDEO%.mp4}-%0${#NUM_FRAMES}d-mp4.png"

# results in the following file structure
├── video.mp4
├── video-000-mp4.png
├── video-001-mp4.png
├── video-002-mp4.png
├── video-003-mp4.png
└── ...