Prediction Format

Lightly can use the images in your datasource together with predictions from a machine learning model. The predictions are used to improve your selection results, either with an active learning or a balancing strategy. Object or keypoint detection predictions can also be used to run Lightly with object diversity.

By providing predictions in the datasource, you have complete control over them. If you add new samples to your datasource, you can simultaneously add their predictions to the datasource. If you already have labels instead of predictions, you can treat them just as predictions and upload them the same way.

Predictions Folder Structure

In the following, we outline the format of the predictions required by the Lightly Worker. All prediction files live in a subfolder of your configured Lightly datasource called .lightly/predictions. The general structure of your input and Lightly bucket will look like this:

s3://bucket/input/
├── image_0.png
└── subdir/
    ├── image_1.png
    ├── image_2.png
    ├── ...
    └── image_N.png

s3://bucket/lightly/
└── .lightly/predictions/
    ├── tasks.json
    ├── task_1/
    │   ├── schema.json
    │   ├── image_0.json
    │   └── subdir/
    │       ├── image_1.json
    │       ├── ...
    │       └── image_N.json
    └── task_2/
        ├── schema.json
        ├── image_0.json
        └── subdir/
            ├── image_1.json
            ├── ...
            └── image_N.json

Each subfolder in .lightly/predictions corresponds to one prediction task (e.g., a classification task and an object detection task). All of the files are explained in the following sections.

📘

Check out our reference project

For a reference project with the proper folder structure for predictions and metadata, see our example here: https://github.com/lightly-ai/object_detection_example_structure

Prediction Tasks

To let Lightly know what kind of prediction tasks you want to work with, Lightly needs to know their names. A task name is the name of the corresponding subfolder in .lightly/predictions. You can make them available for Lightly by adding the list of task names to a tasks.json file in the .lightly/predictions directory.

For example, let’s say we are working with the following folder structure:

s3://bucket/lightly/
└── .lightly/predictions/
    ├── tasks.json
    ├── classification_weather/
    │   ├── schema.json
    │   └── ...
    ├── object_detection_people/
    │   ├── schema.json
    │   └── ...
    ├── semantic_segmentation_cars/
    │   ├── schema.json
    │   └── ...
    └── some_directory_containing_irrelevant_things/
        └── ...

Then we can specify which subfolders contain relevant predictions in the tasks.json file:

[
    "classification_weather",
    "object_detection_people",
    "semantic_segmentation_cars"
]

🚧

Always add a tasks.json

Only the task names listed within tasks.json will be considered by the Lightly Worker! When adding a new subfolder with predictions, always remember to add the subfolder name to the tasks.json file.

🚧

Don't forget the schema.json

If you list a folder in tasks.json that doesn’t contain a valid schema.json file, the Lightly Worker will report an error! See below how to create a suitable schema.json file.
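
One way to keep tasks.json in sync is to generate it from a local copy of the predictions folder. The sketch below is only an illustration: write_tasks_json is a hypothetical helper (not part of Lightly), and it assumes that every subfolder containing a schema.json should be listed as a task.

```python
import json
from pathlib import Path


def write_tasks_json(predictions_dir):
    """Writes tasks.json listing every subfolder that contains a schema.json.

    Hypothetical helper: assumes `predictions_dir` is a local copy of
    .lightly/predictions. Subfolders without a schema.json are skipped.
    """
    predictions_dir = Path(predictions_dir)
    tasks = sorted(
        entry.name
        for entry in predictions_dir.iterdir()
        if entry.is_dir() and (entry / "schema.json").is_file()
    )
    with open(predictions_dir / "tasks.json", "w") as f:
        json.dump(tasks, f, indent=4)
    return tasks
```

Folders such as some_directory_containing_irrelevant_things/ are left out automatically because they contain no schema.json.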

Prediction Schema

Every prediction task needs a schema defining the format of the predictions. The Lightly Platform uses the schema to identify and display prediction classes correctly. It also helps to prevent errors as all loaded predictions are validated against this schema.

You can provide this information to Lightly by adding a schema.json file to the folder of a prediction task. The schema.json file must have a key categories with a list of categories following the COCO annotation format. It must also have a key task_type indicating the type of the predictions. The task_type must be one of the following:

  • classification
  • object-detection
  • keypoint-detection
  • semantic-segmentation

For example, let’s say we are working with a classification model predicting the weather in an image. The three classes are sunny, clouded, and rainy. Then the schema.json file should look as follows:

{
    "task_type": "classification",
    "categories": [
        {
            "id": 0,
            "name": "sunny"
        },
        {
            "id": 1,
            "name": "clouded"
        },
        {
            "id": 2,
            "name": "rainy"
        }
    ]
}
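
Since a missing or malformed schema.json makes the Lightly Worker report an error, it can help to check the file before uploading. The following is a minimal sketch, not Lightly's own validation: check_schema and ALLOWED_TASK_TYPES are hypothetical names, and the uniqueness check on category ids is an assumption.

```python
ALLOWED_TASK_TYPES = {
    "classification",
    "object-detection",
    "keypoint-detection",
    "semantic-segmentation",
}


def check_schema(schema):
    """Returns a list of problems found in a schema dict (empty if none)."""
    problems = []
    if schema.get("task_type") not in ALLOWED_TASK_TYPES:
        problems.append(f"unknown task_type: {schema.get('task_type')!r}")

    categories = schema.get("categories")
    if not isinstance(categories, list):
        problems.append("categories must be a list")
        return problems

    ids = []
    for category in categories:
        if isinstance(category, dict) and "id" in category and "name" in category:
            ids.append(category["id"])
        else:
            problems.append(f"category needs id and name: {category!r}")
    # Assumption: ids are expected to be unique within one schema.
    if len(ids) != len(set(ids)):
        problems.append("category ids must be unique")
    return problems
```

Calling check_schema on the weather schema above would return an empty list.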

Prediction Files

Lightly requires a single prediction file per image. Predictions are saved as JSON files following the Prediction Format. They are stored in the subfolder .lightly/predictions/${TASK_NAME} in the Lightly bucket the dataset was configured with. To make sure Lightly can match the predictions to the correct source image, it’s necessary to follow the naming convention:

# Filename of the prediction for image in s3://bucket/input/FILENAME.EXT
s3://bucket/lightly/.lightly/predictions/${TASK_NAME}/${FILENAME}.json

# Example
# Image: s3://bucket/input/subdir/image_1.png
# Task name: my_classification_task
# Prediction file must be at:
s3://bucket/lightly/.lightly/predictions/my_classification_task/subdir/image_1.json

# Example
# Image: s3://bucket/input/image_0.png
# Task name: my_classification_task
# Prediction file must be at:
s3://bucket/lightly/.lightly/predictions/my_classification_task/image_0.json

See Create Prediction Files from COCO for how to automatically convert COCO predictions into the format required by Lightly.
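
The naming convention can be expressed as a small path computation. This is a sketch under stated assumptions: prediction_key is a hypothetical helper, and its input is the image path relative to the root of the input bucket.

```python
from pathlib import PurePosixPath


def prediction_key(image_file_name, task_name):
    """Maps an image path (relative to the input root) to the path of its
    prediction file (relative to the root of the Lightly bucket)."""
    image_path = PurePosixPath(image_file_name)
    # Only the last extension is replaced, so the subdirectory is preserved.
    prediction_path = image_path.with_suffix(".json")
    return str(PurePosixPath(".lightly/predictions", task_name) / prediction_path)


print(prediction_key("subdir/image_1.png", "my_classification_task"))
print(prediction_key("image_0.png", "my_classification_task"))
```

The two printed paths correspond to the two examples above, without the s3://bucket/lightly/ prefix.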

Prediction Files for Videos

When working with videos, Lightly requires a prediction file per frame. The prediction file name must contain the original video name, video extension, and frame number in the following format:

{VIDEO_NAME}-{FRAME_NUMBER}-{VIDEO_EXTENSION}.json

Frame numbers are zero-padded to the number of digits in the video's total frame count. A video with 200 frames must have the frame number padded to length three. For example, the frame number for frame 99 becomes 099. A video with 1000 frames must have frame numbers padded to length four (99 becomes 0099).

Examples are shown below:

# Filename of the predictions of the Xth frame of video s3://bucket/input/FILENAME.EXT
# with 200 frames (padding: len(str(200)) = 3)
s3://bucket/lightly/.lightly/predictions/${TASK_NAME}/${FILENAME}-${X:03d}-${EXT}.json

# Example
# Video: s3://bucket/input/subdir/video_1.mp4
# Task name: my_classification_task
# Prediction file for frame 99 of 200 frames:
s3://bucket/lightly/.lightly/predictions/my_classification_task/subdir/video_1-099-mp4.json

# Example
# Video: s3://bucket/input/video_0.mp4
# Task name: my_classification_task
# Prediction file for frame 99 of 200 frames:
s3://bucket/lightly/.lightly/predictions/my_classification_task/video_0-099-mp4.json

See Create Prediction Files for Videos on how to extract video frames and create predictions using FFmpeg or Python.

🚧

Prediction Filename for Videos

When working with videos, the Lightly Worker automatically extracts the frames as PNGs.
If you have a prediction for frame 123 of a video myVideo.mp4, the file_name within the myVideo-123-mp4.json file must end in .png.
The file_name would therefore be myVideo-123-mp4.png!
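
The frame naming rules can be sketched in a few lines of Python. frame_names is a hypothetical helper, and it assumes the total number of frames in the video is already known.

```python
from pathlib import PurePosixPath


def frame_names(video_file_name, frame_index, num_frames):
    """Returns (prediction file name, file_name entry) for one video frame.

    `video_file_name` is the video path relative to the input root.
    """
    video_path = PurePosixPath(video_file_name)
    # Pad to the number of digits in the total frame count.
    padding = len(str(num_frames))
    extension = video_path.suffix[1:]  # "mp4" for "video_1.mp4"
    base = str(video_path.with_suffix(""))
    stem = f"{base}-{frame_index:0{padding}d}-{extension}"
    # The prediction file ends in .json, the file_name inside it in .png.
    return stem + ".json", stem + ".png"


print(frame_names("subdir/video_1.mp4", 99, 200))
print(frame_names("subdir/video_1.mp4", 99, 1000))
```

The first element of each tuple is the prediction file name inside the task folder; the second is the value to store under file_name in that file.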

Prediction Format

Predictions for an image must have a file_name and a list of predictions. Here, file_name serves as a unique identifier to retrieve the image for which the predictions are made, while predictions is a list of Prediction Singletons. Each entry in the predictions list contains the information for a single prediction. This is typically an image classification, a detected object, or a segmentation mask.

Predictions have the following entries:

  • category_id is the id of the predicted class
  • score is the final prediction score/confidence
  • probabilities are the per-class probabilities of the prediction

Depending on the prediction task, additional entries might be required. For details, see Prediction Singletons.

Example Classification

{
    "file_name": "subdir/image_1.png",
    "predictions": [ // classes: [sunny, clouded, rainy]
        {
            "category_id": 0,
            "probabilities": [0.8, 0.1, 0.1]
        }
    ]
}

Example Object Detection

{
    "file_name": "subdir/image_1.png",
    "predictions": [ // classes: [person, car]
        {
            "category_id": 0,
            "bbox": [140, 100, 80, 90], // x, y, w, h coordinates in pixels
            "score": 0.8,
            "probabilities": [0.8, 0.2] // optional, sum up to 1.0
        },
        {
            "category_id": 1,
            "bbox": [...],
            "score": 0.9,
            "probabilities": [0.1, 0.9] // optional, sum up to 1.0
        },
        {
            "category_id": 0,
            "bbox": [...],
            "score": 0.5,
            "probabilities": [0.6, 0.4] // optional, sum up to 1.0
        }
    ]
}

Example Keypoint Detection

{
    "file_name": "subdir/image_1.png",
    "predictions": [ // classes: [person, car]
        {
            "category_id": 0,
            "keypoints": [100, 100, 0.95], // keypoints in x, y, s format where s is the keypoint score
            "score": 0.8,
            "bbox": [140, 100, 80, 90], // optional, x, y, w, h coordinates in pixels
            "probabilities": [0.8, 0.2] // optional, sum up to 1.0
        },
        {
            "category_id": 1,
            "keypoints": [10, 20, 1, 30, 40, 1],
            "score": 0.9,
            "probabilities": [0.1, 0.9] // optional, sum up to 1.0
        }
    ]
}

Example Semantic Segmentation

{
    "file_name": "subdir/image_1.png",
    "predictions": [ // classes: [background, car, tree]
        {
            "category_id": 0,
            "segmentation": [100, 80, 90, 85, ...], // run length encoded binary segmentation mask
            "score": 0.8,
            "probabilities": [0.8, 0.15, 0.05] // optional, sum up to 1.0
        },
        {
            "category_id": 1,
            "segmentation": [...],
            "score": 0.9,
            "probabilities": [0.02, 0.9, 0.08] // optional, sum up to 1.0
        }
    ]
}

❗️

file_name should always be the relative path of the image to the root directory of your input bucket.

For example, if the input bucket has s3://bucket/input/ as the root directory and the image is saved at s3://bucket/input/subdir/image_1.png, then file_name should be subdir/image_1.png.

Prediction Singletons

The prediction singletons follow the COCO results format while dropping the image_id. Note that the category_id must be the same as the one defined in the schema and that the probabilities, if provided, must follow the order of the category ids.
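
These two constraints can be checked before uploading. The sketch below is not Lightly's own validation: check_prediction_file is a hypothetical helper, and the tolerance used for the sum of the probabilities is an assumption.

```python
import math


def check_prediction_file(prediction_file, schema):
    """Returns a list of problems found in one prediction file dict."""
    category_ids = [category["id"] for category in schema["categories"]]
    problems = []
    if not isinstance(prediction_file.get("file_name"), str):
        problems.append("file_name must be a string")

    for index, prediction in enumerate(prediction_file.get("predictions", [])):
        if prediction.get("category_id") not in category_ids:
            problems.append(f"prediction {index}: category_id not in schema")

        probabilities = prediction.get("probabilities")
        if probabilities is None:
            continue  # probabilities are optional
        if len(probabilities) != len(category_ids):
            problems.append(
                f"prediction {index}: expected {len(category_ids)} probabilities"
            )
        # Assumed tolerance for floating point rounding.
        elif not math.isclose(sum(probabilities), 1.0, abs_tol=1e-6):
            problems.append(f"prediction {index}: probabilities do not sum to 1.0")
    return problems
```

An empty return value means the file passed these basic checks; task-specific entries such as bbox or segmentation are not inspected here.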

Please use the following formats for each specific task type.

Classification

For classification, please use the following format:

[{
    "category_id"       : int,
    "probabilities"     : [p0, p1, ..., pN]    // optional, sum up to 1.0
}]

Object Detection

For detection with bounding boxes, please use the following format:

[{
    "category_id"       : int,
    "bbox"              : [x, y, width, height], // coordinates in pixels from the top left image corner
    "score"             : float,
    "probabilities"     : [p0, p1, ..., pN]     // optional, sum up to 1.0
}]

The bounding box format follows the COCO results documentation.

📘

Objectness scores and class probabilities

Some frameworks only provide the score as the model output. The score is typically calculated during the Non-Maximum Suppression (NMS) by multiplying the objectness probability with the highest class probability.

Having not only a single score, but a separate objectness score and class probabilities can be valuable information for active learning. For example, an object detection model could have a score of 0.6, and the predicted class is a tree. However, we cannot know the prediction margin or entropy without class probabilities. With the class probabilities, we would also know whether the model thought it’s 0.5 tree, 0.4 person, and 0.1 car or 0.5 tree, 0.25 person, and 0.25 car.
Providing the probabilities makes the computation of active learning scores and the class distribution more precise.
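
To make the difference concrete, the two distributions from the example can be compared with the textbook definitions of prediction margin and entropy. This is an illustration only: the functions below are generic definitions and not necessarily the exact scores Lightly computes.

```python
import math


def prediction_margin(probabilities):
    """Difference between the two highest class probabilities."""
    top, runner_up = sorted(probabilities, reverse=True)[:2]
    return top - runner_up


def entropy(probabilities):
    """Shannon entropy in nats; zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


# Class order: [tree, person, car]. Both have the same top probability.
close_call = [0.5, 0.4, 0.1]
spread_out = [0.5, 0.25, 0.25]

for probabilities in (close_call, spread_out):
    print(probabilities, prediction_margin(probabilities), entropy(probabilities))
```

Both predictions share the same top probability of 0.5, yet the margin is roughly 0.1 in the first case and 0.25 in the second, which a single score cannot reveal.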

The active learning scores are usually computed from the probabilities vector. If it is not defined, the probabilities vector is approximated: the class defined by the category_id is assigned the score as its probability, and the remaining classes are assumed to share the rest equally so that all probabilities sum up to 1.0. For example, if the category_id is 0, the score is 0.7, and there are 4 classes defined in the schema.json, the probabilities are approximated as [0.7, 0.1, 0.1, 0.1].
If only 1 class is defined in the schema.json, a second class is assumed to exist, solely for computing the active learning scores.

The class distribution is usually set to the probabilities vector. If the probabilities are missing, it is set to be 1.0 for the class defined by the category_id and 0.0 for the other classes.
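
The fallback rules described here can be sketched as follows. This follows the description in the text and is not Lightly's actual implementation; both function names are hypothetical, and the code assumes category ids run from 0 to N-1 so that the id doubles as a list index.

```python
def approximate_probabilities(category_id, score, num_classes):
    """Approximates a probabilities vector from a single score."""
    # With a single class, a second class is assumed to exist.
    num_classes = max(num_classes, 2)
    # The remaining probability mass is shared equally by the other classes.
    rest = (1.0 - score) / (num_classes - 1)
    probabilities = [rest] * num_classes
    probabilities[category_id] = score
    return probabilities


def fallback_class_distribution(category_id, num_classes):
    """Class distribution used when no probabilities are provided."""
    return [1.0 if i == category_id else 0.0 for i in range(num_classes)]


# Up to floating point rounding, this is [0.7, 0.1, 0.1, 0.1].
print(approximate_probabilities(category_id=0, score=0.7, num_classes=4))
print(fallback_class_distribution(category_id=0, num_classes=4))
```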

Keypoint Detection

For keypoint detection, please use the following format:

[{
    "category_id"       : int,
    "keypoints"         : [x0, y0, s0, x1, y1, s1, ...], // keypoints in x, y, s format where s is the keypoint score
    "score"             : float,
    "bbox"              : [x, y, width, height], // optional, coordinates in pixels from the top left image corner
    "probabilities"     : [p0, p1, ..., pN]      // optional, sum up to 1.0
}]

The keypoint detection format follows the COCO results documentation. The x and y coordinates represent pixels from the top left corner of the image.

Each keypoint prediction contains the keypoints, an optional bounding box, and optional class probabilities. If the bounding box is omitted, Lightly will infer it from the keypoints directly by drawing a tight bounding box around all keypoints (including non-visible ones).
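
To see which box results when bbox is omitted, the described behavior can be reproduced with a few lines. This is a sketch of the rule as stated above, not Lightly's code; bbox_from_keypoints is a hypothetical helper.

```python
def bbox_from_keypoints(keypoints):
    """Tight [x, y, width, height] box around all keypoints.

    `keypoints` is a flat list in [x0, y0, s0, x1, y1, s1, ...] format.
    Every keypoint is used, regardless of its score.
    """
    xs = keypoints[0::3]
    ys = keypoints[1::3]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


print(bbox_from_keypoints([10, 20, 1, 30, 40, 1]))  # [10, 20, 20, 20]
```

Note that a prediction with a single keypoint yields a box with zero width and height, so providing an explicit bbox may be preferable in that case.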

📘

Multi-class Keypoint Detections

Lightly supports multi-class keypoint detections with a variable number of keypoints per class. For example, a keypoint prediction could consist of a detection for the class "Person" with 13 keypoints and a detection for a class "Car" with 10 keypoints. Each of the detections is then represented by one keypoint detection singleton.

Semantic Segmentation

For semantic segmentation, please use the following format:

[{
    "category_id"       : int,
    "segmentation"      : [int, int, ...],  // run length encoded binary segmentation mask
    "score"             : float,
    "probabilities"     : [p0, p1, ..., pN] // optional, sum up to 1.0
}]

Each segmentation prediction contains the binary mask for one category and a corresponding score. The score indicates the likelihood of the segmentation belonging to that category. Optionally, a list of probabilities can be provided containing a probability for each category, indicating the likelihood that the segment belongs to that category.

To kickstart using Lightly with semantic segmentation predictions, we created an example script that takes model predictions and converts them to the correct format: Download Here

Segmentations are defined with binary masks where each pixel is set to 0 if it belongs to the background or 1 if it belongs to the object. The segmentation masks are compressed using run length encoding to reduce file size. Binary segmentation masks can be converted to the required format using the following function:

import numpy as np


def encode(binary_mask):
    """Encodes a (H, W) binary segmentation mask with run length encoding.

    The run length encoding is an array with counts of subsequent 0s and 1s
    in the binary mask. The first value in the array is always the count of
    initial 0s.

    Examples:

        >>> binary_mask = [
        ...     [0, 0, 1, 1],
        ...     [0, 1, 1, 1],
        ...     [0, 0, 0, 1],
        ... ]
        >>> encode(binary_mask)
        [2, 2, 1, 3, 3, 1]
    """
    flat = np.concatenate(([-1], np.ravel(binary_mask), [-1]))
    borders = np.nonzero(np.diff(flat))[0]
    rle = np.diff(borders)
    if flat[1]:
        rle = np.concatenate(([0], rle))
    return rle.tolist()
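
To sanity-check an encoding, it can be decoded back into a mask and compared with the original. The decode function below is a hypothetical counterpart to encode, written in plain Python; it assumes row-major order and that the first run counts 0s, as described in the docstring above.

```python
def decode(rle, height, width):
    """Rebuilds a (height, width) binary mask, as nested lists, from run lengths."""
    flat = []
    value = 0  # the first run always counts 0s
    for count in rle:
        flat.extend([value] * count)
        value = 1 - value  # runs alternate between 0s and 1s
    if len(flat) != height * width:
        raise ValueError("run lengths do not add up to height * width")
    # Split the flat list back into rows.
    return [flat[row * width:(row + 1) * width] for row in range(height)]


mask = decode([2, 2, 1, 3, 3, 1], height=3, width=4)
for row in mask:
    print(row)
```

Decoding the run lengths from the docstring example reproduces the original 3 x 4 mask.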

Segmentation models often output a probability for each pixel and category. Storing such probabilities can quickly result in large file sizes if the input images have a high resolution. Lightly expects only a single score or probability per segmentation to reduce storage requirements. If you have scores or probabilities for each pixel in the image, you must first aggregate them into a single score/probability. We recommend taking the median or mean score/probability over all pixels within the segmentation mask. The example below shows how pixel-wise segmentation predictions can be converted to the format required by Lightly.

# Make prediction for a single image. The output is assumed to be a tensor
# with shape (categories, height, width).
segmentation = model(image)

# Most probable object category per pixel.
category = segmentation.argmax(dim=0)

# Convert to lightly predictions.
predictions = []
for category_id in category.unique():
    binary_mask = category == category_id
    median_score = segmentation[category_id, binary_mask].median()
    predictions.append(
        {
            "category_id": int(category_id),
            "segmentation": encode(binary_mask),
            "score": float(median_score),
        }
    )

prediction = {
    "file_name": "subdir/image_name.png",
    "predictions": predictions,
}

📘

Support for keypoint detection is coming soon!

Create Prediction Files from COCO

For creating the predictions folder .lightly/predictions, we recommend writing a script that takes your predictions and saves them in the format just outlined. You can either save the predictions on your local machine and then upload them to your datasource or save them directly to your datasource.

For example, the following script takes an object detection COCO predictions file. It needs the path to the predictions file and the Lightly datasource where the .lightly folder should be created as input. Don't forget to change these two parameters at the top of the script.

import json
import os
from pathlib import Path


### CHANGE THESE PARAMETERS
output_filepath = "/path/to/create/.lightly/dir"
annotation_filepath = "/path/to/_annotations.coco.json"

### Optionally change these parameters
task_name = "my_object_detection_task"
task_type = "object-detection"

# create prediction directory
path_predictions = os.path.join(output_filepath, ".lightly/predictions")
Path(path_predictions).mkdir(exist_ok=True, parents=True)

# Create tasks.json
path_task_json = os.path.join(path_predictions, "tasks.json")
tasks = [task_name]
with open(path_task_json, "w") as f:
    json.dump(tasks, f)

# read coco annotations
with open(annotation_filepath, "r") as f:
    coco_dict = json.load(f)

# Create schema.json for task
path_predictions_task = os.path.join(path_predictions, tasks[0])
Path(path_predictions_task).mkdir(exist_ok=True)
schema = {"task_type": task_type, "categories": coco_dict["categories"]}
path_schema_json = os.path.join(path_predictions_task, "schema.json")
with open(path_schema_json, "w") as f:
    json.dump(schema, f)

# Create predictions themselves
image_id_to_prediction = dict()
for image in coco_dict["images"]:
    prediction = {
        "file_name": image["file_name"],
        "predictions": [],
    }
    image_id_to_prediction[image["id"]] = prediction
for ann in coco_dict["annotations"]:
    pred = {
        "category_id": ann["category_id"],
        "bbox": ann["bbox"],
        "score": ann.get("score", 0),
    }
    image_id_to_prediction[ann["image_id"]]["predictions"].append(pred)

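for prediction in image_id_to_prediction.values():
    filename_prediction = os.path.splitext(prediction["file_name"])[0] + ".json"
    path_to_prediction = os.path.join(path_predictions_task, filename_prediction)
    # Images in subdirectories need their prediction subdirectory to exist first.
    Path(path_to_prediction).parent.mkdir(exist_ok=True, parents=True)
    with open(path_to_prediction, "w") as f:
        json.dump(prediction, f)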

Create Prediction Files for Videos

Lightly expects one prediction file per frame in a video. Predictions can be created following the Python example code below. Make sure that PyAV is installed on your system for it to work correctly.

import av
import json
from pathlib import Path
from typing import List, Dict

dataset_dir = Path("/datasets/my_dataset")
predictions_dir = dataset_dir / ".lightly" / "predictions" / "my_prediction_task"


def model_predict(frame) -> List[Dict]:
    # This function must be overwritten to generate predictions for a frame using
    # a prediction model of your choice. Here we just return an example prediction.
    # See https://docs.lightly.ai/docker/advanced/datasource_predictions.html#prediction-format
    # for possible prediction formats.
    return [{"category_id": 0, "bbox": [0, 10, 100, 30], "score": 0.8}]


for video_path in dataset_dir.glob("**/*.mp4"):
    # get predictions for frames
    predictions = []
    with av.open(str(video_path)) as container:
        stream = container.streams.video[0]
        for frame in container.decode(stream):
            predictions.append(model_predict(frame.to_image()))

    # save predictions
    num_frames = len(predictions)
    zero_padding = len(str(num_frames))
    for frame_index, frame_predictions in enumerate(predictions):
        video_name = video_path.relative_to(dataset_dir).with_suffix("")
        frame_name = Path(
            f"{video_name}-{frame_index:0{zero_padding}}-{video_path.suffix[1:]}.png"
        )
        prediction = {
            "file_name": str(frame_name),
            "predictions": frame_predictions,
        }
        out_path = predictions_dir / frame_name.with_suffix(".json")
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w") as file:
            json.dump(prediction, file)


# example directory structure before
# .
# ├── test
# │   └── video_0.mp4
# └── train
#     ├── video_1.mp4
#     └── video_2.mp4
#
# example directory structure after
# .
# ├── .lightly
# │   └── predictions
# │       └── my_prediction_task
# │           ├── test
# │           │   ├── video_0-000-mp4.json
# │           │   ├── video_0-001-mp4.json
# │           │   ├── video_0-002-mp4.json
# │           │   └── ...
# │           └── train
# │               ├── video_1-000-mp4.json
# │               ├── video_1-001-mp4.json
# │               ├── video_1-002-mp4.json
# │               ├── ...
# │               ├── video_2-000-mp4.json
# │               ├── video_2-001-mp4.json
# │               ├── video_2-002-mp4.json
# │               └── ...
# ├── test
# │   └── video_0.mp4
# └── train
#     ├── video_1.mp4
#     └── video_2.mp4

🚧

We discourage using a library other than PyAV for loading videos with Python, as the order and number of loaded frames might differ.

Extract Frames with FFmpeg

As an alternative to creating predictions directly from videos, frames can first be extracted as images with FFmpeg and then processed by any prediction model that supports images. The example command below shows how to extract frames and save them with the filename expected by Lightly. Ensure that FFmpeg is installed on your system before running the command.

VIDEO=video.mp4; NUM_FRAMES=$(ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of csv=p=0 ${VIDEO}); ffmpeg -r 1 -i ${VIDEO} -start_number 0 ${VIDEO%.mp4}-%0${#NUM_FRAMES}d-mp4.png

# results in the following file structure
.
├── video.mp4
├── video-000-mp4.png
├── video-001-mp4.png
├── video-002-mp4.png
├── video-003-mp4.png
└── ...