Prediction Format

Lightly can use the images you provide in a datasource together with predictions from a machine learning model. The predictions improve your selection results, either with an active learning or a balancing strategy. Object or keypoint detection predictions can also be used to run Lightly with object diversity.

By providing predictions in the datasource, you have complete control over them. If you add new samples to your datasource, you can simultaneously add their predictions. If you already have labels instead of predictions, you can treat them just like predictions and upload them the same way.

Predictions Folder Structure

In the following, we outline the prediction format required by the Lightly Worker. Everything related to predictions is stored in a subfolder of your configured Lightly datasource called .lightly/predictions. The general structure of your input and Lightly datasource looks like this:

s3://bucket/input/
├── image_0.png
└── subdir/
    ├── image_1.png
    ├── image_2.png
    ├── ...
    └── image_N.png

s3://bucket/lightly/
└── .lightly/predictions/
    ├── tasks.json
    ├── task_1/
    │   ├── schema.json
    │   ├── image_0.json
    │   └── subdir/
    │       ├── image_1.json
    │       ├── image_2.json
    │       ├── ...
    │       └── image_N.json
    └── task_2/
        ├── schema.json
        ├── image_0.json
        └── subdir/
            ├── image_1.json
            ├── image_2.json
            ├── ...
            └── image_N.json

Each subfolder in .lightly/predictions corresponds to one prediction task (e.g., a classification task or an object detection task). All of the files are explained in the following sections.

📘

Check out our reference project

If you want to see a reference project with the proper folder structure for working with predictions and metadata, have a look at our example here: https://github.com/lightly-ai/object_detection_example_structure

Prediction Tasks

Lightly identifies prediction tasks by their names. A task name is the name of the corresponding subfolder in .lightly/predictions. You make the tasks available to Lightly by adding the list of task names to a tasks.json file in the .lightly/predictions directory.

For example, let’s say we are working with the following folder structure:

s3://bucket/lightly/
└── .lightly/predictions/
    ├── tasks.json
    ├── classification_weather/
    │   ├── schema.json
    │   └── ...
    ├── object_detection_people/
    │   ├── schema.json
    │   └── ...
    ├── semantic_segmentation_cars/
    │   ├── schema.json
    │   └── ...
    └── some_directory_containing_irrelevant_things/
        └── ...

Then we can specify which subfolders contain relevant predictions in the tasks.json file:

[
    "classification_weather",
    "object_detection_people",
    "semantic_segmentation_cars"
]

🚧

Always add a tasks.json

Only the task names listed within tasks.json will be considered by the Lightly Worker! When adding a new subfolder with predictions, always remember to add the subfolder name to the tasks.json file.

🚧

Don't forget the schema.json

If you list a folder in tasks.json that doesn't contain a valid schema.json file, the Lightly Worker will report an error! See below for how to create a suitable schema.json file.
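As a quick sanity check, the following sketch verifies that every task listed in tasks.json comes with a schema.json. It assumes the .lightly folder has been synced to your local machine (the lightly/ path is a placeholder):

import json
from pathlib import Path

# Hypothetical local copy of the datasource's .lightly/predictions folder.
predictions_dir = Path("lightly/.lightly/predictions")

with open(predictions_dir / "tasks.json") as f:
    tasks = json.load(f)

for task in tasks:
    if not (predictions_dir / task / "schema.json").exists():
        print(f"Task '{task}' is missing its schema.json!")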

Prediction Schema

Every prediction task needs a schema defining the format of the predictions. The Lightly Platform uses the schema to identify and display prediction classes correctly. It also helps to prevent errors as all loaded predictions are validated against this schema.

You can provide this information to Lightly by adding a schema.json file to the folder of a prediction task. The schema.json file must have a key categories with a list of categories following the COCO annotation format. It must also have a key task_type indicating the type of the predictions. The task_type must be one of the following:

  • classification
  • object-detection
  • keypoint-detection
  • instance-segmentation
  • semantic-segmentation

For example, let’s say we are working with a classification model predicting the weather on an image. The three classes are sunny, clouded, and rainy. Then the schema.json file should look as follows:

{
    "task_type": "classification",
    "categories": [
        {
            "id": 0,
            "name": "sunny"
        },
        {
            "id": 1,
            "name": "clouded"
        },
        {
            "id": 2,
            "name": "rainy"
        }
    ]
}
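If you prefer to generate the schema programmatically, a minimal sketch could derive it from a list of class names (the class names and output path here are just placeholders):

import json

class_names = ["sunny", "clouded", "rainy"]  # placeholder class names

schema = {
    "task_type": "classification",
    "categories": [{"id": i, "name": name} for i, name in enumerate(class_names)],
}

with open("schema.json", "w") as f:
    json.dump(schema, f, indent=4)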

Prediction Files

Lightly requires a single prediction file per image. Predictions are saved as JSON files following the Prediction Format. They are stored in the subfolder .lightly/predictions/${TASK_NAME} in the Lightly datasource the dataset was configured with. To make sure Lightly can match the predictions to the correct source image, it’s necessary to follow the naming convention:

# Filename of the prediction for image in s3://bucket/input/FILENAME.EXT
s3://bucket/lightly/.lightly/predictions/${TASK_NAME}/${FILENAME}.json

# Example
# Image: s3://bucket/input/subdir/image_1.png
# Task name: my_classification_task
# Prediction file must be at:
s3://bucket/lightly/.lightly/predictions/my_classification_task/subdir/image_1.json

# Example
# Image: s3://bucket/input/image_0.png
# Task name: my_classification_task
# Prediction file must be at:
s3://bucket/lightly/.lightly/predictions/my_classification_task/image_0.json
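A small helper sketch illustrating this convention (the function is our own illustration; paths are relative to the respective datasource roots):

from pathlib import Path


def prediction_path(image_path: str, task_name: str) -> str:
    """Maps an image path (relative to the input datasource root) to the
    corresponding prediction file path in the Lightly datasource."""
    json_name = Path(image_path).with_suffix(".json")
    return str(Path(".lightly/predictions") / task_name / json_name)


print(prediction_path("subdir/image_1.png", "my_classification_task"))
# .lightly/predictions/my_classification_task/subdir/image_1.json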

See Create Object Detection Prediction Files from COCO for how to automatically convert COCO predictions into the format required by Lightly.

Prediction Files for Videos

When working with videos, Lightly requires a prediction file per frame. The prediction file name must contain the original video name, video extension, and frame number in the following format:

{VIDEO_NAME}-{FRAME_NUMBER}-{VIDEO_EXTENSION}.json

Frame numbers are zero-padded to the number of digits of the total frame count of the video. A video with 200 frames must have frame numbers padded to length three; for example, the frame number for frame 99 becomes 099. A video with 1000 frames must have frame numbers padded to length four (99 becomes 0099).

Examples are shown below:

# Filename of the predictions of the Xth frame of video s3://bucket/input/FILENAME.EXT
# with 200 frames (padding: len(str(200)) = 3)
s3://bucket/lightly/.lightly/predictions/${TASK_NAME}/${FILENAME}-${X:03d}-${EXT}.json

# Example
# Video: s3://bucket/input/subdir/video_1.mp4
# Task name: my_classification_task
# Prediction file for frame 99 of 200 frames:
s3://bucket/lightly/.lightly/predictions/my_classification_task/subdir/video_1-099-mp4.json

# Example
# Video: s3://bucket/input/video_0.mp4
# Task name: my_classification_task
# Prediction file for frame 99 of 200 frames:
s3://bucket/lightly/.lightly/predictions/my_classification_task/video_0-099-mp4.json
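The padding and naming rules can be expressed in a few lines of Python. The helper below (our own illustration) returns both the prediction file name and the corresponding file_name entry with the .png ending described in the note that follows:

from pathlib import Path
from typing import Tuple


def frame_file_names(video_path: str, frame_index: int, num_frames: int) -> Tuple[str, str]:
    """Returns (prediction file name, file_name entry) for a video frame.

    `video_path` is relative to the input datasource root.
    """
    video = Path(video_path)
    padding = len(str(num_frames))
    stem = f"{video.with_suffix('')}-{frame_index:0{padding}d}-{video.suffix[1:]}"
    return f"{stem}.json", f"{stem}.png"


print(frame_file_names("subdir/video_1.mp4", 99, 200))
# ('subdir/video_1-099-mp4.json', 'subdir/video_1-099-mp4.png')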

See Create Prediction Files for Videos on how to extract video frames and create predictions using FFmpeg or Python.

🚧

Prediction Filename for Videos

When creating predictions for videos, keep in mind that the Lightly Worker automatically extracts the frames as PNGs.
If you have a prediction for frame 123 of a video myVideo.mp4, the file_name within the myVideo-123-mp4.json file must have a .png ending.
The file_name would therefore be myVideo-123-mp4.png!

Prediction Format

Predictions for an image must have a file_name and a list of predictions. Here, file_name serves as a unique identifier to retrieve the image for which the predictions are made, while predictions is a list of Prediction Singletons. Each entry in the predictions list contains the information for a single prediction. This is typically an image classification, a detected object, or a segmentation mask.

Predictions have the following entries:

  • category_id is the id of the predicted class
  • score is the final prediction score/confidence; values must be in [0, 1]
  • probabilities are the per-class probabilities of the prediction; values must be in [0, 1]
    and sum up to 1.0

Depending on the prediction task, additional entries might be required. For details, see Prediction Singletons.

Example Classification

{
    "file_name": "subdir/image_1.png",
    "predictions": [ // classes: [sunny, clouded, rainy]
        {
            "category_id": 0,                // category in [0, num categories - 1]
            "probabilities": [0.8, 0.1, 0.1] // values in [0, 1], sum up to 1.0
        }
    ]
}

Example Object Detection

{
    "file_name": "subdir/image_1.png",
    "predictions": [ // classes: [person, car]
        {
            "category_id": 0,           // category in [0, num categories - 1]
            "bbox": [140, 100, 80, 90], // x, y, w, h coordinates in pixels
                                        // x, y >= 0 and w, h >= 1
            "score": 0.8,               // prediction score in [0, 1]
            "probabilities": [0.2, 0.8] // optional, values in [0, 1], sum up to 1.0
        },
        {
            "category_id": 1,
            "bbox": [...],
            "score": 0.9,
            "probabilities": [0.9, 0.1]
        },
        {
            "category_id": 0,
            "bbox": [...],
            "score": 0.5,
            "probabilities": [0.6, 0.4]
        }
    ]
}

Example Keypoint Detection

{
    "file_name": "subdir/image_1.png",
    "predictions": [ // classes: [person, car]
        {
            "category_id": 0,           // category in [0, num categories - 1]
            // keypoints in [x1, y1, s1, x2, y2, s2, ...] format
            // x, y are coordinates in pixels
            // s is a keypoint score in [0, 1]
            "keypoints": [100, 100, 0.95, 13, 29, 0.8, 30, 35, 0.5],
            "score": 0.8,               // prediction score in [0, 1]
            "bbox": [140, 100, 80, 90], // optional, x, y, w, h coordinates in pixels
                                        // x, y >= 0 and w, h >= 1
            "probabilities": [0.2, 0.8] // optional, values in [0, 1], sum up to 1.0
        },
        {
            "category_id": 1,
            "keypoints": [10, 20, 1, 30, 40, 1],
            "score": 0.9,
            "probabilities": [0.9, 0.1]
        }
    ]
}

Example Instance Segmentation

{
    "file_name": "subdir/image_1.png",
    "predictions": [ // classes: [person, car]
        {
            "category_id": 0,           // category in [0, num categories - 1]
            "segmentation": [100, 80, 90, 85, ...], //run length encoded binary segmentation mask
            "score": 0.8,               // prediction score in [0, 1]
            "bbox": [140, 100, 80, 90], // x, y, w, h coordinates in pixels
                                        // x, y >= 0 and w, h >= 1
            "probabilities": [0.8, 0.2] // optional, values in [0, 1], sum up to 1.0
        },
        {
            "category_id": 1,
            "segmentation": [...],
            "score": 0.9,
            "bbox": [...],
            "probabilities": [0.1, 0.9]
        }
    ]
}

Example Semantic Segmentation

{
    "file_name": "subdir/image_1.png",
    "predictions": [ // classes: [background, car, tree]
        {
            "category_id": 0,                  // category in [0, num categories - 1]
            "segmentation": [100, 80, 90, 85, ...], //run length encoded binary segmentation mask
            "score": 0.8,                      // prediction score in [0, 1]	
            "probabilities": [0.15, 0.8, 0.05] // optional, values in [0, 1], sum up to 1.0
        },
        {
            "category_id": 1,
            "segmentation": [...],
            "score": 0.9,
            "probabilities": [0.02, 0.08, 0.9]
        }
    ]
}

❗️

file_name should always be the path of the image relative to the root directory of your input datasource.

For example, if the input datasource has s3://bucket/input/ as the root directory and the image is saved at s3://bucket/input/subdir/image_1.png, then file_name should be subdir/image_1.png.

Prediction Singletons

The prediction singletons follow the COCO results format while dropping the image_id. Note that the category_id must be the same as the one defined in the schema and that the probabilities, if provided, must follow the order of the category ids.

Please use the following formats for each specific task type.

Classification

For classification, please use the following format:

[{
    "category_id"       : int,              // category in [0, num categories - 1]
    "probabilities"     : [p0, p1, ..., pN] // optional, sum up to 1.0
}]

Object Detection

For detection with bounding boxes, please use the following format:

[{
    "category_id"   : int,               // category in [0, num categories - 1]
    "bbox"          : [x, y, w, h],      // coordinates in pixels from the top left image corner
                                         // x, y >= 0 and w, h >= 1
    "score"         : float,             // prediction score in [0, 1]
    "probabilities" : [p0, p1, ..., pN]  // optional, values in [0, 1], sum up to 1.0
}]

The bounding box format follows the COCO results documentation.

🚧

Bounding box format

Following COCO, Lightly uses [x, y, width, height] lists as bounding box format. Remember to convert your bounding boxes if you use a different format such as [x1, y1, x2, y2]!
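For example, a minimal conversion from the [x1, y1, x2, y2] corner format to the COCO [x, y, w, h] format could look like this:

from typing import List


def xyxy_to_xywh(box: List[float]) -> List[float]:
    """Converts a [x1, y1, x2, y2] bounding box to the COCO [x, y, w, h] format."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


print(xyxy_to_xywh([140, 100, 220, 190]))  # [140, 100, 80, 90]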

📘

Objectness scores and class probabilities

Some frameworks only provide the score as the model output. The score is typically calculated during Non-Maximum Suppression (NMS) by multiplying the objectness probability with the highest class probability.

Having a separate objectness score and class probabilities instead of only a single score can be valuable information for active learning. For example, an object detection model could predict a tree with a score of 0.6. Without class probabilities, we cannot know the prediction margin or entropy. With them, we would also know whether the model thought it's 0.5 tree, 0.4 person, and 0.1 car, or 0.5 tree, 0.25 person, and 0.25 car. Providing the probabilities makes the computation of active learning scores and the class distribution more precise.

The active learning scores are usually computed from the probabilities vector. If it is not provided, the probabilities vector is approximated: the class defined by the category_id is assigned the score as its probability, and the remaining classes are assumed to have equal probabilities such that all probabilities sum up to 1.0. For example, if the category_id is 0, the score is 0.7, and there are 4 classes defined in the schema.json, the probabilities are approximated as [0.7, 0.1, 0.1, 0.1].
If only 1 class is defined in the schema.json, a second class is assumed to exist, but only for computing the active learning scores.

The class distribution is usually set to the probabilities vector. If the probabilities are missing, it is set to 1.0 for the class defined by the category_id and 0.0 for all other classes.
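A sketch of the approximation described above (our own illustration; the single-class special case is left out):

from typing import List


def approximate_probabilities(category_id: int, score: float, num_categories: int) -> List[float]:
    """Approximates the probabilities vector from a single score: the predicted
    class receives the score, the remaining classes share the rest equally.

    Assumes num_categories >= 2; Lightly handles the single-class case by
    assuming a second class for the active learning scores.
    """
    remainder = (1.0 - score) / (num_categories - 1)
    probabilities = [remainder] * num_categories
    probabilities[category_id] = score
    return probabilities


print(approximate_probabilities(category_id=0, score=0.7, num_categories=4))
# [0.7, 0.1, 0.1, 0.1]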

Keypoint Detection

For keypoint detection, please use the following format:

[{
    "category_id"   : int,               // category in [0, num categories - 1]
    // keypoints in [x1, y1, s1, x2, y2, s2, ...] format
    // x, y are coordinates in pixels
    // s is a keypoint score in [0, 1]
    "keypoints"     : [x0, y0, s0, x1, y1, s1, ...]
    "score"         : float,             // prediction score in [0, 1]
    "bbox"          : [x, y, w, h],      // optional, coordinates in pixels from the top left image corner
                                         // x, y >= 0 and w, h >= 1
    "probabilities" : [p0, p1, ..., pN]  // optional, values in [0, 1], sum up to 1.0
}]

The keypoint detection format follows the COCO results documentation. The x and y coordinates represent pixels from the top left corner of the image.

Each keypoint prediction contains the keypoints, an optional bounding box, and optional class probabilities. If the bounding box is omitted, Lightly will infer it from the keypoints directly by drawing a tight bounding box around all keypoints (including non-visible ones).
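A sketch of how such a tight box can be derived from the keypoints (this is our illustration of the behavior, not Lightly's internal implementation):

from typing import List


def bbox_from_keypoints(keypoints: List[float]) -> List[float]:
    """Draws a tight [x, y, w, h] bounding box around all keypoints."""
    xs = keypoints[0::3]  # every third value is an x coordinate
    ys = keypoints[1::3]  # every third value (offset 1) is a y coordinate
    x, y = min(xs), min(ys)
    return [x, y, max(xs) - x, max(ys) - y]


print(bbox_from_keypoints([100, 100, 0.95, 13, 29, 0.8, 30, 35, 0.5]))
# [13, 29, 87, 71]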

📘

Multi-class Keypoint Detections

Lightly supports multi-class keypoint detections with a variable number of keypoints per class. For example, a keypoint prediction could consist of a detection for the class "Person" with 13 keypoints and a detection for a class "Car" with 10 keypoints. Each of the detections is then represented by one keypoint detection singleton.

Semantic Segmentation

For semantic segmentation, please use the following format:

[{
    "category_id"       : int,              // category in [0, num categories - 1]
    "segmentation"      : [int, int, ...],  // run length encoded binary segmentation mask
    "score"             : float,            // prediction score in [0, 1]
    "probabilities"     : [p0, p1, ..., pN] // optional, values in [0, 1], sum up to 1.0
}]

Each segmentation prediction contains the binary mask for one category and a corresponding score. The score indicates the likelihood that the segmentation belongs to that category. Optionally, a list of probabilities can be provided, containing a probability for each category and indicating the likelihood that the segment belongs to that category.

To kickstart using Lightly with semantic segmentation predictions, we created example scripts that take model predictions and convert them to the correct format. Below we provide examples for predictions as NumPy arrays and as PyTorch tensors.

Segmentations are defined with binary masks where each pixel is set to 1 if it belongs to the object and to 0 if it belongs to the background. The segmentation masks are compressed using run length encoding to reduce file size. Binary segmentation masks can be converted to the required format using the following function:

import numpy as np
from numpy.typing import NDArray
from typing import List


def encode(binary_mask: NDArray[np.int_]) -> List[int]:
    """Encodes a (H, W) binary segmentation mask with run length encoding.

    The run length encoding is an array with counts of subsequent 0s and 1s
    in the binary mask. The first value in the array is always the count of
    initial 0s.

    Examples:

        >>> binary_mask = np.array([
        >>>     [0, 0, 1, 1],
        >>>     [0, 1, 1, 1],
        >>>     [0, 0, 0, 1],
        >>> ])
        >>> encode(binary_mask)
        [2, 2, 1, 3, 3, 1]
    """
    assert np.all((binary_mask == 1) | (binary_mask == 0))
    flat = np.concatenate(([-1], np.ravel(binary_mask), [-1]))
    borders = np.nonzero(np.diff(flat))[0]
    rle = np.diff(borders)
    if flat[1]:
        rle = np.concatenate(([0], rle))
    return rle.tolist()
The equivalent function for PyTorch tensors:
import numpy as np
from typing import List
import torch


def encode(binary_mask_tensor: torch.Tensor) -> List[int]:
    """Encodes a (H, W) binary segmentation mask with run length encoding.

    The run length encoding is an array with counts of subsequent 0s and 1s
    in the binary mask. The first value in the array is always the count of
    initial 0s.
    
    Note that the shape of the input mask must be (H, W). Other libraries might
    give masks in a different shape.

    Examples:

        >>> binary_mask = torch.tensor([
        >>>     [0, 0, 1, 1],
        >>>     [0, 1, 1, 1],
        >>>     [0, 0, 0, 1],
        >>> ], dtype=torch.int)
        >>> encode(binary_mask)
        [2, 2, 1, 3, 3, 1]
    """
    binary_mask = binary_mask_tensor.detach().cpu().numpy()
    assert np.all((binary_mask == 1) | (binary_mask == 0))
    flat = np.concatenate(([-1], np.ravel(binary_mask), [-1]))
    borders = np.nonzero(np.diff(flat))[0]
    rle = np.diff(borders)
    if flat[1]:
        rle = np.concatenate(([0], rle))
    return rle.tolist()

🚧

The shape of the input mask must be (H, W). Input masks acquired with some libraries, e.g., TensorFlow, might have a different shape.

Segmentation models often output a probability for each pixel and category. Storing such probabilities can quickly result in large file sizes if the input images have a high resolution. Lightly expects only a single score or probability per segmentation to reduce storage requirements. If you have scores or probabilities for each pixel in the image, you must first aggregate them into a single score/probability. We recommend taking the median or mean score/probability over all pixels within the segmentation mask. The example below shows how pixel-wise segmentation predictions can be converted to the format required by Lightly.

import numpy as np
from numpy.typing import NDArray
from typing import Dict, List, Union

PredictionType = Dict[str, Union[int, float, List[int]]]


def convert_to_lightly_predictions(model_predictions: NDArray[np.float64]) -> List[PredictionType]:
    """Converts model predictions to Lightly semantic segmentation predictions.

    Shape of `model_predictions`: (N, C, H, W)
        - N: number of images
        - C: category count
        - H: image height
        - W: image width

    Examples:
        >>> images = np.random.randn(3, 4, 5, 6)
        >>> convert_to_lightly_predictions(images)
        [{'category_id': 0, 'segmentation': [6, 1, 3, ..., 5], 'score': 0.95}, ...]

    Args:
        model_predictions:
            Predictions generated by a model for semantic segmentation.

    Returns:
        A list of Lightly semantic segmentation predictions.
    """
    lightly_predictions: List[PredictionType] = []

    for prediction in model_predictions:
        prediction_argmax = np.argmax(prediction, axis=0)
        for category_id in np.unique(prediction_argmax):
            binary_mask = prediction_argmax == category_id
            median_score = np.median(prediction[category_id, binary_mask])
            lightly_predictions.append(
                {
                    "category_id": int(category_id),
                    "segmentation": encode(binary_mask),
                    "score": float(median_score),
                }
            )

    return lightly_predictions
The equivalent conversion for PyTorch tensors:
from typing import List, Dict, Union
import torch
import numpy as np

PredictionType = Dict[str, Union[int, float, List[int]]]


def convert_to_lightly_predictions(model_predictions: torch.Tensor) -> List[PredictionType]:
    """Converts model predictions to Lightly semantic segmentation predictions.

    Shape of `model_predictions`: (N, C, H, W)
        - N: number of images
        - C: category count
        - H: image height
        - W: image width

    Examples:
        >>> images = torch.randn(3, 4, 5, 6)
        >>> convert_to_lightly_predictions(images)
        [{'category_id': 0, 'segmentation': [6, 1, 3, ..., 5], 'score': 0.95}, ...]

    Args:
        model_predictions:
            Predictions generated by a model for semantic segmentation.

    Returns:
        A list of Lightly semantic segmentation predictions.
    """
    lightly_predictions: List[PredictionType] = []

    for prediction in model_predictions.detach().cpu().numpy():
        prediction_argmax = np.argmax(prediction, axis=0)
        for category_id in np.unique(prediction_argmax):
            binary_mask = prediction_argmax == category_id
            median_score = np.median(prediction[category_id, binary_mask])
            lightly_predictions.append(
                {
                    "category_id": int(category_id),
                    "segmentation": encode(binary_mask),
                    "score": float(median_score),
                }
            )

    return lightly_predictions

Instance Segmentation

For instance segmentation, please use the following format:

[{
    "category_id"   : int,               // category in [0, num categories - 1]
    "segmentation"  : [int, int, ...]    // run length encoded binary segmentation mask 
    "score"         : float,             // prediction score in [0, 1]
    "bbox"          : [x, y, w, h],      // coordinates in pixels from the top left image corner
                                         // x, y >= 0 and w, h >= 1
    "probabilities" : [p0, p1, ..., pN]  // optional, values in [0, 1], sum up to 1.0
}]

Each instance segmentation prediction contains the run length encoding (RLE) of the binary mask for the object instance, a bounding box that encloses the object instance, a score, and optional class probabilities. The score indicates the likelihood that the segmentation belongs to the predicted category. Optionally, a list of probabilities can be provided, containing a probability for each category and indicating the likelihood that the instance belongs to that category.

The bounding box format follows the COCO results documentation, as for object detection, while the segmentation mask follows the format described in semantic segmentation. The segmentation mask must span the full input image, i.e., the size of the decoded segmentation mask must match the size of the input image.
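To verify that a mask spans the full image, you can decode the run length encoding and compare shapes. The decode function below is a sketch of the inverse of the encode function from the semantic segmentation section:

import numpy as np
from numpy.typing import NDArray
from typing import List


def decode(rle: List[int], height: int, width: int) -> NDArray[np.int_]:
    """Decodes a run length encoding back into a (H, W) binary mask."""
    mask = np.zeros(sum(rle), dtype=np.int_)
    index = 0
    for i, count in enumerate(rle):
        if i % 2 == 1:  # runs alternate between 0s and 1s, starting with 0s
            mask[index : index + count] = 1
        index += count
    assert mask.size == height * width, "mask does not span the full image"
    return mask.reshape(height, width)


print(decode([2, 2, 1, 3, 3, 1], height=3, width=4))
# [[0 0 1 1]
#  [0 1 1 1]
#  [0 0 0 1]]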

Create Object Detection Prediction Files from COCO

To create the predictions folder .lightly/predictions, we recommend writing a script that takes your predictions and saves them in the format outlined above. You can either save the predictions on your local machine and then upload them to your datasource, or save them directly to your datasource.

For example, the following script converts an object detection COCO predictions file. It takes as input the path to the predictions file and the Lightly datasource where the .lightly folder should be created. Don't forget to change these two parameters at the top of the script.

import json
import os
from pathlib import Path


### CHANGE THESE PARAMETERS
output_filepath = "/path/to/create/.lightly/dir"
annotation_filepath = "/path/to/_annotations.coco.json"

### Optionally change these parameters
task_name = "my_object_detection_task"
task_type = "object-detection"

# create prediction directory
path_predictions = os.path.join(output_filepath, ".lightly/predictions")
Path(path_predictions).mkdir(exist_ok=True, parents=True)

# Create tasks.json
path_task_json = os.path.join(path_predictions, "tasks.json")
tasks = [task_name]
with open(path_task_json, "w") as f:
    json.dump(tasks, f)

# read coco annotations
with open(annotation_filepath, "r") as f:
    coco_dict = json.load(f)

# Create schema.json for task
path_predictions_task = os.path.join(path_predictions, tasks[0])
Path(path_predictions_task).mkdir(exist_ok=True)
schema = {"task_type": task_type, "categories": coco_dict["categories"]}
path_schema_json = os.path.join(path_predictions_task, "schema.json")
with open(path_schema_json, "w") as f:
    json.dump(schema, f)

# Create predictions themselves
image_id_to_prediction = dict()
for image in coco_dict["images"]:
    prediction = {
        "file_name": image["file_name"],
        "predictions": [],
    }
    image_id_to_prediction[image["id"]] = prediction
for ann in coco_dict["annotations"]:
    pred = {
        "category_id": ann["category_id"],
        "bbox": ann["bbox"],
        "score": ann.get("score", 0),
    }
    image_id_to_prediction[ann["image_id"]]["predictions"].append(pred)

for prediction in image_id_to_prediction.values():
    filename_prediction = os.path.splitext(prediction["file_name"])[0] + ".json"
    path_to_prediction = os.path.join(path_predictions_task, filename_prediction)
    # create subdirectories in case file_name contains a relative path
    Path(path_to_prediction).parent.mkdir(exist_ok=True, parents=True)
    with open(path_to_prediction, "w") as f:
        json.dump(prediction, f)

Create Prediction Files for Videos

Lightly expects one prediction file per frame in a video. Predictions can be created following the Python example code below. Make sure that PyAV is installed on your system for it to work correctly.

import av
import json
from pathlib import Path
from typing import List, Dict

dataset_dir = Path("/datasets/my_dataset")
predictions_dir = dataset_dir / ".lightly" / "predictions" / "my_prediction_task"


def model_predict(frame) -> List[Dict]:
    # This function must be overwritten to generate predictions for a frame using
    # a prediction model of your choice. Here we just return an example prediction.
    # See https://docs.lightly.ai/docker/advanced/datasource_predictions.html#prediction-format
    # for possible prediction formats.
    return [{"category_id": 0, "bbox": [0, 10, 100, 30], "score": 0.8}]


for video_path in dataset_dir.glob("**/*.mp4"):
    # get predictions for frames
    predictions = []
    with av.open(str(video_path)) as container:
        stream = container.streams.video[0]
        for frame in container.decode(stream):
            predictions.append(model_predict(frame.to_image()))

    # save predictions
    num_frames = len(predictions)
    zero_padding = len(str(num_frames))
    for frame_index, frame_predictions in enumerate(predictions):
        video_name = video_path.relative_to(dataset_dir).with_suffix("")
        frame_name = Path(
            f"{video_name}-{frame_index:0{zero_padding}}-{video_path.suffix[1:]}.png"
        )
        prediction = {
            "file_name": str(frame_name),
            "predictions": frame_predictions,
        }
        out_path = predictions_dir / frame_name.with_suffix(".json")
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w") as file:
            json.dump(prediction, file)


# example directory structure before
# .
# ├── test
# │   └── video_0.mp4
# └── train
#     ├── video_1.mp4
#     └── video_2.mp4
#
# example directory structure after
# .
# ├── .lightly
# │   └── predictions
# │       └── my_prediction_task
# │           ├── test
# │           │   ├── video_0-000-mp4.json
# │           │   ├── video_0-001-mp4.json
# │           │   ├── video_0-002-mp4.json
# │           │   └── ...
# │           └── train
# │               ├── video_1-000-mp4.json
# │               ├── video_1-001-mp4.json
# │               ├── video_1-002-mp4.json
# │               ├── ...
# │               ├── video_2-000-mp4.json
# │               ├── video_2-001-mp4.json
# │               ├── video_2-002-mp4.json
# │               └── ...
# ├── test
# │   └── video_0.mp4
# └── train
#     ├── video_1.mp4
#     └── video_2.mp4

🚧

We discourage using a library other than PyAV for loading videos with Python, as the order and number of loaded frames might differ otherwise.

Extract Frames with FFmpeg

Instead of creating predictions directly from video, frames can first be extracted as images with FFmpeg and then further processed by any prediction model that supports images. The example command below shows how to extract frames and save them with the filename expected by Lightly. Ensure that FFmpeg is installed on your system before running the command.

VIDEO=video.mp4; NUM_FRAMES=$(ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of csv=p=0 ${VIDEO}); ffmpeg -r 1 -i ${VIDEO} -start_number 0 ${VIDEO%.mp4}-%0${#NUM_FRAMES}d-mp4.png

# results in the following file structure
.
├── video.mp4
├── video-000-mp4.png
├── video-001-mp4.png
├── video-002-mp4.png
├── video-003-mp4.png
└── ...