Prediction Format
LightlyOne can use images you provided in a datasource together with predictions of a machine learning model. They are used to improve your selection results, either with an active learning or a balancing strategy. Object or keypoint detection predictions can also be used to run LightlyOne with object diversity.
By providing predictions in the datasource, you have complete control over them. If you add new samples to your datasource, you can simultaneously add their predictions to the datasource. If you already have labels instead of predictions, you can treat them just as predictions and upload them the same way.
Predictions Folder Structure
In the following, we will outline the format of the predictions required by the LightlyOne Worker. Everything regarding predictions will occur in a subfolder of your configured Lightly datasource called .lightly/predictions. The general structure of your input and Lightly datasource will look like this:
input_datasource/
├── image_0.png
└── subdir/
    ├── image_1.png
    ├── image_2.png
    ├── ...
    └── image_N.png
lightly_datasource/
└── .lightly/predictions/
    ├── tasks.json
    ├── task_1/
    │   ├── schema.json
    │   ├── image_0.json
    │   └── subdir/
    │       ├── image_1.json
    │       ├── image_2.json
    │       ├── ...
    │       └── image_N.json
    └── task_2/
        ├── schema.json
        ├── image_0.json
        └── subdir/
            ├── image_1.json
            ├── image_2.json
            ├── ...
            └── image_N.json
If you want to use only one or several subdirectories as input (e.g., subdir), keep the input_datasource unchanged and use the relevant filenames feature.
Each subfolder in .lightly/predictions corresponds to one prediction task (e.g., a classification task or an object detection task). All of the files are explained in the following sections.
Check out our reference project
For a reference project with the proper folder structure for working with predictions and metadata, see our example: https://github.com/lightly-ai/object_detection_example_structure
Prediction Tasks
To let LightlyOne know what kind of prediction tasks you want to work with, LightlyOne needs to know their names. A task name is the name of the corresponding subfolder in .lightly/predictions. You can make them available for LightlyOne by adding the list of task names to a tasks.json file in the .lightly/predictions directory.
For example, let’s say we are working with the following folder structure:
s3://bucket/lightly/
└── .lightly/predictions/
    ├── tasks.json
    ├── classification_weather/
    │   ├── schema.json
    │   └── ...
    ├── object_detection_people/
    │   ├── schema.json
    │   └── ...
    ├── semantic_segmentation_cars/
    │   ├── schema.json
    │   └── ...
    └── some_directory_containing_irrelevant_things/
        └── ...
Then we can specify which subfolders contain relevant predictions in the tasks.json file:
[
"classification_weather",
"object_detection_people",
"semantic_segmentation_cars"
]
Always add a tasks.json
Only the task names listed within tasks.json will be considered by the LightlyOne Worker! When adding a new subfolder with predictions, always remember to add the subfolder name to the tasks.json file.
Don't forget the schema.json
If you list a folder in tasks.json that doesn’t contain a valid schema.json file, the LightlyOne Worker will report an error! See below how to create a suitable schema.json file.
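The two warnings above can be checked before starting a run. The following is a minimal sketch, assuming a local copy of the .lightly/predictions folder; the helper name find_tasks_without_schema is hypothetical and not part of any LightlyOne package. It only checks that each schema.json exists, not that its content is valid.

```python
import json
from pathlib import Path
from typing import List


def find_tasks_without_schema(predictions_dir: Path) -> List[str]:
    """Returns the task names listed in tasks.json that lack a schema.json."""
    # tasks.json is a plain JSON list of task (subfolder) names.
    tasks = json.loads((predictions_dir / "tasks.json").read_text())
    return [
        task
        for task in tasks
        if not (predictions_dir / task / "schema.json").is_file()
    ]
```

An empty result means every listed task folder contains a schema.json file.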
Prediction Schema
Every prediction task needs a schema defining the format of the predictions. The LightlyOne Platform uses the schema to identify and display prediction classes correctly. It also helps to prevent errors as all loaded predictions are validated against this schema.
You can provide this information to LightlyOne by adding a schema.json file to the folder of a prediction task. The schema.json file must have a key categories with a list of categories following the COCO annotation format. It must also have a key task_type indicating the type of the predictions. The task_type must be one of the following:
- classification
- object-detection
- keypoint-detection
- instance-segmentation
- semantic-segmentation
For example, let’s say we are working with a classification model predicting the weather on an image. The three classes are sunny, clouded, and rainy. Then the schema.json file should look as follows:
{
"task_type": "classification",
"categories": [
{
"id": 0,
"name": "sunny"
},
{
"id": 1,
"name": "clouded"
},
{
"id": 2,
"name": "rainy"
}
]
}
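A schema like the one above can also be generated from a list of class names. This is a minimal sketch: the function build_schema is hypothetical, and assigning consecutive ids starting at 0 is an assumption that mirrors the example rather than a documented requirement.

```python
import json
from typing import Dict, List

# Task types listed in this section.
VALID_TASK_TYPES = {
    "classification",
    "object-detection",
    "keypoint-detection",
    "instance-segmentation",
    "semantic-segmentation",
}


def build_schema(task_type: str, class_names: List[str]) -> Dict[str, object]:
    """Builds the content of a schema.json file from a list of class names."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"Unknown task_type: {task_type}")
    # Assumption: ids are assigned consecutively, starting at 0.
    categories = [{"id": i, "name": name} for i, name in enumerate(class_names)]
    return {"task_type": task_type, "categories": categories}


schema = build_schema("classification", ["sunny", "clouded", "rainy"])
print(json.dumps(schema, indent=4))
```

The printed dictionary can then be written to schema.json inside the task folder.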
Prediction Files
LightlyOne requires a single prediction file per image. Predictions are saved as JSON files following the Prediction Format. They are stored in the subfolder .lightly/predictions/${TASK_NAME} in the LightlyOne datasource the dataset was configured with. To make sure LightlyOne can match the predictions to the correct source image, it’s necessary to follow the naming convention:
# Filename of the prediction for image in s3://bucket/input/FILENAME.EXT
s3://bucket/lightly/.lightly/predictions/${TASK_NAME}/${FILENAME}.json
# Example
# Image: s3://bucket/input/subdir/image_1.png
# Task name: my_classification_task
# Prediction file must be at:
s3://bucket/lightly/.lightly/predictions/my_classification_task/subdir/image_1.json
# Example
# Image: s3://bucket/input/image_0.png
# Task name: my_classification_task
# Prediction file must be at:
s3://bucket/lightly/.lightly/predictions/my_classification_task/image_0.json
See Create Prediction Files from COCO on how to automatically convert COCO predictions into the format required by LightlyOne.
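The naming convention can be expressed as a small path helper. This is a sketch only: prediction_path is a hypothetical name, and it assumes the image path is given relative to the root of the input datasource.

```python
from pathlib import PurePosixPath


def prediction_path(image_path: str, task_name: str) -> str:
    """Returns the prediction file path relative to the Lightly datasource root.

    `image_path` is the image path relative to the input datasource root.
    """
    # Keep the subdirectory and file stem, swap the extension for .json.
    relative = PurePosixPath(image_path).with_suffix(".json")
    return str(PurePosixPath(".lightly/predictions") / task_name / relative)


print(prediction_path("subdir/image_1.png", "my_classification_task"))
# .lightly/predictions/my_classification_task/subdir/image_1.json
```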
Prediction Files for Videos
When working with videos, LightlyOne requires a prediction file per frame. The prediction file name must contain the original video name, video extension, and frame number in the following format:
{VIDEO_NAME}-{FRAME_NUMBER}-{VIDEO_EXTENSION}.json
Frame numbers start from 0. They are zero-padded to the number of digits in the total frame count of the video. A video with 200 frames must have frame numbers padded to length three, so frame 99 becomes 099. A video with 1000 frames must have frame numbers padded to length four (99 becomes 0099).
Examples are shown below:
# Filename of the predictions of the Xth frame of video s3://bucket/input/FILENAME.EXT
# with 200 frames (padding: len(str(200)) = 3)
s3://bucket/lightly/.lightly/predictions/${TASK_NAME}/${FILENAME}-${X:03d}-${EXT}.json
# Example
# Video: s3://bucket/input/subdir/video_1.mp4
# Task name: my_classification_task
# Prediction file for frame 99 of 200 frames:
s3://bucket/lightly/.lightly/predictions/my_classification_task/subdir/video_1-099-mp4.json
# Example
# Video: s3://bucket/input/video_0.mp4
# Task name: my_classification_task
# Prediction file for frame 99 of 200 frames:
s3://bucket/lightly/.lightly/predictions/my_classification_task/video_0-099-mp4.json
See Create Prediction Files for Videos on how to extract video frames and create predictions using FFmpeg or Python.
Prediction Filename for Videos
When creating predictions for videos, the LightlyOne Worker automatically extracts the frames as PNGs. If you have a prediction for frame 123 of a video myVideo.mp4, the file_name within the myVideo-123-mp4.json file must have a .png ending. The file_name would therefore be myVideo-123-mp4.png!
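The rules for video frames (zero-padding, extension in the name, .png ending for file_name) can be combined into one helper. This is a minimal sketch; frame_prediction_names is a hypothetical function, and the video path is assumed to be relative to the input datasource root.

```python
from pathlib import PurePosixPath
from typing import Tuple


def frame_prediction_names(
    video_path: str, frame_index: int, num_frames: int
) -> Tuple[str, str]:
    """Returns (prediction filename, file_name entry) for one video frame."""
    video = PurePosixPath(video_path)
    # Pad to the number of digits of the total frame count, e.g. 200 -> 3.
    padding = len(str(num_frames))
    extension = video.suffix[1:]  # "mp4" without the leading dot
    stem = f"{video.with_suffix('')}-{frame_index:0{padding}d}-{extension}"
    # The JSON file stores the prediction; file_name inside it ends in .png.
    return f"{stem}.json", f"{stem}.png"


print(frame_prediction_names("subdir/video_1.mp4", 99, 200))
# ('subdir/video_1-099-mp4.json', 'subdir/video_1-099-mp4.png')
```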
Prediction Format
Predictions for an image must have a file_name and a list of predictions. Here, file_name serves as a unique identifier to retrieve the image for which the predictions are made, while predictions is a list of Prediction Singletons. Each entry in the predictions list contains the information for a single prediction. This is typically an image classification, a detected object, or a segmentation mask.
Predictions have the following entries:
- category_id is the id of the predicted class
- score is the final prediction score/confidence, values must be in [0, 1]
- probabilities are the per-class probabilities of the prediction, values must be in [0, 1] and sum up to 1.0
Depending on the prediction task, additional entries might be required. For details, see Prediction Singletons.
You can also use Lightly's labelformat, an open-source package that can convert predictions from several standard formats to the LightlyOne format.
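The value constraints on these entries can be checked before uploading. The following is a rough sketch, not the validation the LightlyOne Worker performs: check_prediction is a hypothetical helper, it assumes category ids run from 0 to the number of categories minus 1, and the tolerance for the probability sum is an arbitrary choice.

```python
import math
from typing import Any, Dict, List


def check_prediction(prediction: Dict[str, Any], num_categories: int) -> List[str]:
    """Returns a list of human-readable problems found in one prediction."""
    problems: List[str] = []
    category_id = prediction.get("category_id")
    if not isinstance(category_id, int) or not 0 <= category_id < num_categories:
        problems.append("category_id must be in [0, num categories - 1]")
    score = prediction.get("score")
    if score is not None and not 0.0 <= score <= 1.0:
        problems.append("score must be in [0, 1]")
    probabilities = prediction.get("probabilities")
    if probabilities is not None:
        if len(probabilities) != num_categories:
            problems.append("probabilities must have one entry per category")
        if any(not 0.0 <= p <= 1.0 for p in probabilities):
            problems.append("probabilities must be in [0, 1]")
        # Assumption: a small absolute tolerance for floating point rounding.
        if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-6):
            problems.append("probabilities must sum up to 1.0")
    return problems
```

An empty list means that none of the checked constraints were violated.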
Example Classification
{
"file_name": "subdir/image_1.png",
"predictions": [ // classes: [sunny, clouded, rainy]
{
"category_id": 0, // category in [0, num categories - 1]
"probabilities": [0.8, 0.1, 0.1] // values in [0, 1], sum up to 1.0
}
] // list corresponding to the field "predictions" must have exactly one element for Classification
}
Example Object Detection
{
"file_name": "subdir/image_1.png",
"predictions": [ // classes: [person, car]
{
"category_id": 0, // category in [0, num categories - 1]
"bbox": [140, 100, 80, 90], // x, y, w, h coordinates in pixels
// x, y >= 0 and w, h >= 1
"score": 0.8, // prediction score in [0, 1]
"probabilities": [0.2, 0.8] // optional, values in [0, 1], sum up to 1.0
},
{
"category_id": 1,
"bbox": [...],
"score": 0.9,
"probabilities": [0.9, 0.1]
},
{
"category_id": 0,
"bbox": [...],
"score": 0.5,
"probabilities": [0.6, 0.4]
}
]
}
Example Keypoint Detection
{
"file_name": "subdir/image_1.png",
"predictions": [ // classes: [person, car]
{
"category_id": 0, // category in [0, num categories - 1]
// keypoints in [x1, y1, s1, x2, y2, s2, ...] format
// x, y are coordinates in pixels
// s is a keypoint score in [0, 1]
"keypoints": [100, 100, 0.95, 13, 29, 0.8, 30, 35, 0.5],
"score": 0.8, // prediction score in [0, 1]
"bbox": [140, 100, 80, 90], // optional, x, y, w, h coordinates in pixels
// x, y >= 0 and w, h >= 1
"probabilities": [0.2, 0.8] // optional, values in [0, 1], sum up to 1.0
},
{
"category_id": 1,
"keypoints": [10, 20, 1, 30, 40, 1],
"score": 0.9,
"probabilities": [0.9, 0.1]
}
]
}
Example Instance Segmentation
{
"file_name": "subdir/image_1.png",
"predictions": [ // classes: [person, car]
{
"category_id": 0, // category in [0, num categories - 1]
"segmentation": [100, 80, 90, 85, ...], //run length encoded binary segmentation mask
"score": 0.8, // prediction score in [0, 1]
"bbox": [140, 100, 80, 90], // x, y, w, h coordinates in pixels
// x, y >= 0 and w, h >= 1
"probabilities": [0.8, 0.2] // optional, values in [0, 1], sum up to 1.0
},
{
"category_id": 1,
"segmentation": [...],
"score": 0.9,
"bbox": [...],
"probabilities": [0.1, 0.9]
}
]
}
Example Semantic Segmentation
{
"file_name": "subdir/image_1.png",
"predictions": [ // classes: [background, car, tree]
{
"category_id": 0, // category in [0, num categories - 1]
"segmentation": [100, 80, 90, 85, ...], //run length encoded binary segmentation mask
"score": 0.8, // prediction score in [0, 1]
"probabilities": [0.15, 0.8, 0.05] // optional, values in [0, 1], sum up to 1.0
},
{
"category_id": 1,
"segmentation": [...],
"score": 0.9,
"probabilities": [0.02, 0.08, 0.9]
}
]
}
file_name should always be the relative path of the image to the root directory of your input datasource. For example, if the input datasource has s3://bucket/input/ as the root directory and the image is saved at s3://bucket/input/subdir/image_1.png, then file_name should be subdir/image_1.png.
Prediction Singletons
The prediction singletons follow the COCO results format while dropping the image_id. Note that the category_id must be the same as the one defined in the schema and that the probabilities, if provided, must follow the order of the category ids.
Please use the following formats for each specific task type.
Classification
For classification, please use the following format:
[{
"category_id" : int, // category id in [0, N]
"probabilities" : [p0, p1, ..., pN] // values in [0, 1], sum up to 1.0
}] // list must have exactly one dictionary for Classification
Object Detection
For detection with bounding boxes, please use the following format:
[{
"category_id" : int, // category in [0, num categories - 1]
"bbox" : [x, y, w, h], // coordinates in pixels from the top left image corner
// x, y >= 0 and w, h >= 1
"score" : float, // prediction score in [0, 1]
"probabilities" : [p0, p1, ..., pN] // optional, values in [0, 1], sum up to 1.0
}]
The bounding box format follows the COCO results documentation.
Bounding box format
Following COCO, LightlyOne uses [x, y, width, height] lists as bounding box format. Remember to convert your bounding boxes if you use a different format such as [x1, y1, x2, y2]!
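Converting corner-style boxes to the COCO layout is a two-line subtraction. A minimal sketch, with xyxy_to_xywh as a hypothetical helper name:

```python
from typing import List, Sequence


def xyxy_to_xywh(box: Sequence[float]) -> List[float]:
    """Converts [x1, y1, x2, y2] corners to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    # The top left corner stays the same; width and height are differences.
    return [x1, y1, x2 - x1, y2 - y1]


print(xyxy_to_xywh([140, 100, 220, 190]))  # [140, 100, 80, 90]
```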
Objectness scores and class probabilities
Some frameworks only provide the score as the model output. The score is typically calculated during Non-Maximum Suppression (NMS) by multiplying the objectness probability with the highest class probability.
Having not only a single score but also a separate objectness score and class probabilities can be valuable information for active learning. For example, an object detection model could have a score of 0.6, and the predicted class is a tree. However, we cannot know the prediction margin or entropy without class probabilities. With the class probabilities, we would also know whether the model thought it’s 0.5 tree, 0.4 person, and 0.1 car or 0.5 tree, 0.25 person, and 0.25 car.
Providing the probabilities makes the computation of active learning scores and the class distribution more precise.
The active learning scores are usually computed from the probabilities vector. If it is not defined, the probabilities vector is approximated: the class defined by the category_id is assigned the score as its probability, and the other classes are assumed to have an equal probability such that all probabilities sum up to 1.0. For example, if the category_id is 0, the score is 0.7, and 4 classes are defined in the schema.json, the probabilities are approximated as [0.7, 0.1, 0.1, 0.1]. If only 1 class is defined in the schema.json, a second class is assumed to exist, solely for computing the active learning scores.
The class distribution is usually set to the probabilities vector. If the probabilities are missing, it is set to 1.0 for the class defined by the category_id and 0.0 for the other classes.
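The fallback described above can be written down in a few lines. This sketch only mirrors the description in this section and is not the code the LightlyOne Worker runs; both function names are hypothetical.

```python
from typing import List


def approximate_probabilities(
    category_id: int, score: float, num_classes: int
) -> List[float]:
    """Approximates a probabilities vector from a single score."""
    # With a single class, a second class is assumed to exist.
    num_classes = max(num_classes, 2)
    # The remaining probability mass is shared equally by the other classes.
    rest = (1.0 - score) / (num_classes - 1)
    probabilities = [rest] * num_classes
    probabilities[category_id] = score
    return probabilities


def fallback_class_distribution(category_id: int, num_classes: int) -> List[float]:
    """One-hot class distribution used when probabilities are missing."""
    return [1.0 if i == category_id else 0.0 for i in range(num_classes)]


print(approximate_probabilities(category_id=0, score=0.7, num_classes=4))
print(fallback_class_distribution(category_id=0, num_classes=4))
```

Up to floating point rounding, the first call reproduces the [0.7, 0.1, 0.1, 0.1] example.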
Keypoint Detection
For keypoint detection, please use the following format:
[{
"category_id" : int, // category in [0, num categories - 1]
// keypoints in [x0, y0, s0, x1, y1, s1, ...] format
// x, y are coordinates in pixels
// s is a keypoint score in [0, 1]
"keypoints" : [x0, y0, s0, x1, y1, s1, ...],
"score" : float, // prediction score in [0, 1]
"bbox" : [x, y, w, h], // optional, coordinates in pixels from the top left image corner
// x, y >= 0 and w, h >= 1
"probabilities" : [p0, p1, ..., pN] // optional, values in [0, 1], sum up to 1.0
}]
The keypoint detection format follows the COCO results documentation. The x and y coordinates represent pixels from the top left corner of the image.
Each keypoint prediction contains the keypoints, an optional bounding box, and optional class probabilities. If the bounding box is omitted, LightlyOne will infer it from the keypoints directly by drawing a tight bounding box around all keypoints (including non-visible ones).
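To illustrate what a tight bounding box around all keypoints means, here is a minimal sketch. The helper bbox_from_keypoints is hypothetical and only approximates the described behavior; in particular, how LightlyOne handles degenerate cases (for example a single keypoint, which gives zero width and height) is not covered here.

```python
from typing import List, Sequence


def bbox_from_keypoints(keypoints: Sequence[float]) -> List[float]:
    """Computes a tight [x, y, w, h] box around [x0, y0, s0, x1, y1, s1, ...]."""
    xs = keypoints[0::3]  # every third value starting at index 0
    ys = keypoints[1::3]  # every third value starting at index 1
    x_min, y_min = min(xs), min(ys)
    # All keypoints are used, regardless of their score.
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


print(bbox_from_keypoints([100, 100, 0.95, 13, 29, 0.8, 30, 35, 0.5]))
# [13, 29, 87, 71]
```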
Multi-class Keypoint Detections
LightlyOne supports multi-class keypoint detections with a variable number of keypoints per class. For example, a keypoint prediction could consist of a detection for the class "Person" with 13 keypoints and a detection for a class "Car" with 10 keypoints. Each of the detections is then represented by one keypoint detection singleton.
Semantic Segmentation
For semantic segmentation, please use the following format:
[{
"category_id" : int, // category in [0, num categories - 1]
"segmentation" : [int, int, ...], // run length encoded binary segmentation mask
"score" : float, // prediction score in [0, 1]
"probabilities" : [p0, p1, ..., pN] // optional, values in [0, 1], sum up to 1.0
}]
Each segmentation prediction contains the binary mask for one category and a corresponding score. The score determines the likelihood of the segmentation belonging to that category. Optionally, a list of probabilities can be provided containing a probability for each category, indicating the likeliness that the segment belongs to that category.
To kickstart using LightlyOne with semantic segmentation predictions, we created an example script that takes model predictions and converts them to the correct format as follows. Here we provide examples for predictions in NumPy arrays and PyTorch Tensors.
Segmentations are defined with binary masks where each pixel is set to 0 if it belongs to the background and to 1 if it belongs to the object. The segmentation masks are compressed using run length encoding to reduce file size. Binary segmentation masks can be converted to the required format using the following function:
# NumPy version
import numpy as np
from numpy.typing import NDArray
from typing import List


def encode(binary_mask: NDArray[np.int_]) -> List[int]:
    """Encodes a (H, W) binary segmentation mask with run length encoding.

    The run length encoding is an array with counts of subsequent 0s and 1s
    in the binary mask. The first value in the array is always the count of
    initial 0s.

    Examples:
        >>> binary_mask = np.array([
        ...     [0, 0, 1, 1],
        ...     [0, 1, 1, 1],
        ...     [0, 0, 0, 1],
        ... ])
        >>> encode(binary_mask)
        [2, 2, 1, 3, 3, 1]
    """
    # The mask may only contain 0s and 1s.
    assert np.all((binary_mask == 1) | (binary_mask == 0))
    # Pad with -1 on both sides so the first and last run are detected too.
    flat = np.concatenate(([-1], np.ravel(binary_mask), [-1]))
    # Indices where the value changes mark the borders between runs.
    borders = np.nonzero(np.diff(flat))[0]
    rle = np.diff(borders)
    # If the mask starts with a 1, prepend a zero-length run of 0s.
    if flat[1]:
        rle = np.concatenate(([0], rle))
    return rle.tolist()
# PyTorch version
import numpy as np
from typing import List
import torch


def encode(binary_mask_tensor: torch.Tensor) -> List[int]:
    """Encodes a (H, W) binary segmentation mask with run length encoding.

    The run length encoding is an array with counts of subsequent 0s and 1s
    in the binary mask. The first value in the array is always the count of
    initial 0s.

    Note that the shape of the input mask must be (H, W). Other libraries might
    give masks in a different shape.

    Examples:
        >>> binary_mask = torch.tensor([
        ...     [0, 0, 1, 1],
        ...     [0, 1, 1, 1],
        ...     [0, 0, 0, 1],
        ... ], dtype=torch.int)
        >>> encode(binary_mask)
        [2, 2, 1, 3, 3, 1]
    """
    # Move the tensor to the CPU and continue with NumPy.
    binary_mask = binary_mask_tensor.detach().cpu().numpy()
    assert np.all((binary_mask == 1) | (binary_mask == 0))
    # Pad with -1 on both sides so the first and last run are detected too.
    flat = np.concatenate(([-1], np.ravel(binary_mask), [-1]))
    borders = np.nonzero(np.diff(flat))[0]
    rle = np.diff(borders)
    # If the mask starts with a 1, prepend a zero-length run of 0s.
    if flat[1]:
        rle = np.concatenate(([0], rle))
    return rle.tolist()
The shape of the input mask must be (H, W). Input masks acquired with some libraries, e.g., TensorFlow, might have a different shape.
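To sanity-check an encoded mask, it can help to decode it again and compare it with the original. The following is a minimal sketch of the inverse operation; decode is a hypothetical helper, and it assumes the mask was flattened row by row, as np.ravel does by default.

```python
import numpy as np
from typing import List, Tuple


def decode(rle: List[int], shape: Tuple[int, int]) -> np.ndarray:
    """Decodes a run length encoding into a (H, W) binary mask."""
    # Runs alternate between 0s and 1s, starting with 0s.
    values = np.arange(len(rle)) % 2
    flat = np.repeat(values, rle)
    # reshape raises a ValueError if the runs do not cover exactly H * W pixels.
    return flat.reshape(shape)


print(decode([2, 2, 1, 3, 3, 1], shape=(3, 4)))
```

Because reshape fails when the run lengths do not add up to H * W, the helper also catches masks that do not span the full image.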
Segmentation models often output a probability for each pixel and category. Storing such probabilities can quickly result in large file sizes if the input images have a high resolution. LightlyOne expects only a single score or probability per segmentation to reduce storage requirements. If you have scores or probabilities for each pixel in the image, you must first aggregate them into a single score/probability. We recommend taking the median or mean score/probability over all pixels within the segmentation mask. The example below shows how pixel-wise segmentation predictions can be converted to the format required by Lightly.
# NumPy version
import numpy as np
from numpy.typing import NDArray
from typing import Dict, List, Union

PredictionType = Dict[str, Union[int, float, List[int]]]


def convert_to_lightly_predictions(
    model_predictions: NDArray[np.floating],
) -> List[PredictionType]:
    """Converts model predictions to Lightly semantic segmentation predictions.

    Shape of `model_predictions`: (N, C, H, W)
    - N: number of images
    - C: category count
    - H: image height
    - W: image width

    Examples:
        >>> images = np.random.randn(3, 4, 5, 6)
        >>> convert_to_lightly_predictions(images)
        [{'category_id': 0, 'segmentation': [6, 1, 3, ..., 5], 'score': 0.95}, ...]

    Args:
        model_predictions:
            Predictions generated by a model for semantic segmentation.

    Returns:
        A list of Lightly semantic segmentation predictions.
    """
    lightly_predictions: List[PredictionType] = []
    for prediction in model_predictions:
        # Most likely category per pixel, shape (H, W).
        prediction_argmax = np.argmax(prediction, axis=0)
        for category_id in np.unique(prediction_argmax):
            binary_mask = prediction_argmax == category_id
            # Aggregate the pixel-wise scores into a single score.
            median_score = np.median(prediction[category_id, binary_mask])
            lightly_predictions.append(
                {
                    "category_id": int(category_id),
                    "segmentation": encode(binary_mask),
                    "score": float(median_score),
                }
            )
    return lightly_predictions
# PyTorch version
from typing import List, Dict, Union
import torch
import numpy as np

PredictionType = Dict[str, Union[int, float, List[int]]]


def convert_to_lightly_predictions(
    model_predictions: torch.Tensor,
) -> List[PredictionType]:
    """Converts model predictions to Lightly semantic segmentation predictions.

    Shape of `model_predictions`: (N, C, H, W)
    - N: number of images
    - C: category count
    - H: image height
    - W: image width

    Examples:
        >>> images = torch.randn(3, 4, 5, 6)
        >>> convert_to_lightly_predictions(images)
        [{'category_id': 0, 'segmentation': [6, 1, 3, ..., 5], 'score': 0.95}, ...]

    Args:
        model_predictions:
            Predictions generated by a model for semantic segmentation.

    Returns:
        A list of Lightly semantic segmentation predictions.
    """
    lightly_predictions: List[PredictionType] = []
    for prediction in model_predictions.detach().cpu().numpy():
        # Most likely category per pixel, shape (H, W).
        prediction_argmax = np.argmax(prediction, axis=0)
        for category_id in np.unique(prediction_argmax):
            binary_mask = prediction_argmax == category_id
            # Aggregate the pixel-wise scores into a single score.
            median_score = np.median(prediction[category_id, binary_mask])
            lightly_predictions.append(
                {
                    "category_id": int(category_id),
                    # binary_mask is a NumPy array here, so `encode` must be
                    # the NumPy variant of the function.
                    "segmentation": encode(binary_mask),
                    "score": float(median_score),
                }
            )
    return lightly_predictions
Instance Segmentation
For instance segmentation, please use the following format:
[{
"category_id" : int, // category in [0, num categories - 1]
"segmentation" : [int, int, ...], // run length encoded binary segmentation mask
"score" : float, // prediction score in [0, 1]
"bbox" : [x, y, w, h], // coordinates in pixels from the top left image corner
// x, y >= 0 and w, h >= 1
"probabilities" : [p0, p1, ..., pN] // optional, values in [0, 1], sum up to 1.0
}]
Each instance segmentation prediction contains the run length encoding (RLE) of the binary mask for the object instance, a bounding box that encloses the object instance, a score, and optional class probabilities. The score determines the likelihood of the segmentation belonging to that category. Optionally, a list of probabilities can be provided containing a probability for each category, indicating the likeliness that the instance belongs to that category.
The bounding box format follows the COCO results documentation, as for object detection, while the segmentation mask follows the format described in semantic segmentation. The segmentation mask must span the full input image, i.e., the size of the decoded segmentation mask must match the size of the input image.
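If your model outputs instance masks without boxes, an enclosing bounding box can be derived from the full-image binary mask. This is a sketch under assumptions: bbox_from_mask is a hypothetical helper, and counting width and height as the number of covered pixel columns and rows (hence the + 1) is one possible convention, not one confirmed by this page.

```python
import numpy as np
from typing import List


def bbox_from_mask(binary_mask: np.ndarray) -> List[int]:
    """Computes the [x, y, w, h] box enclosing all 1-pixels of a (H, W) mask."""
    rows = np.flatnonzero(binary_mask.any(axis=1))  # rows containing a 1
    cols = np.flatnonzero(binary_mask.any(axis=0))  # columns containing a 1
    if rows.size == 0:
        raise ValueError("The mask does not contain any object pixels.")
    x, y = int(cols[0]), int(rows[0])
    # Assumption: width/height count the covered pixel columns/rows.
    w = int(cols[-1]) - x + 1
    h = int(rows[-1]) - y + 1
    return [x, y, w, h]


mask = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
])
print(bbox_from_mask(mask))  # [1, 0, 3, 3]
```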
Create Object Detection Prediction Files from COCO
For creating the predictions folder .lightly/predictions, we recommend writing a script that takes your predictions and saves them in the format just outlined. You can either save the predictions on your local machine and then upload them to your datasource or save them directly to your datasource.
For example, the following script takes an object detection COCO predictions file. It needs the path to the predictions file and the LightlyOne datasource where the .lightly folder should be created as input. Don't forget to change these two parameters at the top of the script.
import json
import os
from pathlib import Path

### CHANGE THESE PARAMETERS
output_filepath = "/path/to/create/.lightly/dir"
annotation_filepath = "/path/to/_annotations.coco.json"

### Optionally change these parameters
task_name = "my_object_detection_task"
task_type = "object-detection"

# Create the predictions directory
path_predictions = os.path.join(output_filepath, ".lightly/predictions")
Path(path_predictions).mkdir(exist_ok=True, parents=True)

# Create tasks.json
path_task_json = os.path.join(path_predictions, "tasks.json")
tasks = [task_name]
with open(path_task_json, "w") as f:
    json.dump(tasks, f)

# Read the COCO annotations
with open(annotation_filepath, "r") as f:
    coco_dict = json.load(f)

# Create schema.json for the task
path_predictions_task = os.path.join(path_predictions, tasks[0])
Path(path_predictions_task).mkdir(exist_ok=True)
schema = {"task_type": task_type, "categories": coco_dict["categories"]}
path_schema_json = os.path.join(path_predictions_task, "schema.json")
with open(path_schema_json, "w") as f:
    json.dump(schema, f)

# Create the predictions themselves, one entry per image
image_id_to_prediction = dict()
for image in coco_dict["images"]:
    prediction = {
        "file_name": image["file_name"],
        "predictions": [],
    }
    image_id_to_prediction[image["id"]] = prediction

for ann in coco_dict["annotations"]:
    pred = {
        "category_id": ann["category_id"],
        "bbox": ann["bbox"],
        "score": ann.get("score", 0),
    }
    image_id_to_prediction[ann["image_id"]]["predictions"].append(pred)

# Write one JSON file per image
for prediction in image_id_to_prediction.values():
    filename_prediction = os.path.splitext(prediction["file_name"])[0] + ".json"
    path_to_prediction = os.path.join(path_predictions_task, filename_prediction)
    # Create subdirectories if file_name contains a relative path
    Path(path_to_prediction).parent.mkdir(exist_ok=True, parents=True)
    with open(path_to_prediction, "w") as f:
        json.dump(prediction, f)
Create Prediction Files for Videos
LightlyOne expects one prediction file per frame in a video. Predictions can be created following the Python example code below. Make sure that PyAV is installed on your system for it to work correctly.
import av
import json
from pathlib import Path
from typing import List, Dict

dataset_dir = Path("/datasets/my_dataset")
predictions_dir = dataset_dir / ".lightly" / "predictions" / "my_prediction_task"


def model_predict(frame) -> List[Dict]:
    # This function must be overwritten to generate predictions for a frame using
    # a prediction model of your choice. Here we just return an example prediction.
    # See https://docs.lightly.ai/docker/advanced/datasource_predictions.html#prediction-format
    # for possible prediction formats.
    return [{"category_id": 0, "bbox": [0, 10, 100, 30], "score": 0.8}]


for video_path in dataset_dir.glob("**/*.mp4"):
    # get predictions for frames
    predictions = []
    with av.open(str(video_path)) as container:
        stream = container.streams.video[0]
        for frame in container.decode(stream):
            predictions.append(model_predict(frame.to_image()))

    # save predictions
    num_frames = len(predictions)
    zero_padding = len(str(num_frames))
    for frame_index, frame_predictions in enumerate(predictions):
        video_name = video_path.relative_to(dataset_dir).with_suffix("")
        frame_name = Path(
            f"{video_name}-{frame_index:0{zero_padding}}-{video_path.suffix[1:]}.png"
        )
        prediction = {
            "file_name": str(frame_name),
            "predictions": frame_predictions,
        }
        out_path = predictions_dir / frame_name.with_suffix(".json")
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w") as file:
            json.dump(prediction, file)

# example directory structure before
# .
# ├── test
# │   └── video_0.mp4
# └── train
#     ├── video_1.mp4
#     └── video_2.mp4
#
# example directory structure after
# .
# ├── .lightly
# │   └── predictions
# │       └── my_prediction_task
# │           ├── test
# │           │   ├── video_0-000-mp4.json
# │           │   ├── video_0-001-mp4.json
# │           │   ├── video_0-002-mp4.json
# │           │   └── ...
# │           └── train
# │               ├── video_1-000-mp4.json
# │               ├── video_1-001-mp4.json
# │               ├── video_1-002-mp4.json
# │               ├── ...
# │               ├── video_2-000-mp4.json
# │               ├── video_2-001-mp4.json
# │               ├── video_2-002-mp4.json
# │               └── ...
# ├── test
# │   └── video_0.mp4
# └── train
#     ├── video_1.mp4
#     └── video_2.mp4
Using a library other than PyAV to load videos with Python is discouraged, as the order and number of loaded frames might differ.
Extract Frames with FFmpeg
As an alternative to creating predictions directly from a video, frames can first be extracted as images with FFmpeg and then processed by any prediction model that supports images. The example command below shows how to extract frames and save them with the filename expected by Lightly. Ensure that FFmpeg is installed on your system before running the command.
VIDEO=video.mp4; NUM_FRAMES=$(ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of csv=p=0 ${VIDEO}); ffmpeg -r 1 -i ${VIDEO} -start_number 0 ${VIDEO%.mp4}-%0${#NUM_FRAMES}d-mp4.png
# results in the following file structure
.
├── video.mp4
├── video-000-mp4.png
├── video-001-mp4.png
├── video-002-mp4.png
├── video-003-mp4.png
└── ...