Work with Metadata

Lightly can make use of metadata collected alongside your images or videos. You can use this metadata to steer the selection process and to analyze the selected dataset in the Lightly Platform.

Metadata Folder Structure

This section outlines the format in which metadata can be added to a Lightly datasource. All metadata lives in a subdirectory of your configured Lightly datasource called .lightly/metadata. The example below shows an input datasource containing images and a Lightly datasource containing the corresponding metadata files:

s3://bucket/input/
├── image_0.png
└── subdir/
    ├── image_1.png
    ├── image_2.png
    ├── ...
    └── image_N.png

s3://bucket/lightly/
└── .lightly/metadata/
    ├── schema.json
    ├── image_0.json
    └── subdir/
        ├── image_1.json
        ├── ...
        └── image_N.json

All of the .json files are explained in the following sections.

Metadata Schema

The schema defines the metadata format and helps the Lightly Platform and Worker correctly identify and display different types of metadata. You can provide this information to Lightly by adding a schema.json file to the .lightly/metadata directory in your Lightly datasource. The schema.json file must contain a list of configuration entries. Each of the entries is a dictionary with the following keys:

  • name: Name of this metadata item (can be chosen by the user)
  • path: Concatenation of the keys of the metadata item, for example, myObject.myKey1.myKey2.
  • defaultValue: Fallback value if a sample has no corresponding metadata entry
  • valueDataType: Data type, one of the following:
    • CATEGORICAL_STRING
    • CATEGORICAL_BOOLEAN
    • CATEGORICAL_INT
    • NUMERIC_FLOAT
    • NUMERIC_INT

For example, let’s say we have additional information about the scene and weather for each image we have collected:

{
    "scene": "Highway",
    "weather": {
        "description": "sunny",
        "temperature": 27.2,
        "air_pressure": 1025
    },
    "vehicle_id": 0
}

A possible schema could look like this:

[
    {
        "name": "Scene",
        "path": "scene",
        "defaultValue": "undefined",
        "valueDataType": "CATEGORICAL_STRING"
    },
    {
        "name": "Weather description",
        "path": "weather.description",
        "defaultValue": "nothing",
        "valueDataType": "CATEGORICAL_STRING"
    },
    {
        "name": "Temperature",
        "path": "weather.temperature",
        "defaultValue": 0.0,
        "valueDataType": "NUMERIC_FLOAT"
    },
    {
        "name": "Air pressure",
        "path": "weather.air_pressure",
        "defaultValue": 0,
        "valueDataType": "NUMERIC_INT"
    },
    {
        "name": "Vehicle ID",
        "path": "vehicle_id",
        "defaultValue": 0,
        "valueDataType": "CATEGORICAL_INT"
    }
]
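To illustrate how a path addresses a nested value and when defaultValue applies, here is a minimal Python sketch. The helper resolve_path is hypothetical and is not part of the Lightly API. It only mirrors the dotted-path convention described above and may differ from how Lightly resolves paths internally.

```python
from typing import Any


def resolve_path(metadata: dict, path: str, default: Any) -> Any:
    """Follow a dotted path such as "weather.temperature" through nested dicts.

    Hypothetical helper for illustration; returns `default` if any key is missing.
    """
    current: Any = metadata
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Two entries taken from the example schema above.
schema = [
    {"name": "Scene", "path": "scene", "defaultValue": "undefined"},
    {"name": "Temperature", "path": "weather.temperature", "defaultValue": 0.0},
]

# The example metadata from above.
sample = {
    "scene": "Highway",
    "weather": {"description": "sunny", "temperature": 27.2, "air_pressure": 1025},
    "vehicle_id": 0,
}

# Map each schema entry name to the value found in the sample.
values = {
    entry["name"]: resolve_path(sample, entry["path"], entry["defaultValue"])
    for entry in schema
}

# A sample without any metadata falls back to the defaults.
fallback = {
    entry["name"]: resolve_path({}, entry["path"], entry["defaultValue"])
    for entry in schema
}
```

Here values holds the scene and temperature of the sample, while fallback holds the two default values from the schema.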

Metadata Files

Lightly reads one metadata file per image, video, or frame. If no metadata file exists for an input file, Lightly falls back to the default values from schema.json. If a metadata file is provided for a whole video, Lightly assumes that the metadata is valid for all frames in that video.

To provide metadata for an image or a video, place a metadata file in the .lightly/metadata directory of the Lightly bucket. The file must have the same name and relative path as the image or video, with the file extension changed to .json. Its content must follow the format described in the Metadata Format section.

# Filename of the metadata for file in s3://bucket/input/FILENAME.EXT
s3://bucket/lightly/.lightly/metadata/${FILENAME}.json

# Example for image in s3://bucket/input/subdir/image_1.png
s3://bucket/lightly/.lightly/metadata/subdir/image_1.json

# Example for image in s3://bucket/input/image_0.png
s3://bucket/lightly/.lightly/metadata/image_0.json

# Example for video in s3://bucket/input/subdir/video_1.mp4
s3://bucket/lightly/.lightly/metadata/subdir/video_1.json
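The naming rule above can be sketched in a few lines of Python. The function metadata_key is a hypothetical helper, not a Lightly API. It assumes the input path is given relative to the root of the input datasource and returns the path relative to the root of the Lightly datasource.

```python
from pathlib import PurePosixPath

# Directory inside the Lightly datasource that holds all metadata files.
METADATA_DIR = PurePosixPath(".lightly/metadata")


def metadata_key(relative_input_path: str) -> str:
    """Return the metadata file path for an image or video (hypothetical helper).

    The subdirectory and file name are kept; only the extension becomes .json.
    """
    json_path = PurePosixPath(relative_input_path).with_suffix(".json")
    return str(METADATA_DIR / json_path)


image_key = metadata_key("subdir/image_1.png")
video_key = metadata_key("subdir/video_1.mp4")
root_key = metadata_key("image_0.png")
```

Prefixing the result with the root of the Lightly datasource (s3://bucket/lightly/ in the examples above) gives the full location of the metadata file.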

When working with videos, providing a metadata file per frame is possible. Lightly assumes the default value from the schema.json if a frame has no corresponding metadata file. The metadata file name must contain the original video name, video extension, and frame number in the following format:

{VIDEO_NAME}-{FRAME_NUMBER}-{VIDEO_EXTENSION}.json

Frame numbers are zero-padded to the number of digits in the video's total frame count. For a video with 200 frames, frame numbers are padded to three digits, so frame 99 becomes 099. For a video with 1000 frames, frame numbers are padded to four digits, so frame 99 becomes 0099.

Examples are shown below:

# Filename of the metadata of the Xth frame of video s3://bucket/input/FILENAME.EXT
# with 200 frames (padding: len(str(200)) = 3)
s3://bucket/lightly/.lightly/metadata/${FILENAME}-${X:03d}-${EXT}.json

# Example
# Video: s3://bucket/input/subdir/video_1.mp4
# Metadata file for frame 99 of 200 frames:
s3://bucket/lightly/.lightly/metadata/subdir/video_1-099-mp4.json

# Example
# Video: s3://bucket/input/video_0.mp4
# Metadata file for frame 99 of 200 frames:
s3://bucket/lightly/.lightly/metadata/video_0-099-mp4.json
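The padding rule is easy to get wrong by one digit, so a small Python sketch may help. The function frame_metadata_key is a hypothetical helper, not part of Lightly. It assumes the video path is relative to the input root and that the total number of frames is already known.

```python
from pathlib import PurePosixPath

# Directory inside the Lightly datasource that holds all metadata files.
METADATA_DIR = PurePosixPath(".lightly/metadata")


def frame_metadata_key(video_path: str, frame_number: int, num_frames: int) -> str:
    """Build the per-frame metadata path for a video (hypothetical helper)."""
    video = PurePosixPath(video_path)
    # Pad to the number of digits of the total frame count, e.g. 200 -> 3.
    width = len(str(num_frames))
    # Video extension without the leading dot, e.g. "mp4".
    extension = video.suffix.lstrip(".")
    name = f"{video.stem}-{frame_number:0{width}d}-{extension}.json"
    return str(METADATA_DIR / video.parent / name)


key_200 = frame_metadata_key("subdir/video_1.mp4", 99, 200)
key_1000 = frame_metadata_key("subdir/video_1.mp4", 99, 1000)
key_root = frame_metadata_key("video_0.mp4", 99, 200)
```

With 200 frames the helper reproduces the two example paths above; with 1000 frames the frame number is padded to 0099.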

Metadata Format

A metadata file for an image, video, or frame requires the keys file_name, type, and metadata. Here, file_name serves as a unique identifier to retrieve the original file for which the metadata was collected, type indicates whether the metadata is per image, video, or frame, and metadata contains the actual metadata.

Image

{
    "file_name": "subdir/image_1.png",
    "type": "image",
    "metadata": {
        "scene": "highway",
        "weather": {
            "description": "rainy",
            "temperature": 10.5,
            "air_pressure": 1
        },
        "vehicle_id": 321
    }
}

Video

{
    "file_name": "subdir/video_1.mp4",
    "type": "video",
    "metadata": {
        "scene": "city street",
        "weather": {
            "description": "sunny",
            "temperature": 23.2,
            "air_pressure": 1
        },
        "vehicle_id": 321
    }
}

Frame

{
    "file_name": "subdir/video_1-099-mp4.png",
    "type": "frame",
    "metadata": {
        "scene": "city street",
        "weather": {
            "description": "sunny",
            "temperature": 23.2,
            "air_pressure": 1
        },
        "vehicle_id": 321
    }
}

❗️

file_name should always be the path of the file relative to the root directory of your input bucket.

For example, if the input bucket has s3://bucket/input/ as the root directory and the image is saved at s3://bucket/input/subdir/image_1.png, then file_name should be subdir/image_1.png.
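As a rough sketch of how such a file could be generated, the Python snippet below derives file_name from a full file URL and serializes the record as JSON. Both helpers, relative_file_name and build_metadata_record, are hypothetical and not part of the Lightly API. Uploading the resulting JSON to the Lightly datasource is left out.

```python
import json

ALLOWED_TYPES = {"image", "video", "frame"}


def relative_file_name(input_root: str, file_url: str) -> str:
    """Strip the input root from a file URL (hypothetical helper)."""
    root = input_root.rstrip("/") + "/"
    if not file_url.startswith(root):
        raise ValueError(f"{file_url} is not located under {root}")
    return file_url[len(root):]


def build_metadata_record(file_name: str, kind: str, metadata: dict) -> dict:
    """Assemble the three required keys of a metadata file (hypothetical helper)."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    return {"file_name": file_name, "type": kind, "metadata": metadata}


file_name = relative_file_name(
    "s3://bucket/input/", "s3://bucket/input/subdir/image_1.png"
)
record = build_metadata_record(
    file_name,
    "image",
    {"scene": "highway", "weather": {"description": "rainy", "temperature": 10.5}},
)

# JSON text that would be stored as .lightly/metadata/subdir/image_1.json
payload = json.dumps(record, indent=4)
```

Checking the prefix explicitly keeps a file from outside the input root from silently ending up with a wrong file_name.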

Next Steps

If metadata is provided, the Lightly Worker will automatically detect and load it into the Lightly Platform, where it can be visualized and analyzed after running a selection.

Visualizing the different metadata categories in the Lightly Platform embedding view is also possible. The following example shows the categorical metadata “Scene” from the BDD100k dataset:


BDD100K embedding visualization in the Lightly Platform.

For information on how to use metadata for selection, head to Customize a Selection and the section on Metadata input.