Sequence Selection

Instead of selecting single video frames, Lightly can also select sequences of frames. This is controlled by the selected_sequence_length parameter, which is set in the worker_config. If its value is 1 (default), the Lightly Worker selects single frames. If it is larger than 1, each video is split into sequences of that length and the frame representations are aggregated into a sequence representation. The selection is then performed on these sequence representations.

📘

Sequence selection only works with Videos as Input.

Sequence selection consists of the following steps:

  1. Each input video is split into sequences of length selected_sequence_length.
  2. Next, the embeddings of all frames in a sequence are aggregated (averaged).
  3. The selection is performed at the sequence level.
  4. The frames of the selected sequences are uploaded to the Lightly datasource.
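Steps 1 and 2 can be illustrated with a minimal sketch in plain Python. This is not the Lightly Worker implementation: the function names are hypothetical, the embeddings are toy values, and dropping a trailing remainder shorter than the sequence length is an assumption made only for this example.

```python
from typing import List, Sequence


def split_into_sequences(num_frames: int, sequence_length: int) -> List[range]:
    """Split frame indices 0..num_frames-1 into consecutive, non-overlapping sequences.

    Assumption: a trailing remainder shorter than sequence_length is dropped.
    """
    if sequence_length < 1:
        raise ValueError("sequence_length must be at least 1")
    num_sequences = num_frames // sequence_length
    return [
        range(i * sequence_length, (i + 1) * sequence_length)
        for i in range(num_sequences)
    ]


def aggregate_embeddings(frame_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame embeddings element-wise into one sequence embedding."""
    num_frames = len(frame_embeddings)
    if num_frames == 0:
        raise ValueError("at least one frame embedding is required")
    dim = len(frame_embeddings[0])
    return [sum(e[d] for e in frame_embeddings) / num_frames for d in range(dim)]


# Toy example: one video with 7 frames, 2-dimensional embeddings, sequences of 3 frames.
embeddings = [[float(i), float(2 * i)] for i in range(7)]
sequences = split_into_sequences(num_frames=len(embeddings), sequence_length=3)
sequence_embeddings = [
    aggregate_embeddings([embeddings[i] for i in seq]) for seq in sequences
]
```

In this sketch, the selection strategy would then operate on sequence_embeddings instead of the per-frame embeddings.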

The following code snippet shows how to use sequence selection:

from lightly.api import ApiWorkflowClient

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={
        "selected_sequence_length": 10, # Split videos into sequences of 10 frames each
    },
    selection_config={
        "n_samples": 50,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY"
                }
            }
        ]
    },
)

❗️

n_samples specifies the total number of selected frames and must therefore be a multiple of selected_sequence_length! The number of selected sequences is calculated as n_samples / selected_sequence_length. Setting n_samples = 50 and selected_sequence_length = 10 will result in 50 / 10 = 5 selected sequences with each sequence containing 10 frames.

If proportion_samples is used instead of n_samples, the number of selected frames is calculated as num_frames_in_dataset * proportion_samples, rounded down to the nearest multiple of selected_sequence_length. For a dataset with 110 frames, setting proportion_samples = 0.5 and selected_sequence_length = 10 will result in floor(110 * 0.5 / 10) = 5 selected sequences with each sequence containing 10 frames.
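The arithmetic for both cases can be checked with a small helper. The function below is a hypothetical illustration of the two formulas given above and is not part of the Lightly API; raising an error for an invalid n_samples is an assumption about how such a check could look.

```python
import math
from typing import Optional


def num_selected_sequences(
    selected_sequence_length: int,
    n_samples: Optional[int] = None,
    proportion_samples: Optional[float] = None,
    num_frames_in_dataset: Optional[int] = None,
) -> int:
    """Return the number of selected sequences for the given configuration."""
    if n_samples is not None:
        # n_samples counts frames and must be a multiple of the sequence length.
        if n_samples % selected_sequence_length != 0:
            raise ValueError(
                "n_samples must be a multiple of selected_sequence_length"
            )
        return n_samples // selected_sequence_length
    if proportion_samples is None or num_frames_in_dataset is None:
        raise ValueError(
            "proportion_samples and num_frames_in_dataset are both required "
            "when n_samples is not set"
        )
    # Round down to whole sequences.
    return math.floor(
        num_frames_in_dataset * proportion_samples / selected_sequence_length
    )


# The two examples from the text.
from_n_samples = num_selected_sequences(10, n_samples=50)
from_proportion = num_selected_sequences(
    10, proportion_samples=0.5, num_frames_in_dataset=110
)
```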

🚧

DIVERSITY is currently the only supported selection strategy for sequence selection.

Crop Sequences from Videos

Sometimes it is useful to get the selected sequences as video clips rather than as extracted video frames. Lightly does not yet support extracting video clips directly. Instead, it creates a sequence_information.json file which stores the exact timestamps of the selected sequences from each video. Using this file, it is possible to crop the selected sequences from the original videos.

Sequence Information

The sequence_information.json file contains a list of dictionaries with each dictionary containing the information of a selected sequence:

[
    {
        "video_name": "video1.mp4",
        "frame_names": [
            "video1-40-mp4.png",
            "video1-41-mp4.png",
            "video1-42-mp4.png",
            ...
        ],
        "frame_timestamps_pts": [
            359726680,
            368719847,
            377713014,
            ...
        ],
        "frame_timestamps_sec": [
            4.886145,
            5.008298625,
            5.13045225,
            ...
        ],
        "frame_indices": [
            40,
            41,
            42,
            ...
        ]
    },
    {
        "video_name": "video1.mp4",
        "frame_names": [
            "video1-100-mp4.png",
            "video1-101-mp4.png",
            "video1-102-mp4.png",
            ...
        ],
        "frame_timestamps_pts": [
            422678849,
            431672016,
            440665183,
            ...
        ],
        "frame_timestamps_sec": [
            6.095856060606061,
            6.217773181818182,
            6.339690303030303,
            ...
        ],
        "frame_indices": [
            100,
            101,
            102,
            ...
        ]
    },
    ...
]

Each sequence dictionary has the following fields:

  • video_name is the original video filename from which the sequence was created.
  • frame_names lists the filenames of the selected frames in the sequence.
  • frame_timestamps_pts lists the presentation timestamps of the selected frames in the sequence.
  • frame_timestamps_sec lists the timestamps in seconds since the beginning of the video for the selected frames in the sequence.
  • frame_indices lists the frame indices (starting at 0) since the beginning of the video for the selected frames in the sequence.
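As a minimal sketch of how these fields can be read, the snippet below parses a shortened example in the format shown above and extracts the first and last entries of each sequence. The summarize_sequence helper is hypothetical, and the example keeps only the three frames per sequence that are visible in the listing.

```python
import json

# Shortened example content in the format shown above (three frames only).
example_json = """
[
    {
        "video_name": "video1.mp4",
        "frame_names": ["video1-40-mp4.png", "video1-41-mp4.png", "video1-42-mp4.png"],
        "frame_timestamps_pts": [359726680, 368719847, 377713014],
        "frame_timestamps_sec": [4.886145, 5.008298625, 5.13045225],
        "frame_indices": [40, 41, 42]
    }
]
"""


def summarize_sequence(sequence: dict) -> dict:
    """Collect the values that identify where a sequence starts and ends."""
    return {
        "video_name": sequence["video_name"],
        "num_frames": len(sequence["frame_names"]),
        "first_pts": sequence["frame_timestamps_pts"][0],
        "last_pts": sequence["frame_timestamps_pts"][-1],
        "first_index": sequence["frame_indices"][0],
        "last_index": sequence["frame_indices"][-1],
    }


summaries = [summarize_sequence(s) for s in json.loads(example_json)]
```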

The sequence_information.json file is written to the output directory ({OUTPUT_DIR}) of the Lightly Worker docker container. To access the output directory, mount it when starting the Lightly Worker:

docker run --shm-size="1024m" --gpus all --rm -it \
    -v {OUTPUT_DIR}:/output_dir \
    -e LIGHTLY_TOKEN={MY_LIGHTLY_TOKEN} \
    lightly/worker:latest \
    worker.worker_id={MY_WORKER_ID}

Every Lightly Worker run will now save a sequence_information.json in the output directory at {OUTPUT_DIR}/{DATE}/{TIME}/data/sequence_information.json.
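To collect the files from several runs, the path layout above can be turned into a glob pattern. The helper below is a hypothetical sketch; it assumes that {DATE} and {TIME} are each a single directory level, and the returned paths are sorted lexicographically, which only matches chronological order if the directory names happen to sort that way.

```python
from pathlib import Path
from typing import List


def find_sequence_information_files(output_dir: str) -> List[Path]:
    """Return all sequence_information.json files below output_dir.

    Matches {OUTPUT_DIR}/{DATE}/{TIME}/data/sequence_information.json.
    """
    pattern = "*/*/data/sequence_information.json"
    return sorted(Path(output_dir).glob(pattern))
```

Calling find_sequence_information_files with the mounted {OUTPUT_DIR} returns one path per run that wrote a sequence information file.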

Crop the Sequences

Using the timestamps stored in the sequence_information.json file, the selected video sequences can be cropped from the original videos. Make sure that FFmpeg is available on your system for cropping the videos.

To crop a sequence, the first and last timestamp from the frame_timestamps_pts list and the video_name stored in the sequence_information.json file are required. The cropping can be done with the following command using an FFmpeg trim filter:

ffmpeg -i {VIDEO_NAME} -copyts -filter "trim=start_pts={FIRST_TIMESTAMP_PTS}:end_pts={LAST_TIMESTAMP_PTS + 1}" {SEQUENCE_NAME}

# example using an mp4 video
ffmpeg -i video.mp4 -copyts -filter "trim=start_pts=359726680:end_pts=377713015" sequence_1.mp4

🚧

Make sure that end_pts is set to LAST_TIMESTAMP_PTS + 1. Otherwise, the last frame in the sequence will not be included in the cropped sequence!
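When many sequences need to be cropped, the command can be generated from the entries of the sequence information file. The sketch below builds the same FFmpeg command as shown above for one sequence; build_trim_command is a hypothetical helper, and the example entry keeps only the three timestamps visible in the listing.

```python
import shlex
from typing import List


def build_trim_command(sequence: dict, video_path: str, output_path: str) -> List[str]:
    """Build the FFmpeg trim command for one selected sequence."""
    first_pts = sequence["frame_timestamps_pts"][0]
    last_pts = sequence["frame_timestamps_pts"][-1]
    # end_pts must be one past the last timestamp so the last frame is included.
    trim_filter = f"trim=start_pts={first_pts}:end_pts={last_pts + 1}"
    return ["ffmpeg", "-i", video_path, "-copyts", "-filter", trim_filter, output_path]


sequence = {
    "video_name": "video1.mp4",
    "frame_timestamps_pts": [359726680, 368719847, 377713014],
}
command = build_trim_command(sequence, sequence["video_name"], "sequence_1.mp4")

# Print the command for inspection. To execute it, pass the list to
# subprocess.run(command, check=True) on a system where FFmpeg is installed.
print(shlex.join(command))
```

Passing the command as a list avoids shell quoting issues with the filter string.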

Sequences can also be cropped using the first and last timestamp from the frame_timestamps_sec list. However, depending on the video and sequence, this can result in the last frame of the sequence not being included in the cropped video. We recommend using frame_timestamps_pts if possible. The following command can be used for cropping using frame_timestamps_sec:

ffmpeg -i {VIDEO_NAME} -copyts -filter "trim=start={FIRST_TIMESTAMP_SEC}:end={LAST_TIMESTAMP_SEC}" {SEQUENCE_NAME}

# example using an mp4 video
ffmpeg -i video.mp4 -copyts -filter "trim=start=4.886145:end=5.985527625" sequence_1.mp4

Sequence Selection with Crop Selection

Sequence selection can be combined with crop selection to select sequences of video frames based on the object crops within them. To run sequence and crop selection at the same time, you can use the following code example:

from lightly.api import ApiWorkflowClient

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={
        "selected_sequence_length": 10, # Split videos into sequences of 10 frames each
        "object_level": {
            "task_name": "my-object-detection-task", # Use crops from this task for the selection
        },
    },
    selection_config={
        "n_samples": 50,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY"
                }
            }
        ]
    },
)

Compared to normal sequence selection, the selection process works a bit differently when combined with crop selection. With this setting enabled, the Lightly Worker runs the following steps:

  1. Create embeddings for all crops in all video frames.
  2. Select n_samples / selected_sequence_length diverse crops.
  3. Split the input videos into frame sequences.
  4. Select all sequences which contain at least one of the selected crops.
  5. Upload all crops from all selected sequences. This includes crops that were not initially selected by the crop selection strategy in step 2.
  6. Upload all video frames that contain at least one selected crop.

The selected video frames are uploaded to the dataset from which the run is started. This is the dataset with the id "MY_DATASET_ID" in the example above. The selected object crops are uploaded to a new dataset. This new crop dataset is named after the dataset with the video frames and the crop task name. The dataset name has the following format: {VIDEO_DATASET_NAME}-crops-{TASK_NAME}.

🚧

Lightly might select fewer than n_samples video frames

Because the selection is performed on crops, fewer than n_samples video frames might be uploaded at the end of the run. This can happen if multiple crops from the same video frame or sequence are selected in step 2 of the algorithm presented above. Because every video frame is selected at most once, the total number of selected frames can decrease.

🚧

Video frames without crops are never selected

Lightly only selects and uploads video frames that contain at least one crop. If you need all frames of the selected sequences, including frames without crops, you can extract the full sequences using the sequence information file.
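The effect of both notes can be illustrated with a toy example for a single video. The sketch below is one interpretation of steps 3 to 6 and is not the Lightly Worker implementation: frames_to_upload is a hypothetical helper, frames are identified by their index, and sequences are assumed to be consecutive blocks of sequence_length frames starting at index 0.

```python
from typing import List, Set


def frames_to_upload(
    selected_crop_frames: List[int],
    frames_with_crops: Set[int],
    sequence_length: int,
) -> List[int]:
    """Return the frame indices that would be uploaded for one video.

    selected_crop_frames: frame index of each crop selected in step 2.
    frames_with_crops: indices of all frames that contain at least one crop.
    """
    # Step 4: a sequence is selected if it contains at least one selected crop.
    selected_sequences = {frame // sequence_length for frame in selected_crop_frames}
    # Steps 5 and 6: within the selected sequences, only frames with crops are kept.
    return [
        frame
        for seq in sorted(selected_sequences)
        for frame in range(seq * sequence_length, (seq + 1) * sequence_length)
        if frame in frames_with_crops
    ]


# n_samples = 20 and selected_sequence_length = 10 give 2 selected crops.
# Both crops fall into the first sequence (frames 0 to 9).
uploaded = frames_to_upload(
    selected_crop_frames=[3, 7],
    frames_with_crops={3, 4, 7, 15},
    sequence_length=10,
)
```

In this example only one sequence is selected and only three of its frames contain crops, so three frames are uploaded even though n_samples is 20.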