Sequence Selection

Sequence selection allows users to select sequences of frames from a video instead of single frames. The key concept is the parameter selected_sequence_length. If its value is one (default), the Lightly Worker selects single frames. If it is larger than one, each video is split into sequences of that length and the frame representations are aggregated into a sequence representation. The selection then happens on these sequence representations.

Note

Sequence selection works on videos or on folders of alphabetically sorted frames.

How It Works

Sequence selection consists of the following steps:

  1. Each input video is split into sequences of length selected_sequence_length.

  2. Next, the embeddings of all frames in a sequence are aggregated (averaged).

  3. The selection is performed on sequence level.

  4. Finally, the indices of the selected sequence frames are reconstructed.

  5. Information about the selected sequences is saved in the output directory.

  6. The report is generated and (if requested) the selected frames are saved.
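Steps 1 and 2 can be sketched in plain Python. This is an illustrative sketch, not Lightly's actual implementation, and the function name is our own:

```python
def aggregate_sequences(embeddings, sequence_length):
    """Split a list of frame embeddings into consecutive sequences and
    average each sequence into a single representation."""
    sequences = [
        embeddings[i : i + sequence_length]
        for i in range(0, len(embeddings), sequence_length)
    ]
    return [
        [sum(dim) / len(seq) for dim in zip(*seq)]
        for seq in sequences
    ]

# 4 frames with 2-dim embeddings, sequence length 2 -> 2 sequence embeddings
frame_embeddings = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]
print(aggregate_sequences(frame_embeddings, 2))  # -> [[1.0, 1.0], [5.0, 5.0]]
```

The selection algorithm (e.g. coreset) then runs on these averaged representations, and the frame indices of each selected sequence are reconstructed afterwards.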

Usage

To select sequences of length X, simply add the argument selected_sequence_length=X to your Lightly Worker run command. Here, X must be an integer that evenly divides stopping_condition.n_samples. If stopping_condition.n_samples is a fraction, the Lightly Worker will automatically round it to the next multiple of X.
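As an illustration of this rounding, the sketch below computes the effective frame count for a fractional stopping condition. The helper name is hypothetical, and rounding up is an assumption based on "next multiple"; the Worker's exact behavior may differ:

```python
import math

def effective_n_samples(fraction, total_frames, sequence_length):
    """Hypothetical helper: turn a fractional stopping condition into a
    frame count that is a multiple of the sequence length (rounding up)."""
    raw = fraction * total_frames
    return math.ceil(raw / sequence_length) * sequence_length

# 10% of 1005 frames = 100.5 -> rounded up to the next multiple of 10
print(effective_n_samples(0.1, 1005, 10))  # -> 110
```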

For example, let’s say we have a folder with two videos which we randomly downloaded from Pexels:

ls /datasets/pexels/
> Pexels_Videos_1409899.mp4  Pexels_Videos_2495382.mp4

Now, we want to select sequences of length ten. We can use the following script:

import json
import lightly
from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
from lightly.openapi_generated.swagger_client.models.datasource_purpose import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="YOUR_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset('pexels', dataset_type=DatasetType.VIDEOS)

# Pick one of the following three blocks depending on where your data is stored.
# AWS S3
# Input bucket
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)


# or Google Cloud Storage
# Input bucket
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)


# or Azure Blob Storage
# Input bucket
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the compute run using our custom config.
# We show here the full default config so you can easily edit the
# values according to your needs.
client.schedule_compute_worker_run(
    worker_config={
        'enable_corruptness_check': False,
        'remove_exact_duplicates': False,
        'enable_training': False,
        'pretagging': False,
        'pretagging_debug': False,
        'method': 'coreset',
        'stopping_condition': {
            'n_samples': 200, # select 200 frames -> 20 sequences of 10 frames each
            'min_distance': -1
        },
        'selected_sequence_length': 10 # we want sequences of 10 frames
    },
    lightly_config={
        'loader': {
            'batch_size': 128,
            'shuffle': True,
            'num_workers': -1,
            'drop_last': True
        },
        'model': {
            'name': 'resnet-18',
            'out_dim': 128,
            'num_ftrs': 32,
            'width': 1
        },
        'trainer': {
            'gpus': 1,
            'max_epochs': 1,
            'precision': 16
        },
        'criterion': {
            'temperature': 0.5
        },
        'optimizer': {
            'lr': 1,
            'weight_decay': 0.00001
        },
        'collate': {
            'input_size': 64,
            'cj_prob': 0.8,
            'cj_bright': 0.7,
            'cj_contrast': 0.7,
            'cj_sat': 0.7,
            'cj_hue': 0.2,
            'min_scale': 0.15,
            'random_gray_scale': 0.2,
            'gaussian_blur': 0.0,
            'kernel_size': 0.1,
            'vf_prob': 0,
            'hf_prob': 0.5,
            'rr_prob': 0
        }
    }
)

The above script will create a run to select 20 sequences, each consisting of ten frames. The selected frames are then saved in the output directory for further processing. Note that the Lightly Worker currently doesn't support the corruptness check or exact duplicate removal for sequence selection, so both are deactivated in the config above.

To make sure the scheduled run gets processed, a Lightly Worker must be running:

docker run --shm-size="1024m" --rm --gpus all -it \
  -v /docker-output:/home/output_dir lightly/worker:latest \
  token=YOUR_TOKEN worker.worker_id=YOUR_WORKER_ID

Warning

The stopping condition n_samples must be equal to the number of desired sequences times the selected_sequence_length, i.e. n_samples = n_sequences x selected_sequence_length. In the example above, 20 sequences times ten frames is exactly 200.
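A quick sanity check before scheduling a run can catch a misconfigured stopping condition early. This is a plain-Python sketch with a name of our own, not part of the Lightly API:

```python
def check_stopping_condition(n_samples, selected_sequence_length):
    """Raise if n_samples is not a multiple of the sequence length;
    otherwise return the resulting number of sequences."""
    if n_samples % selected_sequence_length != 0:
        raise ValueError(
            f"n_samples={n_samples} must be a multiple of "
            f"selected_sequence_length={selected_sequence_length}"
        )
    return n_samples // selected_sequence_length

print(check_stopping_condition(200, 10))  # -> 20 sequences
```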

In our example, a look at a PCA of the embeddings of the selected frames nicely shows the 20 selected sequences. The following image is taken from the output of the Lightly Worker:

PCA of the embeddings of the frames in the selected sequences from the two input videos (yellow and purple).

Sequence Selection Information

The Lightly Worker will create a file at {docker-output}/data/sequence_information.json containing detailed information about the selected sequences. The file can be used for further analysis of your dataset based on sequences.

The file contains a list of sequence dictionaries. Every dictionary lists the exact contents of one sequence. In the case of video frame sequences, the sequence_information.json will look similar to the example shown below:

[
    {
        "video_name": "Pexels_Videos_1409899.mp4",
        "frame_names": [
            "Pexels_Videos_1409899-40-mp4.png",
            "Pexels_Videos_1409899-41-mp4.png",
            "Pexels_Videos_1409899-42-mp4.png",
            ...
        ],
        "frame_timestamps_pts": [
            359726680,
            368719847,
            377713014,
            ...
        ],
        "frame_timestamps_sec": [
            4.886145,
            5.008298625,
            5.13045225,
            ...
        ],
        "frame_indices": [
            40,
            41,
            42,
            ...
        ]
    },
    {
        "video_name": "Pexels_Videos_1409899.mp4",
        "frame_names": [
            "Pexels_Videos_1409899-100-mp4.png",
            "Pexels_Videos_1409899-101-mp4.png",
            "Pexels_Videos_1409899-102-mp4.png",
            ...
        ],
        "frame_timestamps_pts": [
            422678849,
            431672016,
            440665183,
            ...
        ],
        "frame_timestamps_sec": [
            6.095856060606061,
            6.217773181818182,
            6.339690303030303,
            ...
        ],
        "frame_indices": [
            100,
            101,
            102,
            ...
        ]
    },
    ...
]
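A minimal sketch for working with the parsed file contents; the helper function is our own, not part of the Lightly API:

```python
import json

def sequence_spans(sequence_information):
    """Return (video_name, first_frame_index, last_frame_index) for each
    sequence in the parsed sequence_information.json contents."""
    return [
        (seq["video_name"], seq["frame_indices"][0], seq["frame_indices"][-1])
        for seq in sequence_information
    ]

# Example entry mirroring the structure shown above (other keys omitted):
example = json.loads(
    '[{"video_name": "Pexels_Videos_1409899.mp4", "frame_indices": [40, 41, 42]}]'
)
print(sequence_spans(example))  # -> [('Pexels_Videos_1409899.mp4', 40, 42)]
```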

For image file sequences it only lists the filenames for every sequence:

[
    {
        "filenames": [
            "image_40.png",
            "image_41.png",
            "image_42.png",
            ...
        ]
    },
    {
        "filenames": [
            "image_100.png",
            "image_101.png",
            "image_102.png",
            ...
        ]
    },
    ...
]

Cropping Sequences From Videos

Using the timestamps stored in the sequence_information.json file, the selected video sequences can be cropped from the original videos. Make sure that ffmpeg is available on your system for cropping the videos.

There are two types of stored timestamps:

  • frame_timestamps_pts: Presentation timestamps in timebase units of the video.

  • frame_timestamps_sec: Presentation timestamps in seconds.
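The two are related through the video stream's time base. A hedged sketch of the conversion (the 90 kHz time base below is only an example; use your stream's actual time base):

```python
from fractions import Fraction

def pts_to_seconds(pts, time_base):
    """Convert a presentation timestamp in timebase units to seconds.
    time_base is the stream's time base, e.g. Fraction(1, 90000) for a
    common 90 kHz video time base."""
    return float(pts * time_base)

print(pts_to_seconds(90000, Fraction(1, 90000)))  # -> 1.0
```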

To crop a sequence, the first and last timestamp from the frame_timestamps_pts list and the video_name stored in the sequence_information.json file are required. The cropping can be done with the following command using an ffmpeg trim filter:

ffmpeg -i {VIDEO_NAME} -copyts -filter "trim=start_pts={FIRST_TIMESTAMP_PTS}:end_pts={LAST_TIMESTAMP_PTS + 1}" {SEQUENCE_NAME}

# example using the videos from above
ffmpeg -i Pexels_Videos_1409899.mp4 -copyts -filter "trim=start_pts=359726680:end_pts=377713015" sequence_1.mp4

Warning

Make sure that end_pts is set to LAST_TIMESTAMP_PTS + 1, otherwise the last frame in the sequence will not be included in the cropped video!

Sequences can also be cropped using the first and last timestamps from the frame_timestamps_sec list. However, depending on the video and sequence, this can result in the last frame of the sequence not being included in the cropped video. We therefore recommend using frame_timestamps_pts if possible. The following command crops using frame_timestamps_sec:

ffmpeg -i {VIDEO_NAME} -copyts -filter "trim=start={FIRST_TIMESTAMP_SEC}:end={LAST_TIMESTAMP_SEC}" {SEQUENCE_NAME}

# example using the videos from above
ffmpeg -i Pexels_Videos_1409899.mp4 -copyts -filter "trim=start=4.886145:end=5.985527625" sequence_1.mp4
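To crop every sequence in one go, the ffmpeg commands can be generated from sequence_information.json. This is a sketch under the assumption that the file layout matches the example above; crop_command and crop_all are our own helpers, and the output naming scheme is an example:

```python
import json
import subprocess

def crop_command(video_name, timestamps_pts, output_name):
    """Build the ffmpeg command for one sequence. end_pts is the last
    timestamp + 1 so the final frame is included (see the warning above)."""
    start_pts = timestamps_pts[0]
    end_pts = timestamps_pts[-1] + 1
    return [
        "ffmpeg", "-i", video_name, "-copyts",
        "-filter", f"trim=start_pts={start_pts}:end_pts={end_pts}",
        output_name,
    ]

def crop_all(sequence_information_path):
    """Crop every sequence listed in a sequence_information.json file."""
    with open(sequence_information_path) as f:
        sequences = json.load(f)
    for i, seq in enumerate(sequences):
        cmd = crop_command(
            seq["video_name"], seq["frame_timestamps_pts"], f"sequence_{i}.mp4"
        )
        subprocess.run(cmd, check=True)

# Example: the command built for the first sequence shown above.
print(" ".join(
    crop_command("Pexels_Videos_1409899.mp4", [359726680, 377713014], "sequence_1.mp4")
))
```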