Sequence Selection

Sequence selection allows users to select sequences of consecutive frames from a video instead of single frames. The key concept is the parameter selected_sequence_length. If its value is one (the default), the Lightly Worker selects single frames. If it is larger than one, each video is split into sequences of that length and the frame representations of each sequence are aggregated into a single sequence representation. The selection then happens on these sequence representations.

Note

Sequence selection works on videos or on folders of alphabetically sorted frames.

How It Works

Sequence selection consists of the following steps:

  1. Each input video is split into sequences of length selected_sequence_length.

  2. Next, the embeddings of all frames in a sequence are aggregated (averaged); a sketch of this step follows after the list.

  3. The selection is performed at the sequence level.

  4. Then, the frame indices of the selected sequences are reconstructed.

  5. Finally, the report is generated and (if requested) the selected frames are saved.
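The following is a minimal sketch of the split-and-aggregate steps using NumPy. The array shapes and variable names are illustrative and not part of the Lightly Worker API:

import numpy as np

selected_sequence_length = 10

# Hypothetical per-frame embeddings for one video (120 frames, 128 dimensions).
frame_embeddings = np.random.rand(120, 128)

# Step 1: split the video into sequences of the given length.
n_sequences = len(frame_embeddings) // selected_sequence_length
sequences = frame_embeddings[: n_sequences * selected_sequence_length].reshape(
    n_sequences, selected_sequence_length, -1
)

# Step 2: aggregate (average) the frame embeddings of each sequence.
sequence_embeddings = sequences.mean(axis=1)  # shape: (n_sequences, 128)

# Steps 3-4: selection operates on sequence_embeddings; for a selected
# sequence i, the original frame indices are i * L, ..., (i + 1) * L - 1
# with L = selected_sequence_length.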

Usage

To select sequences of length X, simply add the argument selected_sequence_length=X to your Lightly Worker run command. X must be an integer that evenly divides stopping_condition.n_samples. If stopping_condition.n_samples is given as a fraction, the Lightly Worker automatically rounds the resulting number of frames to the next multiple of X.
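To illustrate the rounding behavior with a hypothetical example (assuming "next multiple" means rounding up; this is a sketch, not the Lightly Worker's actual implementation):

import math

total_frames = 995
selected_sequence_length = 10  # X

# A fractional stopping condition, e.g. select 10% of all frames.
n_samples_fraction = 0.1
raw_n_samples = n_samples_fraction * total_frames  # 99.5 frames

# Round up to the next multiple of X so that only full sequences are selected.
n_samples = math.ceil(raw_n_samples / selected_sequence_length) * selected_sequence_length
print(n_samples)  # 100 -> 10 sequences of 10 frames each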

For example, let’s say we have a folder with two videos which we randomly downloaded from Pexels:

ls /datasets/pexels/
> Pexels Videos 1409899.mp4  Pexels Videos 2495382.mp4

Now, we want to select sequences of length ten. We can use the following script:

import json
import lightly
from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
from lightly.openapi_generated.swagger_client.models.datasource_purpose import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="YOUR_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset('pexels', dataset_type=DatasetType.VIDEOS)

# Pick one of the following three blocks depending on where your data is:
# AWS S3
# Input bucket
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)


# or Google Cloud Storage
# Input bucket
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)


# or Azure Blob Storage
# Input bucket
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the compute run using our custom config.
# We show here the full default config so you can easily edit the
# values according to your needs.
client.schedule_compute_worker_run(
    worker_config={
        'enable_corruptness_check': False,
        'remove_exact_duplicates': False,
        'enable_training': False,
        'pretagging': False,
        'pretagging_debug': False,
        'method': 'coreset',
        'stopping_condition': {
            'n_samples': 200, # select 200 frames -> 20 sequences of 10 frames each
            'min_distance': -1
        },
        'selected_sequence_length': 10 # we want sequences of 10 frames each
    },
    lightly_config={
        'loader': {
            'batch_size': 128,
            'shuffle': True,
            'num_workers': -1,
            'drop_last': True
        },
        'model': {
            'name': 'resnet-18',
            'out_dim': 128,
            'num_ftrs': 32,
            'width': 1
        },
        'trainer': {
            'gpus': 1,
            'max_epochs': 1,
            'precision': 16
        },
        'criterion': {
            'temperature': 0.5
        },
        'optimizer': {
            'lr': 1,
            'weight_decay': 0.00001
        },
        'collate': {
            'input_size': 64,
            'cj_prob': 0.8,
            'cj_bright': 0.7,
            'cj_contrast': 0.7,
            'cj_sat': 0.7,
            'cj_hue': 0.2,
            'min_scale': 0.15,
            'random_gray_scale': 0.2,
            'gaussian_blur': 0.0,
            'kernel_size': 0.1,
            'vf_prob': 0,
            'hf_prob': 0.5,
            'rr_prob': 0
        }
    }
)

The above script will create a run to select 20 sequences, each consisting of ten frames. The selected frames are then saved in the output directory for further processing. Note that the Lightly Worker currently doesn't support the corruptness check and exact duplicate removal for sequence selection, which is why both are deactivated in the config above.

To make sure our run gets processed, we need a Lightly Worker running:

docker run --rm --gpus all -it \
  -v /docker-output:/home/output_dir lightly/worker:latest \
  token=YOUR_TOKEN  worker.worker_id=YOUR_WORKER_ID

Warning

The stopping condition n_samples must be equal to the number of desired sequences times the selected_sequence_length, i.e. n_samples = n_sequences x selected_sequence_length. In the example above, 20 sequences times ten frames is exactly 200 frames.
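A quick sanity check before scheduling a run can make this constraint explicit. The snippet below is purely illustrative and not part of the Lightly API:

# Derive n_samples from the desired number of sequences so that the
# constraint holds by construction.
n_sequences = 20
selected_sequence_length = 10
n_samples = n_sequences * selected_sequence_length  # 200

# The constraint from the warning above:
assert n_samples % selected_sequence_length == 0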

In our example, a PCA of the embeddings of the selected frames nicely shows the 20 selected sequences. The following image is taken from the output of the Lightly Worker:

[Figure] PCA of the embeddings of the frames in the selected sequences from the two input videos (yellow and purple).