Sequence Selection ================== .. warning:: **The Docker Archive documentation is deprecated** The old workflow described in these docs will not be supported with new Lightly Worker versions above 2.6. Please switch to our `new documentation page `_ instead. Sequence selection allows users to select sequences of a video instead of single frames. The key concept is the parameter `selected_sequence_length`. If its value is one (default), the docker selects single frames. If it is larger than one, each video is split into sequences of that length and the frame representations are aggregated into a sequence representation. The selection then happens on these sequence representations. .. note:: Sequence selection works on videos or on folders of alphabetically sorted frames. How It Works ------------- Sequence selection consists of the following steps: 1. Each input video is split into sequences of length `selected_sequence_length`. 2. Next, the embeddings of all frames in a sequence are aggregated (averaged). 3. The selection is performed on sequence level. 4. Finally, the indices of the selected sequence frames are reconstructed. 5. The report is generated and (if requested) the selected frames are saved. Usage ----------- To select sequences of length **X** simply add the argument **selected_sequence_length=X** to your docker run command. Hereby, **X** must be an integer number which evenly divides the **stopping_condition.n_samples**. If **stopping_condition.n_samples** is a fraction, the Lightly docker will automatically round it to the next multiple of of **X**. For example, let's say we have a folder with two videos which we randomly downloaded from `Pexels `_: .. code-block:: console ls /datasets/pexels/ > Pexels Videos 1409899.mp4 Pexels Videos 2495382.mp4 Now, we want to select sequences of length ten. We use: .. code-block:: console docker run --gpus all --rm -it \ -v /datasets/pexels:/home/input_dir:ro \ -v /outputs/:/home/output_dir \ lightly/worker:latest \ token=MYAWESOMETOKEN \ stopping_condition.n_samples=200 \ enable_corruptness_check=False \ remove_exact_duplicates=False \ dump_dataset=True \ selected_sequence_length=10 The above command will select 20 sequences each consisting of ten frames. The selected frames are then saved in the output directory for further processing. Note that Lightly docker currently doesn't support the corruptness check and removing exact duplicates for sequence selection. Hence we have to deactivate them in the command above. .. warning:: The stopping condition `n_samples` must be equal to to the number of desired sequences times the `selected_sequence_length`, i.e. **n_samples = n_sequences x selected_sequence_length**. In the example above 20 sequences times ten frames is exactly 200. In our example, a look at a PCA of the embeddings of the selected frames nicely shows the 20 selected sequences. The following image is taken from the output of the Lightly docker: .. figure:: images/sequence_selection_pca.png :align: center :alt: PCA of embeddings of frames :figwidth: 80% PCA of the embeddings of the frames in the selected sequences from the two input videos (yellow and purple).