Relevant Filenames
Often, not all files in a datasource are relevant. In that case, you can pass a list of filenames to the LightlyOne Worker using the relevant_filenames_file configuration option. The Worker then only considers the listed filenames and ignores all others. To do so, create a text file that contains one relevant filename or directory per line and pass the path to that text file when scheduling the run. This works for both videos and images.
For example, let's say you're working with the following file structure in your input datasource (in this case, an AWS S3 bucket) and are only interested in image_1.png, subdir/image_2.png, and subdir/image_3.png:
s3://bucket/input/
├── image_1.png
└── subdir/
    ├── image_2.png
    ├── image_3.png
    ├── image_40.png
    ├── image_41.png
    └── image_42.png
Then you can add a file called relevant_filenames.txt to your Lightly datasource with the following content:
image_1.png
subdir/image_2.png
subdir/image_3.png
The relevant_filenames_file is expected to be in the Lightly datasource and must always be located in a subdirectory called .lightly. Only file paths relative to the input datasource are supported, and relative paths cannot include dot notation such as ./ or ../.
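Before uploading the file, you may want to catch paths that break these rules. The following is a minimal sketch of such a check. `check_relevant_path` is a hypothetical helper, not part of the Lightly SDK. It only enforces the two rules stated above (paths relative to the input datasource, no ./ or ../ segments). It treats a leading / or a URL scheme such as s3:// as "not relative".

```python
def check_relevant_path(path: str) -> None:
    """Hypothetical helper: raise ValueError if `path` breaks the stated rules."""
    # Paths must be relative to the input datasource, not absolute or full URLs.
    if path.startswith("/") or "://" in path:
        raise ValueError(f"not relative to the input datasource: {path!r}")
    # Dot notation (./ or ../) is not supported anywhere in the path.
    if any(part in (".", "..") for part in path.split("/")):
        raise ValueError(f"dot notation is not supported: {path!r}")


# Validate every entry before writing the relevant filenames file.
for entry in ["image_1.png", "subdir/image_2.png", "subdir/image_3.png"]:
    check_relevant_path(entry)
```

Names such as .lightly pass the check because only path segments that are exactly . or .. are rejected.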
When using this feature, the Lightly datasource should look like this:
s3://bucket/lightly/
└── .lightly/
    └── relevant_filenames.txt
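One way to produce this layout is to write the file locally and then copy it to the Lightly datasource. The sketch below assumes a local directory named lightly_datasource as a stand-in for s3://bucket/lightly/. That directory name is hypothetical. Uploading is left to your usual tooling, for example the AWS CLI (`aws s3 cp lightly_datasource/.lightly/relevant_filenames.txt s3://bucket/lightly/.lightly/relevant_filenames.txt`).

```python
from pathlib import Path

# Filenames relative to the input datasource (s3://bucket/input/).
relevant = ["image_1.png", "subdir/image_2.png", "subdir/image_3.png"]

# Hypothetical local mirror of the Lightly datasource (s3://bucket/lightly/).
lightly_root = Path("lightly_datasource")
target = lightly_root / ".lightly" / "relevant_filenames.txt"

# The file must live in the .lightly subdirectory; one entry per line.
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text("\n".join(relevant) + "\n")
```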
The corresponding Python command to submit a run would then be as follows:
from lightly.api import ApiWorkflowClient

# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
client.schedule_compute_worker_run(
    worker_config={
        "relevant_filenames_file": ".lightly/relevant_filenames.txt",
    },
    selection_config={
        "n_samples": 50,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY"
                }
            }
        ]
    }
)
Select Videos
Selecting video files works the same way as selecting image files. Only full videos can be selected or excluded; selection at the level of individual video frames is not possible. For example, let's say you want to process only video_1.mp4, subdir/video_2.mp4, and subdir/video_3.mp4. Then create the relevant_filenames.txt as follows:
video_1.mp4
subdir/video_2.mp4
subdir/video_3.mp4
The location of the relevant filenames file and its usage when scheduling a run are the same as for images. The features below to select directories and to exclude files or directories also work the same for images and videos.
Select Directories
It's also possible to specify a file path prefix by denoting it with an asterisk * to include whole directories instead of listing many files individually. Everything up to the first * in a line is considered the prefix:
image_1.png
subdir/*
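The following sketch illustrates how such a file expands against the example bucket from the beginning of this section. It is an illustration only, not LightlyOne's implementation. `expand_line` is a hypothetical helper. The datasource listing is an in-memory list of paths relative to s3://bucket/input/. Explicit filenames are assumed to be used as written, without an existence check.

```python
from typing import List


def expand_line(line: str, files: List[str]) -> List[str]:
    """Hypothetical helper: expand one line of a relevant filenames file."""
    line = line.strip()
    if "*" not in line:
        return [line]  # explicit filename, used as written
    # Everything up to the first `*` is the prefix.
    prefix = line.split("*", 1)[0]
    return sorted(f for f in files if f.startswith(prefix))


# Contents of s3://bucket/input/ from the example, relative to the datasource.
files = [
    "image_1.png",
    "subdir/image_2.png",
    "subdir/image_3.png",
    "subdir/image_40.png",
    "subdir/image_41.png",
    "subdir/image_42.png",
]

selected = []
for line in ["image_1.png", "subdir/*"]:
    selected.extend(expand_line(line, files))
print(selected)
```

Because the prefix is a plain string prefix rather than a directory name, a line such as subdir/image_4* would, under the same logic, select image_40.png through image_42.png.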
Exclude Files
You can also use the power of the prefix to exclude certain files. To do so, start a line with a prefix (e.g. your/directory/*) and then add the exclusions, separated by spaces:
your/directory/* your/directory/ignore_me/*
You can also remove files from your exclusion with the ! operator. See below for an example.
To understand how to use ! correctly, it helps to know how LightlyOne parses the relevant filenames file. LightlyOne uses the following logic:
1. Iterate over the file line by line and return the relevant filenames.
2. If a prefix pattern is encountered:
   a. Switch to listing the datasource.
   b. List until the prefix is depleted while applying the exclude patterns.
   c. Go back to step 1.
For example, assume the following relevant filenames file. It explicitly includes images one through three in foo/, includes all images in foo/bar/, and includes all images in foo/baz/ except foo/baz/image_1.jpg:
foo/image_1.jpg
foo/image_2.jpg
foo/bar/*
foo/baz/* foo/baz/image_1.jpg
foo/image_3.jpg
LightlyOne first lists foo/image_1.jpg and foo/image_2.jpg. It then switches to listing all files in foo/bar/. When that prefix is depleted, LightlyOne starts listing foo/baz/, listing all files and filtering out foo/baz/image_1.jpg. Finally, LightlyOne returns the last entry in the file: foo/image_3.jpg.
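The parsing logic and the walkthrough above can be sketched in Python. This is a minimal illustration under stated assumptions, not LightlyOne's actual implementation. `resolve_relevant_filenames` is a hypothetical function, and the datasource is a hypothetical in-memory list that is sorted before listing. Exclusion tokens are treated as shell-style globs through Python's `fnmatch.fnmatchcase`, where * also matches /. Explicit filenames are returned without checking that they exist. The ! operator is deliberately left out.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator, List


def resolve_relevant_filenames(
    lines: Iterable[str], datasource_files: List[str]
) -> Iterator[str]:
    """Hypothetical sketch of the parsing logic; not LightlyOne's code."""
    listing = sorted(datasource_files)  # stand-in for listing the datasource
    for line in lines:  # step 1: iterate over the file line by line
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        head, excludes = tokens[0], tokens[1:]
        if "*" not in head:
            yield head  # explicit filename, returned as written
            continue
        # Step 2: a prefix pattern switches to listing the datasource.
        prefix = head.split("*", 1)[0]  # everything up to the first `*`
        for path in listing:
            if not path.startswith(prefix):
                continue
            # Assumption: exclusions are fnmatch globs (`*` also matches `/`).
            if any(fnmatchcase(path, pattern) for pattern in excludes):
                continue
            yield path
        # The prefix is depleted; continue with the next line (back to step 1).


# The relevant filenames file from the example above.
relevant_filenames_txt = """\
foo/image_1.jpg
foo/image_2.jpg
foo/bar/*
foo/baz/* foo/baz/image_1.jpg
foo/image_3.jpg
"""

# Hypothetical contents of the input datasource.
datasource_files = [
    "foo/image_1.jpg", "foo/image_2.jpg", "foo/image_3.jpg",
    "foo/bar/image_1.jpg", "foo/bar/image_2.jpg",
    "foo/baz/image_1.jpg", "foo/baz/image_2.jpg",
]

for filename in resolve_relevant_filenames(
    relevant_filenames_txt.splitlines(), datasource_files
):
    print(filename)
```

With this hypothetical listing, the sketch prints foo/image_1.jpg, foo/image_2.jpg, both files in foo/bar/, foo/baz/image_2.jpg, and finally foo/image_3.jpg. This matches the order described in the walkthrough.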
Include Directory Except Subdirectory
Include everything in foo/ except the directory foo/bar/:
foo/* foo/bar/*
Include Directory Except Files by Suffix
Include everything in foo/ except .png files in foo/:
foo/* foo/*.png
Include Directory Except Specific Images
Include everything in foo/ except foo/bar/image_1.jpg and foo/baz/image_1.jpg:
foo/* foo/bar/image_1.jpg foo/baz/image_1.jpg
Include Directory but Exclude Images by Prefix
Include everything in foo/ but exclude all images with the prefix foo/image_1:
foo/* foo/image_1*
Exclude Subdirectory Except a Specific Image
Exclude all images in the subdirectory foo/bar/ except foo/bar/image_1.jpg:
foo/* foo/bar/* !foo/bar/image_1.jpg
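To try these recipes before scheduling a run, you can evaluate a single prefix line against a sample listing. The sketch below is hypothetical and is not confirmed by the LightlyOne docs. `select_for_line` and the file listing are made up for illustration. Tokens are assumed to be separated by spaces. Exclusions are matched as `fnmatch` globs, where * also matches /. A `!pattern` token is assumed to re-include paths that an exclusion removed. Under this glob assumption, foo/*.png would also drop .png files in subdirectories of foo/; the actual Worker may behave differently.

```python
from fnmatch import fnmatchcase
from typing import List


def select_for_line(line: str, files: List[str]) -> List[str]:
    """Hypothetical sketch: evaluate one prefix line with exclusions and `!`."""
    head, *rest = line.split()
    prefix = head.split("*", 1)[0]  # everything up to the first `*`
    excludes = [t for t in rest if not t.startswith("!")]
    reincludes = [t[1:] for t in rest if t.startswith("!")]

    selected = []
    for path in sorted(files):
        if not path.startswith(prefix):
            continue
        excluded = any(fnmatchcase(path, p) for p in excludes)
        reincluded = any(fnmatchcase(path, p) for p in reincludes)
        # Keep the path unless it is excluded and not re-included by `!`.
        if not excluded or reincluded:
            selected.append(path)
    return selected


# Hypothetical datasource listing used to try out the recipes above.
files = [
    "foo/image_1.jpg", "foo/image_10.jpg", "foo/image_2.jpg", "foo/photo.png",
    "foo/bar/image_1.jpg", "foo/bar/image_2.jpg",
    "foo/baz/image_1.jpg", "foo/baz/image_2.jpg",
]

recipes = [
    "foo/* foo/bar/*",
    "foo/* foo/*.png",
    "foo/* foo/bar/image_1.jpg foo/baz/image_1.jpg",
    "foo/* foo/image_1*",
    "foo/* foo/bar/* !foo/bar/image_1.jpg",
]
for recipe in recipes:
    print(recipe, "->", select_for_line(recipe, files))
```

The prefix recipe foo/image_1* removes both foo/image_1.jpg and foo/image_10.jpg but keeps foo/bar/image_1.jpg, because matching is done on the full relative path.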