Export Filenames
LightlyOne supports several export formats, so you can pass selected images on to the labeling tool of your choice. This page shows how to download the filenames of the selected images, and the images themselves, to your machine.
Exporting Filenames to a Text File
After a run terminates successfully, the LightlyOne Worker will have created a tag with the name initial-tag in your dataset. You can export and download the filenames for further processing with the following Python code:
from lightly.api import ApiWorkflowClient
# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
filenames = client.export_filenames_by_tag_name(
    tag_name="initial-tag"  # name of the tag in the dataset
)
with open("filenames-of-initial-tag.txt", "w") as f:
    f.write(filenames)
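For further processing it is often handier to work with a list than with one long string. The following is a minimal sketch that assumes the export contains one filename per line; `parse_filenames` and the sample string are hypothetical and not part of the LightlyOne client.

```python
from typing import List


def parse_filenames(exported: str) -> List[str]:
    """Split an exported filenames string into a list, skipping blank lines."""
    # Assumption: the export holds one filename per line.
    return [line.strip() for line in exported.splitlines() if line.strip()]


# Made-up export string for illustration (not real LightlyOne output).
example = "images/cat_001.jpg\nimages/dog_002.jpg\n"
print(parse_filenames(example))
```

The same helper can be applied to the contents of filenames-of-initial-tag.txt after reading the file back in.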
Exporting Filenames from Multiple Tags
Whenever the LightlyOne Worker adds data to a dataset, it creates a new tag for it. To get all the tags and filenames, you can use the following code:
from lightly.api import ApiWorkflowClient
# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
# Get all the tags for this dataset
tags = client.get_all_tags()
# Loop over tags and export the filenames
# Note: first tag in `tags` is the newest one
# and the last one is always `initial-tag`
for tag in tags:
    print(tag.name)
    filenames = client.export_filenames_by_tag_name(
        tag_name=tag.name  # name of the tag in the dataset
    )
    with open(f"filenames-of-{tag.name}.txt", "w") as f:
        f.write(filenames)
Filenames and Signed Read URLs
The LightlyOne Python Client also allows you to easily export filenames together with signed read URLs. This allows other tools to load the data directly from your bucket.
import json

from lightly.api import ApiWorkflowClient
# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
# get the filenames with signed read URLs
filenames_and_read_urls = client.export_filenames_and_read_urls_by_tag_name(
    tag_name="initial-tag"  # name of the tag in the dataset
)
with open("filenames-and-readurls-of-initial-tag.json", "w") as f:
    json.dump(filenames_and_read_urls, f)
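A downstream tool usually needs to look up the read URL for a given filename. The sketch below assumes the exported JSON is a list of objects with the keys `fileName` and `readUrl`; these key names are an assumption, so inspect your exported file to confirm them. `build_url_lookup` and the sample data are hypothetical.

```python
import json
from typing import Dict, List


def build_url_lookup(entries: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each filename to its signed read URL."""
    # Assumption: every entry has "fileName" and "readUrl" keys.
    return {entry["fileName"]: entry["readUrl"] for entry in entries}


# Made-up JSON for illustration (not real LightlyOne output).
example_json = (
    '[{"fileName": "a.jpg", "readUrl": "https://example.com/a.jpg?sig=abc"},'
    ' {"fileName": "b.jpg", "readUrl": "https://example.com/b.jpg?sig=def"}]'
)
lookup = build_url_lookup(json.loads(example_json))
print(lookup["a.jpg"])
```

Signed URLs typically expire, so it is safer to export them shortly before they are consumed.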
Filenames from Last Run
Sometimes you might only be interested in the filenames added to your dataset by the latest LightlyOne Worker run. This is the case, for example, when you already have a dataset with labeled images and only need to label the newly added ones. You can then use the tag created by the last run to identify the new images.
The following code example only works for runs made with LightlyOne Worker version >=2.4.2. For older versions, see further below.
from lightly.api import ApiWorkflowClient
# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
tags = client.get_compute_worker_run_tags(run_id="MY_RUN_ID")
# If you only have the scheduled id, you can get the run id as follows:
# run_id = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id="MY_SCHEDULED_RUN_ID").id
# Each entry in `tags` is a tag object, so pass its name to the export call.
filenames = client.export_filenames_by_tag_name(tag_name=tags[0].name)
with open("filenames-of-last-run.txt", "w") as f:
    f.write(filenames)
Runs made with LightlyOne Worker version <2.4.2 do not include the run id in the tags. In that case, you can download all tags from the dataset and sort them by creation date to get the last created tag:
from lightly.api import ApiWorkflowClient
# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
tags = client.get_all_tags()
last_tag = sorted(tags, key=lambda tag: tag.created_at)[-1]
filenames = client.export_filenames_by_tag_name(tag_name=last_tag.name)
with open("filenames-of-last-run.txt", "w") as f:
    f.write(filenames)
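If you keep your own record of which images are already labeled, you can cross-check the exported filenames against it before sending anything to labeling. This is a minimal sketch in plain Python; `select_unlabeled` and both example lists are hypothetical and independent of the LightlyOne client.

```python
from typing import Iterable, List


def select_unlabeled(
    new_filenames: Iterable[str], labeled_filenames: Iterable[str]
) -> List[str]:
    """Return the new filenames that are not labeled yet, keeping their order."""
    labeled = set(labeled_filenames)  # set gives fast membership checks
    return [name for name in new_filenames if name not in labeled]


# Made-up filenames for illustration.
from_last_run = ["img_010.jpg", "img_011.jpg", "img_012.jpg"]
already_labeled = ["img_001.jpg", "img_011.jpg"]
print(select_unlabeled(from_last_run, already_labeled))
```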
Downloading the Full Images
You can also download the files themselves and store them on disk.
from lightly.api import ApiWorkflowClient
# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
client.download_dataset(
    output_dir="./my/output/path/",  # path to where the files should be saved
    tag_name="initial-tag",  # name of the tag in the dataset
)
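After a download, it can be worth checking that every exported filename is actually present on disk. The sketch below assumes that each file is stored under the output directory at a path equal to its filename; that layout is an assumption, so adjust the check if your files land elsewhere. `find_missing_files` is a hypothetical helper, not part of the LightlyOne client.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_files(output_dir: str, filenames: Iterable[str]) -> List[str]:
    """Return the filenames that do not exist as files under output_dir."""
    root = Path(output_dir)
    # Assumption: a file named "a/b.jpg" is stored at output_dir/a/b.jpg.
    return [name for name in filenames if not (root / name).is_file()]
```

For example, calling `find_missing_files("./my/output/path/", filenames.splitlines())` with the string exported earlier would list any files that still need to be fetched.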