lightly.api

The lightly.api module provides access to the Lightly web-app.

.api_workflow_client

class lightly.api.api_workflow_client.ApiWorkflowClient(token: Optional[str] = None, dataset_id: Optional[str] = None, embedding_id: Optional[str] = None)

Provides a uniform interface to communicate with the API.

The ApiWorkflowClient is used to communicate with the Lightly API. Besides single API calls, the client can also run more complex workflows which combine multiple API calls.

The client can be used in combination with the active learning agent.

Parameters
  • token – the token of the user, provided in the webapp.

  • dataset_id – the id of the dataset, provided in the webapp. If it is not set but needed by a workflow, the last modified dataset is used by default.

  • embedding_id – the id of the embedding to use. If it is not set but needed by a workflow, the newest embedding is used by default.

append_embeddings(path_to_embeddings_csv: str, embedding_id: str)

Concatenates the embeddings from the server to the local ones.

Loads the embedding csv file belonging to the embedding_id, and appends all of its rows to the local embeddings file located at ‘path_to_embeddings_csv’.

Parameters
  • path_to_embeddings_csv – The path to the csv containing the local embeddings.

  • embedding_id – Id of the embedding summary of the embeddings on the server.

Raises

RuntimeError – If the number of columns in the local and the remote embeddings file mismatch.
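The column-count check can be illustrated with a local sketch. The helper below is hypothetical, not the library implementation: it appends the data rows of a remote embeddings CSV to a local one and raises a RuntimeError on a header mismatch, mirroring the documented behaviour.

```python
import csv
import io


def append_embedding_rows(local_csv: str, remote_csv: str) -> str:
    """Append the data rows of remote_csv to local_csv.

    Hypothetical helper: both CSVs must share the same number of
    columns, otherwise a RuntimeError is raised, mirroring the
    documented behaviour of append_embeddings.
    """
    local_rows = list(csv.reader(io.StringIO(local_csv)))
    remote_rows = list(csv.reader(io.StringIO(remote_csv)))
    if len(local_rows[0]) != len(remote_rows[0]):
        raise RuntimeError(
            "Number of columns in local and remote embeddings mismatch."
        )
    merged = local_rows + remote_rows[1:]  # drop the remote header row
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(merged)
    return out.getvalue()


local = "filenames,embedding_0,embedding_1,labels\na.jpg,0.1,0.2,0\n"
remote = "filenames,embedding_0,embedding_1,labels\nb.jpg,0.3,0.4,1\n"
merged = append_embedding_rows(local, remote)
```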

create_compute_worker_config(worker_config: Optional[Dict[str, Any]] = None, lightly_config: Optional[Dict[str, Any]] = None) str

Creates a new configuration for a compute worker run.

Parameters
  • worker_config – Compute worker configuration.

  • lightly_config – Lightly configuration.

Returns

The id of the created config.

create_custom_metadata_config(name: str, configs: List[lightly.openapi_generated.swagger_client.models.configuration_entry.ConfigurationEntry])

Creates custom metadata config from a list of configurations.

Parameters
  • name – The name of the custom metadata configuration.

  • configs – List of configuration entries, each specifying a custom metadata field.

Returns

The API response.

Examples

>>> from lightly.openapi_generated.swagger_client.models.configuration_entry import ConfigurationEntry
>>> entry = ConfigurationEntry(
>>>     name='Weather',
>>>     path='weather',
>>>     default_value='unknown',
>>>     value_data_type='CATEGORICAL_STRING',
>>> )
>>>
>>> client.create_custom_metadata_config(
>>>     'My Custom Metadata',
>>>     [entry],
>>> )
create_dataset(dataset_name: str, dataset_type: Optional[str] = None)

Creates a dataset on the Lightly Platform.

If a dataset with that name already exists, the dataset_id is set to the id of the existing dataset instead.

Parameters
  • dataset_name – The name of the dataset to be created.

  • dataset_type – The type of the dataset. We recommend using the API-provided constants DatasetType.IMAGES and DatasetType.VIDEOS.

Examples

>>> from lightly.api import ApiWorkflowClient
>>> from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
>>>
>>> client = lightly.api.ApiWorkflowClient(token="YOUR_TOKEN")
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.IMAGES)
>>>
>>> # or to work with videos
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.VIDEOS)
create_new_dataset_with_unique_name(dataset_basename: str, dataset_type: Optional[str] = None)

Creates a new dataset on the Lightly Platform.

If a dataset with the specified name already exists, a counter is appended to the name so that the dataset can still be created.

Parameters
  • dataset_basename – The name of the dataset to be created.

  • dataset_type – The type of the dataset. We recommend using the API-provided constants DatasetType.IMAGES and DatasetType.VIDEOS.
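The counter scheme can be sketched locally. This is illustrative only: the exact suffix format used by the Lightly Platform may differ.

```python
def unique_dataset_name(basename: str, existing: list) -> str:
    """Return basename, or basename plus a numeric counter if taken.

    Illustrative sketch: the exact suffix format used by the Lightly
    Platform may differ.
    """
    if basename not in existing:
        return basename
    counter = 1
    while f"{basename}_{counter}" in existing:
        counter += 1
    return f"{basename}_{counter}"


existing = ["clothing-dataset", "clothing-dataset_1"]
name = unique_dataset_name("clothing-dataset", existing)
```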

create_tag_from_filenames(fnames_new_tag: List[str], new_tag_name: str, parent_tag_id: Optional[str] = None) lightly.openapi_generated.swagger_client.models.tag_data.TagData

Creates a new tag from a list of filenames.

Parameters
  • fnames_new_tag – A list of filenames to be included in the new tag.

  • new_tag_name – The name of the new tag.

  • parent_tag_id – The tag defining where to sample from, default: None resolves to the initial-tag.

Returns

The newly created tag.

Raises

RuntimeError

dataset_exists(dataset_id: str)

Returns True if a dataset with dataset_id exists.

property dataset_id: str

The current dataset_id.

If the dataset_id is set, it is returned. If it is not set, then the dataset_id of the last modified dataset is selected.

property dataset_type: str

Returns the dataset type of the current dataset.

delete_compute_worker(worker_id: str)

Removes a compute worker.

Parameters

worker_id – The id of the worker to remove.

delete_dataset_by_id(dataset_id: str)

Deletes a dataset on the Lightly Platform.

Parameters

dataset_id – The id of the dataset to be deleted.

delete_tag_by_id(tag_id: str)

Deletes a tag on the web platform.

Parameters

tag_id – The id of the tag to be deleted.

download_dataset(output_dir: str, tag_name: str = 'initial-tag', verbose: bool = True)

Downloads images from the web-app and stores them in output_dir.

Parameters
  • output_dir – Where to store the downloaded images.

  • tag_name – Name of the tag which should be downloaded.

  • verbose – Whether or not to show the progress bar.

Raises
  • ValueError – If the specified tag does not exist on the dataset.

  • RuntimeError – If the connection to the server failed.

download_new_raw_samples() List[Tuple[str, str]]

Downloads filenames and read urls of unprocessed samples from the datasource.

All samples after the timestamp of ApiWorkflowClient.get_processed_until_timestamp() are fetched. After downloading the samples the timestamp is updated to the current time. This function can be repeatedly called to retrieve new samples from the datasource.

Returns

A list of (filename, url) tuples, where each tuple represents a sample.

download_raw_metadata(from_: int = 0, to: Optional[int] = None, relevant_filenames_file_name: Optional[str] = None) List[Tuple[str, str]]

Downloads all metadata filenames and read urls from the datasource between from_ and to.

Samples which have timestamp == from_ or timestamp == to will also be included.

Parameters
  • from_ – Unix timestamp from which on samples are downloaded.

  • to – Unix timestamp up to and including which samples are downloaded.

  • relevant_filenames_file_name – The path to the relevant filenames text file in the cloud bucket. The path is relative to the datasource root.

Returns

A list of (filename, url) tuples, where each tuple represents a sample.

download_raw_predictions(task_name: str, from_: int = 0, to: Optional[int] = None, relevant_filenames_file_name: Optional[str] = None) List[Tuple[str, str]]

Downloads all prediction filenames and read urls from the datasource between from_ and to.

Samples which have timestamp == from_ or timestamp == to will also be included.

Parameters
  • task_name – Name of the prediction task.

  • from_ – Unix timestamp from which on samples are downloaded.

  • to – Unix timestamp up to and including which samples are downloaded.

  • relevant_filenames_file_name – The path to the relevant filenames text file in the cloud bucket. The path is relative to the datasource root.

Returns

A list of (filename, url) tuples, where each tuple represents a sample.

download_raw_samples(from_: int = 0, to: Optional[int] = None, relevant_filenames_file_name: Optional[str] = None) List[Tuple[str, str]]

Downloads all filenames and read urls from the datasource between from_ and to.

Samples which have timestamp == from_ or timestamp == to will also be included.

Parameters
  • from_ – Unix timestamp from which on samples are downloaded.

  • to – Unix timestamp up to and including which samples are downloaded.

  • relevant_filenames_file_name – The path to the relevant filenames text file in the cloud bucket. The path is relative to the datasource root.

Returns

A list of (filename, url) tuples, where each tuple represents a sample.
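The inclusive bounds (samples with timestamp == from_ or timestamp == to are included) can be sketched with a local filter over a hypothetical sample list:

```python
def filter_by_timestamp(samples, from_=0, to=None):
    """Keep (filename, timestamp) pairs with from_ <= timestamp <= to.

    Both bounds are inclusive, matching the documented behaviour; a
    `to` of None means no upper bound. The sample list is hypothetical.
    """
    return [
        (fname, ts)
        for fname, ts in samples
        if ts >= from_ and (to is None or ts <= to)
    ]


samples = [("a.jpg", 100), ("b.jpg", 200), ("c.jpg", 300)]
selected = filter_by_timestamp(samples, from_=100, to=200)
```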

export_label_box_data_rows_by_tag_id(tag_id: str) List[Dict]

Exports samples in a format compatible with Labelbox.

The format is documented here: https://docs.labelbox.com/docs/images-json

Parameters

tag_id – Id of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Labelbox.

export_label_box_data_rows_by_tag_name(tag_name: str) List[Dict]

Exports samples in a format compatible with Labelbox.

The format is documented here: https://docs.labelbox.com/docs/images-json

Parameters

tag_name – Name of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Labelbox.

Examples

>>> # write json file which can be imported in Label Studio
>>> tasks = client.export_label_box_data_rows_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-labelbox-rows.json', 'w') as f:
>>>     json.dump(tasks, f)
export_label_studio_tasks_by_tag_id(tag_id: str) List[Dict]

Exports samples in a format compatible with Label Studio.

The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format

Parameters

tag_id – Id of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Label Studio.

export_label_studio_tasks_by_tag_name(tag_name: str) List[Dict]

Exports samples in a format compatible with Label Studio.

The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format

Parameters

tag_name – Name of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Label Studio.

Examples

>>> # write json file which can be imported in Label Studio
>>> tasks = client.export_label_studio_tasks_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-label-studio-tasks.json', 'w') as f:
>>>     json.dump(tasks, f)
get_all_datasets() List[lightly.openapi_generated.swagger_client.models.dataset_data.DatasetData]

Returns all datasets the user has access to.

get_all_tags() List[lightly.openapi_generated.swagger_client.models.tag_data.TagData]

Gets all tags on the server.

Returns

One TagData entry for each tag on the server.

get_compute_worker_ids() List[str]

Returns the ids of all registered compute workers.

get_compute_worker_runs() List[lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData]

Returns all compute worker runs for the user.

get_dataset_by_id(dataset_id: str)

Returns the dataset for the given dataset id.

get_datasets(shared: bool = False) List[lightly.openapi_generated.swagger_client.models.dataset_data.DatasetData]

Returns all datasets the user owns.

Parameters

shared – If True, only returns the datasets which have been shared with the user.

get_datasource() lightly.openapi_generated.swagger_client.models.datasource_config.DatasourceConfig

Calls the API to return the datasource of the current dataset.

Returns

Datasource data of the datasource of the current dataset.

Raises

ApiException – If no datasource was configured.

get_embedding_by_name(name: str, ignore_suffix: bool = True) lightly.openapi_generated.swagger_client.models.dataset_embedding_data.DatasetEmbeddingData

Gets an embedding from the server by name.

Parameters
  • name – The name of the embedding to get.

  • ignore_suffix – If True, a suffix of the embedding name on the server is ignored.

Returns

The embedding data.

Raises

EmbeddingDoesNotExistError – If the name does not match the name of an embedding on the server.

get_filenames() List[str]

Downloads the list of filenames from the server.

This is an expensive operation, especially for large datasets.

get_filenames_in_tag(tag_data: lightly.openapi_generated.swagger_client.models.tag_data.TagData, filenames_on_server: Optional[List[str]] = None, exclude_parent_tag: bool = False) List[str]

Gets the filenames of a tag.

Parameters
  • tag_data – The data of the tag.

  • filenames_on_server – List of all filenames on the server. If they are not given, they need to be downloaded, which is quite expensive.

  • exclude_parent_tag – Excludes the parent tag in the returned filenames.

Returns

filenames_tag – The filenames of all samples in the tag.

get_metadata_read_url(filename: str)

Returns a read-url for .lightly/metadata/{filename}.

Parameters

filename – Filename for which to get the read-url.

Returns the read-url. If the file does not exist, a read-url is returned anyway.

get_prediction_read_url(filename: str)

Returns a read-url for .lightly/predictions/{filename}.

Parameters

filename – Filename for which to get the read-url.

Returns the read-url. If the file does not exist, a read-url is returned anyway.

get_processed_until_timestamp() int

Returns the timestamp until which samples have been processed.

Returns

Unix timestamp of the last processed sample.

get_scheduled_compute_worker_runs() List[lightly.openapi_generated.swagger_client.models.docker_run_scheduled_data.DockerRunScheduledData]

Returns a list of all scheduled compute worker runs for the current dataset.

index_custom_metadata_by_filename(custom_metadata: Dict) Dict[str, Optional[Dict]]

Creates an index to lookup custom metadata by filename.

Parameters

custom_metadata – Dictionary of custom metadata, see upload_custom_metadata for the required format.

Returns

A dictionary mapping from filenames to custom metadata. If there are no annotations for a filename, the custom metadata is None instead.
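The documented mapping can be sketched locally, using the COCO-style format described under upload_custom_metadata. This is an illustrative re-implementation, not the library code:

```python
def index_metadata_by_filename(custom_metadata):
    """Map file_name to its metadata dict, or None if no metadata exists.

    Local sketch of the documented behaviour of
    index_custom_metadata_by_filename, not the library implementation.
    """
    by_image_id = {
        entry["image_id"]: entry for entry in custom_metadata["metadata"]
    }
    return {
        image["file_name"]: by_image_id.get(image["id"])
        for image in custom_metadata["images"]
    }


custom_metadata = {
    "images": [
        {"file_name": "image0.jpg", "id": 0},
        {"file_name": "image1.jpg", "id": 1},
    ],
    "metadata": [{"image_id": 0, "weather": "cloudy"}],
}
index = index_metadata_by_filename(custom_metadata)
```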

register_compute_worker(name: str = 'Default') str

Registers a new compute worker.

Parameters

name – The name of the compute worker.

Returns

The id of the newly registered compute worker.

schedule_compute_worker_run(worker_config: Optional[Dict[str, Any]] = None, lightly_config: Optional[Dict[str, Any]] = None, priority: str = 'MID') str

Schedules a run with the given configurations.

Parameters
  • worker_config – Compute worker configuration.

  • lightly_config – Lightly configuration.

  • priority – The priority of the scheduled run.

Returns

The id of the scheduled run.

selection(selection_config: lightly.active_learning.config.selection_config.SelectionConfig, preselected_tag_id: Optional[str] = None, query_tag_id: Optional[str] = None) lightly.openapi_generated.swagger_client.models.tag_data.TagData

Performs a selection given the arguments.

Parameters
  • selection_config – The configuration of the selection.

  • preselected_tag_id – The tag defining the already chosen samples (e.g. already labelled ones), default: None.

  • query_tag_id – The tag defining where to sample from, default: None resolves to the initial-tag.

Returns

The newly created tag of the selection.

Raises
  • ApiException

  • ValueError

  • RuntimeError

set_azure_config(container_name: str, account_name: str, sas_token: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None

Sets the Azure configuration for the datasource of the current dataset.

Find a detailed explanation on how to setup Lightly with Azure Blob Storage in our docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_azure_storage.html#

Parameters
  • container_name – Container name of the dataset, for example: “my-container/path/to/my/data”.

  • account_name – Azure account name.

  • sas_token – Secure Access Signature token.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.
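The [filename] and [extension] placeholders in thumbnail_suffix resolve per file. The helper below is a sketch of the documented placeholder scheme; the actual substitution is performed by the Lightly Platform, not by client code.

```python
import os


def resolve_thumbnail_path(filename: str, thumbnail_suffix: str) -> str:
    """Substitute [filename] and [extension] in a thumbnail_suffix template.

    Illustrative sketch of the documented placeholder scheme; the real
    substitution happens server-side on the Lightly Platform.
    """
    stem, ext = os.path.splitext(filename)
    return (
        thumbnail_suffix
        .replace("[filename]", stem)
        .replace("[extension]", ext.lstrip("."))
    )


path = resolve_thumbnail_path(
    "image.png", ".lightly/thumbnails/[filename]_thumb.[extension]"
)
```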

set_dataset_id_by_name(dataset_name: str)

Sets the dataset id given the name of the dataset.

Parameters

dataset_name – The name of the dataset for which the dataset_id should be set as attribute.

Raises

ValueError

set_embedding_id_to_latest()

Sets self.embedding_id to the id of the latest embedding on the server.

set_gcs_config(resource_path: str, project_id: str, credentials: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None

Sets the Google Cloud Storage configuration for the datasource of the current dataset.

Find a detailed explanation on how to setup Lightly with Google Cloud Storage in our docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_gcloud_bucket.html

Parameters
  • resource_path – GCS url of your dataset, for example: “gs://my_bucket/path/to/my/data”

  • project_id – GCS project id.

  • credentials – The stringified content of the credentials JSON file that you download from Google Cloud Platform.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.

set_local_config(resource_path: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]') None

Sets the local configuration for the datasource of the current dataset.

Find a detailed explanation on how to setup Lightly with a local file server in our docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_local_server.html

Parameters
  • resource_path – Url to your local file server, for example: “http://localhost:1234/path/to/my/data”.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

set_s3_config(resource_path: str, region: str, access_key: str, secret_access_key: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None

Sets the S3 configuration for the datasource of the current dataset.

Parameters
  • resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.

  • region – S3 region where the dataset bucket is located, for example “eu-central-1”.

  • access_key – S3 access key.

  • secret_access_key – Secret for the S3 access key.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.

set_s3_delegated_access_config(resource_path: str, region: str, role_arn: str, external_id: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None

Sets the S3 configuration for the datasource of the current dataset.

Parameters
  • resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.

  • region – S3 region where the dataset bucket is located, for example “eu-central-1”.

  • role_arn – Unique ARN identifier of the role.

  • external_id – External ID of the role.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.

update_processed_until_timestamp(timestamp: int) None

Sets the timestamp until which samples have been processed.

Parameters

timestamp – Unix timestamp of the last processed sample.

upload_custom_metadata(custom_metadata: Dict, verbose: bool = False, max_workers: int = 8)

Uploads custom metadata to the Lightly platform.

The custom metadata is expected in a format similar to the COCO annotations: under the key “images” there should be a list of dictionaries, each with a file_name and an id. Under the key “metadata” the custom metadata is stored as a list of dictionaries, each with an image_id matching it to an image.

Example

>>> custom_metadata = {
>>>     "images": [
>>>         {
>>>             "file_name": "image0.jpg",
>>>             "id": 0,
>>>         },
>>>         {
>>>             "file_name": "image1.jpg",
>>>             "id": 1,
>>>         }
>>>     ],
>>>     "metadata": [
>>>         {
>>>             "image_id": 0,
>>>             "number_of_people": 3,
>>>             "weather": {
>>>                 "scenario": "cloudy",
>>>                 "temperature": 20.3
>>>             }
>>>         },
>>>         {
>>>             "image_id": 1,
>>>             "number_of_people": 1,
>>>             "weather": {
>>>                 "scenario": "rainy",
>>>                 "temperature": 15.0
>>>             }
>>>         }
>>>     ]
>>> }
Parameters
  • custom_metadata – Custom metadata as described above.

  • verbose – If True displays a progress bar during the upload.

  • max_workers – Maximum number of concurrent threads during upload.

upload_dataset(input: Union[str, lightly.data.dataset.LightlyDataset], max_workers: int = 8, mode: str = 'thumbnails', custom_metadata: Optional[Dict] = None)

Uploads a dataset to the Lightly cloud solution.

Parameters
  • input – Either the path to the dataset, e.g. “path/to/dataset”, or the dataset in the form of a LightlyDataset.

  • max_workers – Maximum number of workers uploading images in parallel.

  • mode – One of [full, thumbnails, metadata]. Whether to upload full images, thumbnails only, or metadata only.

  • custom_metadata – COCO-style dictionary of custom metadata to be uploaded.

Raises
  • ValueError – If the dataset is too large or the input has the wrong type.

  • RuntimeError – If the connection to the server failed.

upload_embeddings(path_to_embeddings_csv: str, name: str)

Uploads embeddings to the server.

First checks that the specified embedding name does not already exist on the server; if it does, the upload is aborted. Otherwise, creates a new csv with the embeddings in the order specified on the server and uploads it. The received embedding_id is saved as a property of self.

Parameters
  • path_to_embeddings_csv – The path to the .csv containing the embeddings, e.g. “path/to/embeddings.csv”

  • name – The name of the embedding. If an embedding with such a name already exists on the server, the upload is aborted.

upload_file_with_signed_url(file: io.IOBase, signed_write_url: str, headers: Optional[Dict] = None) requests.models.Response

Uploads a file to a url via a put request.

Parameters
  • file – The file to upload.

  • signed_write_url – The url to upload the file to. As no authorization is used, the url must be a signed write url.

  • headers – Specific headers for the request.

Returns

The response of the put request, usually a 200 for the success case.

verify_custom_metadata_format(custom_metadata: Dict)

Verifies that the custom metadata is in the correct format.

Parameters

custom_metadata – Dictionary of custom metadata, see upload_custom_metadata for the required format.

Raises

KeyError – If “images” or “metadata” aren’t a key of custom_metadata.
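A minimal sketch of the check described above (the library's own validation may verify more than the two top-level keys):

```python
def verify_format(custom_metadata: dict) -> None:
    """Raise KeyError if "images" or "metadata" is missing.

    Minimal sketch of the documented check; the library's own
    validation may check more than the two top-level keys.
    """
    for key in ("images", "metadata"):
        if key not in custom_metadata:
            raise KeyError(f"Custom metadata has no key '{key}'.")


verify_format({"images": [], "metadata": []})  # valid: passes silently
```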

Upload Dataset Mixin

exception lightly.api.api_workflow_upload_embeddings.EmbeddingDoesNotExistError
exception lightly.api.api_workflow_upload_metadata.InvalidCustomMetadataWarning