lightly.api

The lightly.api module provides access to the Lightly API.

class lightly.api.api_workflow_client.ApiWorkflowClient(token: Optional[str] = None, dataset_id: Optional[str] = None, embedding_id: Optional[str] = None, creator: str = Creator.USER_PIP)

Provides a uniform interface to communicate with the Lightly API.

The ApiWorkflowClient is used to communicate with the Lightly API. The client can also run more complex workflows that involve multiple API calls at once.

The client can be used in combination with the active learning agent.

Parameters
  • token – The token of the user. If it is not passed in during initialization, the token will be read from the environment variable LIGHTLY_TOKEN. For further information on how to get a token, see: https://docs.lightly.ai/docs/install-lightly#api-token

  • dataset_id – The id of the dataset. If it is not set but used by a workflow, the last modified dataset is taken by default.

  • embedding_id – The id of the embedding to use. If it is not set but used by a workflow, the newest embedding is taken by default.

  • creator – Creator passed to API requests.
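
Examples

A minimal instantiation sketch; the token and dataset ID are placeholders, and the second variant relies on the LIGHTLY_TOKEN environment variable described above:

>>> from lightly.api import ApiWorkflowClient
>>>
>>> # Pass the token explicitly
>>> client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
>>>
>>> # Or let the client read the token from the LIGHTLY_TOKEN environment variable
>>> client = ApiWorkflowClient(dataset_id="MY_DATASET_ID")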

compute_worker_run_info_generator(scheduled_run_id: str) Iterator[ComputeWorkerRunInfo]

Pulls information about a Lightly Worker run continuously.

Polls the Lightly Worker status every 30s. Whenever the status changes, the new run information is yielded. Once the Lightly Worker run finishes, the generator stops.

Parameters

scheduled_run_id – The id with which the run was scheduled.

Returns

Generator of information about the Lightly Worker run status.

Examples

>>> # Schedule a Lightly Worker run and monitor its state
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id):
>>>     print(f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'")
>>>
create_dataset(dataset_name: str, dataset_type: str = DatasetType.IMAGES) None

Creates a dataset on the Lightly Platform.

The dataset_id of the created dataset is stored in the client.dataset_id attribute and all further requests with the client will use the created dataset by default.

Parameters
  • dataset_name – The name of the dataset to be created.

  • dataset_type – The type of the dataset. We recommend using the constants DatasetType.IMAGES and DatasetType.VIDEOS provided by the API.

Raises

ValueError – If a dataset with dataset_name already exists.

Examples

>>> from lightly.api import ApiWorkflowClient
>>> from lightly.openapi_generated.swagger_client.models import DatasetType
>>>
>>> client = ApiWorkflowClient(token="YOUR_TOKEN")
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.IMAGES)
>>>
>>> # or to work with videos
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.VIDEOS)
>>>
>>> # retrieving dataset_id of the created dataset
>>> dataset_id = client.dataset_id
>>>
>>> # future client requests use the created dataset by default
>>> client.dataset_type
'Videos'
create_new_dataset_with_unique_name(dataset_basename: str, dataset_type: str = DatasetType.IMAGES) None

Creates a new dataset on the Lightly Platform.

If a dataset with the specified name already exists, the name is suffixed by a counter value.

Parameters
  • dataset_basename – The name of the dataset to be created.

  • dataset_type – The type of the dataset. We recommend using the constants DatasetType.IMAGES and DatasetType.VIDEOS provided by the API.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Create a dataset with a brand new name.
>>> client.create_new_dataset_with_unique_name("new-dataset")
>>> client.get_dataset_by_id(client.dataset_id)
{'id': '6470abef4f0eb7e635c30954',
 'name': 'new-dataset',
 ...}
>>>
>>> # Create another dataset with the same name. This time, the
>>> # new dataset should have a suffix `_1`.
>>> client.create_new_dataset_with_unique_name("new-dataset")
>>> client.get_dataset_by_id(client.dataset_id)
{'id': '6470ac194f0eb7e635c30990',
 'name': 'new-dataset_1',
 ...}
create_tag_from_filenames(fnames_new_tag: List[str], new_tag_name: str, parent_tag_id: Optional[str] = None) TagData

Creates a new tag from a list of filenames.

Parameters
  • fnames_new_tag – A list of filenames to be included in the new tag.

  • new_tag_name – The name of the new tag.

  • parent_tag_id – The tag defining where to sample from. Defaults to None, which resolves to the initial-tag.

Returns

The newly created tag.

Raises

RuntimeError – When a tag with the desired tag name already exists, when the initial-tag does not exist, or when any of the given files does not exist.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> filenames = ['image-1.png', 'image-2.png']
>>> client.create_tag_from_filenames(fnames_new_tag=filenames, new_tag_name='new-tag')
{'id': '6470c4c1060894655c5a8ed5'}
dataset_exists(dataset_id: str) bool

Checks if a dataset exists.

Parameters

dataset_id – Dataset ID.

Returns

True if the dataset exists and False otherwise.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> dataset_id = client.dataset_id
>>> client.dataset_exists(dataset_id=dataset_id)
True
property dataset_id: str

The current dataset ID.

Future requests with the client will automatically use this dataset ID. If the dataset ID is set, it is returned. Otherwise, the ID of the last modified dataset is selected.

dataset_name_exists(dataset_name: str, shared: Optional[bool] = False) bool

Checks if a dataset with the given name exists.

There can be multiple datasets with the same name accessible to the current user. This can happen if either:

  • A dataset has been explicitly shared with the user, or

  • The user has access to team datasets.

The shared flag controls whether these datasets are checked.

Parameters
  • dataset_name – Name of the dataset.

  • shared

    • If False (default), checks only datasets owned by the user.

    • If True, checks only datasets which have been shared with the user, including team datasets. Excludes the user's own datasets.

    • If None, checks all datasets the user has access to.

Returns

A boolean value indicating whether any dataset with the given name exists.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> client.dataset_name_exists(dataset_name="your-dataset-name")
True
property dataset_type: str

Returns the dataset type of the current dataset.

delete_compute_worker(worker_id: str) None

Removes a Lightly Worker.

Parameters

worker_id – ID of the worker to be removed.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> worker_ids = client.get_compute_worker_ids()
>>> worker_ids
['64709eac61e9ce68180a6529']
>>> client.delete_compute_worker(worker_id="64709eac61e9ce68180a6529")
>>> client.get_compute_worker_ids()
[]
delete_dataset_by_id(dataset_id: str) None

Deletes a dataset on the Lightly Platform.

Parameters

dataset_id – The ID of the dataset to be deleted.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> dataset_id = client.dataset_id
>>> client.dataset_exists(dataset_id=dataset_id)
True
>>>
>>> # Delete the dataset
>>> client.delete_dataset_by_id(dataset_id=dataset_id)
>>> client.dataset_exists(dataset_id=dataset_id)
False
delete_tag_by_id(tag_id: str) None

Deletes a tag from the current dataset.

Parameters

tag_id – The id of the tag to be deleted.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> filenames = ['image-1.png', 'image-2.png']
>>> tag_id = client.create_tag_from_filenames(fnames_new_tag=filenames, new_tag_name='new-tag')["id"]
>>> client.delete_tag_by_id(tag_id=tag_id)
delete_tag_by_name(tag_name: str) None

Deletes a tag from the current dataset.

Parameters

tag_name – The name of the tag to be deleted.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> filenames = ['image-1.png', 'image-2.png']
>>> client.create_tag_from_filenames(fnames_new_tag=filenames, new_tag_name='new-tag')
>>> client.delete_tag_by_name(tag_name="new-tag")
download_compute_worker_run_artifacts(run: DockerRunData, output_dir: str, timeout: int = 60) None

Downloads all artifacts from a run.

Parameters
  • run – Run from which to download artifacts.

  • output_dir – Output directory where artifacts will be saved.

  • timeout – Timeout in seconds after which an artifact download is interrupted.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download artifacts
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_artifacts(run=run, output_dir="my_run/artifacts")
download_compute_worker_run_checkpoint(run: DockerRunData, output_path: str, timeout: int = 60) None

Downloads the last training checkpoint from a run.

See our docs for more information regarding checkpoints: https://docs.lightly.ai/docs/train-a-self-supervised-model#checkpoints

Parameters
  • run – Run from which to download the checkpoint.

  • output_path – Path where checkpoint will be saved.

  • timeout – Timeout in seconds after which download is interrupted.

Raises

ArtifactNotExist – If the run has no checkpoint artifact or the checkpoint has not yet been uploaded.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download checkpoint
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_checkpoint(run=run, output_path="my_checkpoint.ckpt")
download_compute_worker_run_corruptness_check_information(run: DockerRunData, output_path: str, timeout: int = 60) None

Download the corruptness check information file from a run.

Parameters
  • run – Run from which to download the file.

  • output_path – Path where the file will be saved.

  • timeout – Timeout in seconds after which download is interrupted.

Raises

ArtifactNotExist – If the run has no corruptness check information artifact or the file has not yet been uploaded.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download corruptness check information file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_corruptness_check_information(run=run, output_path="corruptness_check_information.json")
>>>
>>> # print all corrupt samples and corruptions
>>> import json
>>> with open("corruptness_check_information.json", 'r') as f:
>>>     corruptness_check_information = json.load(f)
>>> for sample_name, error in corruptness_check_information["corrupt_samples"].items():
>>>     print(f"Sample '{sample_name}' is corrupt because of the error '{error}'.")
download_compute_worker_run_log(run: DockerRunData, output_path: str, timeout: int = 60) None

Download the log file from a run.

Parameters
  • run – Run from which to download the log file.

  • output_path – Path where log file will be saved.

  • timeout – Timeout in seconds after which download is interrupted.

Raises

ArtifactNotExist – If the run has no log artifact or the log file has not yet been uploaded.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download log file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_log(run=run, output_path="log.txt")
download_compute_worker_run_memory_log(run: DockerRunData, output_path: str, timeout: int = 60) None

Download the memory consumption log file from a run.

Parameters
  • run – Run from which to download the memory log file.

  • output_path – Path where memory log file will be saved.

  • timeout – Timeout in seconds after which download is interrupted.

Raises

ArtifactNotExist – If the run has no memory log artifact or the memory log file has not yet been uploaded.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download memory log file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_memory_log(run=run, output_path="memlog.txt")
download_compute_worker_run_report_json(run: DockerRunData, output_path: str, timeout: int = 60) None

Download the report in json format from a run.

DEPRECATED: This method is deprecated and will be removed in the future. Use download_compute_worker_run_report_v2_json to download the new report_v2.json instead.

Parameters
  • run – Run from which to download the report.

  • output_path – Path where report will be saved.

  • timeout – Timeout in seconds after which download is interrupted.

Raises

ArtifactNotExist – If the run has no report artifact or the report has not yet been uploaded.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download checkpoint
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_report_json(run=run, output_path="report.json")
download_compute_worker_run_report_pdf(run: DockerRunData, output_path: str, timeout: int = 60) None

Download the report in pdf format from a run.

Parameters
  • run – Run from which to download the report.

  • output_path – Path where report will be saved.

  • timeout – Timeout in seconds after which download is interrupted.

Raises

ArtifactNotExist – If the run has no report artifact or the report has not yet been uploaded.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download report
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_report_pdf(run=run, output_path="report.pdf")
download_compute_worker_run_report_v2_json(run: DockerRunData, output_path: str, timeout: int = 60) None

Download the report in json format from a run.

Parameters
  • run – Run from which to download the report.

  • output_path – Path where report will be saved.

  • timeout – Timeout in seconds after which download is interrupted.

Raises

ArtifactNotExist – If the run has no report artifact or the report has not yet been uploaded.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download checkpoint
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_report_v2_json(run=run, output_path="report_v2.json")
download_compute_worker_run_sequence_information(run: DockerRunData, output_path: str, timeout: int = 60) None

Download the sequence information from a run.

Parameters
  • run – Run from which to download the file.

  • output_path – Path where the file will be saved.

  • timeout – Timeout in seconds after which download is interrupted.

Raises

ArtifactNotExist – If the run has no sequence information artifact or the file has not yet been uploaded.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download sequence information file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_sequence_information(run=run, output_path="sequence_information.json")
download_dataset(output_dir: str, tag_name: str = 'initial-tag', max_workers: int = 8, verbose: bool = True) None

Downloads images from the web-app and stores them in output_dir.

Parameters
  • output_dir – Where to store the downloaded images.

  • tag_name – Name of the tag which should be downloaded.

  • max_workers – Maximum number of workers downloading images in parallel.

  • verbose – Whether or not to show the progress bar.

Raises
  • ValueError – If the specified tag does not exist on the dataset.

  • RuntimeError – If the connection to the server failed.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_dataset("/tmp/data")
Downloading 3 images (with 3 workers):
100%|██████████████████████████████████| 3/3 [00:01<00:00,  1.99imgs/s]
download_embeddings_csv(output_path: str) None

Downloads the latest embeddings from the dataset.

Parameters

output_path – Where the downloaded embedding data should be stored.

Raises

RuntimeError – If no embeddings could be found for the dataset.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_embeddings_csv(output_path="/tmp/embeddings.csv")
>>>
>>> # File content:
>>> # filenames,embedding_0,embedding_1,embedding_...,labels
>>> # image-1.png,0.2124302,-0.26934767,...,0
download_embeddings_csv_by_id(embedding_id: str, output_path: str) None

Downloads embeddings with the given embedding id from the dataset.

Parameters
  • embedding_id – ID of the embedding data to be downloaded.

  • output_path – Where the downloaded embedding data should be stored.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_embeddings_csv_by_id(
...     embedding_id="646f346004d77b4e1424e67e",
...     output_path="/tmp/embeddings.csv"
... )
>>>
>>> # File content:
>>> # filenames,embedding_0,embedding_1,embedding_...,labels
>>> # image-1.png,0.2124302,-0.26934767,...,0
download_new_raw_samples(use_redirected_read_url: bool = False) List[Tuple[str, str]]

Downloads filenames and read urls of unprocessed samples from the datasource.

All samples after the timestamp of ApiWorkflowClient.get_processed_until_timestamp() are fetched. After downloading the samples, the timestamp is updated to the current time. This function can be repeatedly called to retrieve new samples from the datasource.

Parameters

use_redirected_read_url – Flag for redirected read urls. When this flag is true, RedirectedReadUrls are returned instead of ReadUrls, meaning that the returned URLs have unlimited access to the file. Defaults to False. When S3DelegatedAccess is configured, this flag has no effect because RedirectedReadUrls are always returned.

Returns

A list of (filename, url) tuples where each tuple represents a sample.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_new_raw_samples()
[('image-3.png', 'https://......'), ('image-4.png', 'https://......')]
export_filenames_and_read_urls_by_tag_id(tag_id: str) List[Dict[str, str]]

Fetches filenames, read URLs, and datasource URLs from the given tag.

More information: https://docs.lightly.ai/docs/filenames-and-readurls

Parameters

tag_id – ID of the tag which should be exported.

Returns

A list of dictionaries with the keys "fileName", "readUrl" and "datasourceUrl". An example:

[
    {
        "fileName": "sample1.jpg",
        "readUrl": "s3://my_datasource/sample1.jpg?read_url_key=EAIFUIENDLFN",
        "datasourceUrl": "s3://my_datasource/sample1.jpg",
    },
    {
        "fileName": "sample2.jpg",
        "readUrl": "s3://my_datasource/sample2.jpg?read_url_key=JSBFIEUHVSJ",
        "datasourceUrl": "s3://my_datasource/sample2.jpg",
    },
]

export_filenames_and_read_urls_by_tag_name(tag_name: str) List[Dict[str, str]]

Fetches filenames, read URLs, and datasource URLs from the given tag name.

More information: https://docs.lightly.ai/docs/filenames-and-readurls

Parameters

tag_name – Name of the tag which should be exported.

Returns

A list of dictionaries with the keys "fileName", "readUrl" and "datasourceUrl".

Examples

>>> # write json file which can be used to access the actual file contents.
>>> mappings = client.export_filenames_and_read_urls_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> import json
>>> with open('my-samples.json', 'w') as f:
>>>     json.dump(mappings, f)
export_filenames_by_tag_id(tag_id: str) str

Fetches the filenames of samples within a certain tag by tag ID.

More information: https://docs.lightly.ai/docs/filenames-and-readurls

Parameters

tag_id – ID of the tag which should be exported.

Returns

A list of filenames of samples within a certain tag.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.export_filenames_by_tag_id("646b40d6c06aae1b91294a9e")
'image-1.jpg
image-2.jpg
image-3.jpg'

export_filenames_by_tag_name(tag_name: str) str

Fetches the filenames of samples within a certain tag by tag name.

More information: https://docs.lightly.ai/docs/filenames-and-readurls

Parameters

tag_name – Name of the tag which should be exported.

Returns

A list of filenames of samples within a certain tag.

Examples

>>> # write a text file containing the filenames of a tag
>>> filenames = client.export_filenames_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('filenames-of-initial-tag.txt', 'w') as f:
>>>     f.write(filenames)
export_label_box_data_rows_by_tag_id(tag_id: str) List[Dict]

Fetches samples in a format compatible with Labelbox v3.

The format is documented here: https://docs.labelbox.com/docs/images-json

More information: https://docs.lightly.ai/docs/labelbox

Parameters

tag_id – ID of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Labelbox v3.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.export_label_box_data_rows_by_tag_id(tag_id="646f34608a5613b57d8b73cc")
[{'externalId': '2218961434_7916358f53_z.jpg', 'imageUrl': ...}]
export_label_box_data_rows_by_tag_name(tag_name: str) List[Dict]

Fetches samples in a format compatible with Labelbox v3.

The format is documented here: https://docs.labelbox.com/docs/images-json

More information: https://docs.lightly.ai/docs/labelbox

Parameters

tag_name – Name of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Labelbox v3.

Examples

>>> # write json file which can be imported in Labelbox
>>> tasks = client.export_label_box_data_rows_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> import json
>>> with open('my-labelbox-rows.json', 'w') as f:
>>>     json.dump(tasks, f)
export_label_box_v4_data_rows_by_tag_id(tag_id: str) List[Dict]

Fetches samples in a format compatible with Labelbox v4.

The format is documented here: https://docs.labelbox.com/docs/images-json

More information: https://docs.lightly.ai/docs/labelbox

Parameters

tag_id – ID of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Labelbox v4.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.export_label_box_v4_data_rows_by_tag_id(tag_id="646f34608a5613b57d8b73cc")
[{'row_data': '...', 'global_key': 'image-1.jpg', 'media_type': 'IMAGE'}]
export_label_box_v4_data_rows_by_tag_name(tag_name: str) List[Dict]

Fetches samples in a format compatible with Labelbox v4.

The format is documented here: https://docs.labelbox.com/docs/images-json

More information: https://docs.lightly.ai/docs/labelbox

Parameters

tag_name – Name of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Labelbox v4.

Examples

>>> # write json file which can be imported in Labelbox
>>> tasks = client.export_label_box_v4_data_rows_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> import json
>>> with open('my-labelbox-rows.json', 'w') as f:
>>>     json.dump(tasks, f)
export_label_studio_tasks_by_tag_id(tag_id: str) List[Dict]

Exports samples in a format compatible with Label Studio.

The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format

Parameters

tag_id – ID of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Label Studio.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.export_label_studio_tasks_by_tag_id(tag_id="646f34608a5613b57d8b73cc")
[{'id': 0, 'data': {'image': '...', ...}}]
export_label_studio_tasks_by_tag_name(tag_name: str) List[Dict]

Fetches samples in a format compatible with Label Studio.

The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format

More information: https://docs.lightly.ai/docs/labelstudio-integration

Parameters

tag_name – Name of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Label Studio.

Examples

>>> # write json file which can be imported in Label Studio
>>> tasks = client.export_label_studio_tasks_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> import json
>>> with open('my-label-studio-tasks.json', 'w') as f:
>>>     json.dump(tasks, f)
get_all_datasets() List[DatasetData]

Returns all datasets the user has access to.

DEPRECATED in favour of get_datasets(shared=None) and will be removed in the future.

get_all_embedding_data() List[DatasetEmbeddingData]

Fetches embedding data of all embeddings for this dataset.

Returns

A list of embedding data.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_all_embedding_data()
[{'created_at': 1684750552181,
 'id': '646b40d88355e2f54c6d2235',
 'is2d': False,
 'is_processed': True,
 'name': 'default_20230522_10h15m50s'}]
get_all_tags() List[TagData]

Gets all tags of the current dataset from the Lightly Platform.

Returns

A list of tags.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_all_tags()
[{'created_at': 1684750550014,
 'dataset_id': '646b40a18355e2f54c6d2200',
 'id': '646b40d6c06aae1b91294a9e',
 'last_modified_at': 1684750550014,
 'name': 'cool-tag',
 'preselected_tag_id': None,
 ...}]
get_compute_worker_ids() List[str]

Fetches the IDs of all registered Lightly Workers.

Returns

A list of worker IDs.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> worker_ids = client.get_compute_worker_ids()
>>> worker_ids
['64709eac61e9ce68180a6529', '64709f8f61e9ce68180a652a']
get_compute_worker_run(run_id: str) DockerRunData

Fetches a Lightly Worker run.

Parameters

run_id – Run ID.

Returns

Details of the Lightly Worker run.

Raises

ApiException – If no run with the given ID exists.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_compute_worker_run(run_id="6470a20461e9ce68180a6530")
{'artifacts': [...],
 'config_id': '6470a16461e9ce68180a6530',
 'created_at': 1679479418110,
 'dataset_id': '6470a36361e9ce68180a6531',
 'docker_version': '2.6.0',
 ...
 }
get_compute_worker_run_checkpoint_url(run: DockerRunData) str

Gets the download url of the last training checkpoint from a run.

See our docs for more information regarding checkpoints: https://docs.lightly.ai/docs/train-a-self-supervised-model#checkpoints

Parameters

run – Run from which to download the checkpoint.

Returns

The url from which the checkpoint can be downloaded.

Raises

ArtifactNotExist – If the run has no checkpoint artifact or the checkpoint has not yet been uploaded.

Examples

>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # get checkpoint read_url
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> checkpoint_read_url = client.get_compute_worker_run_checkpoint_url(run=run)
get_compute_worker_run_from_scheduled_run(scheduled_run_id: str) DockerRunData

Fetches a Lightly Worker run given its scheduled run ID.

Parameters

scheduled_run_id – Scheduled run ID.

Returns

Details of the Lightly Worker run.

Raises

ApiException – If no run with the given scheduled run ID exists or if the scheduled run is not yet picked up by a worker.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_compute_worker_run_from_scheduled_run(scheduled_run_id="646f338a8a5613b57d8b73a1")
{'artifacts': [...],
 'config_id': '6470a16461e9ce68180a6530',
 'created_at': 1679479418110,
 'dataset_id': '6470a36361e9ce68180a6531',
 'docker_version': '2.6.0',
 ...
}
get_compute_worker_run_info(scheduled_run_id: str) ComputeWorkerRunInfo

Returns information about the Lightly Worker run.

Parameters

scheduled_run_id – ID of the scheduled run.

Returns

Details of the Lightly Worker run.

Examples

>>> # Scheduled a Lightly Worker run and get its state
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>> run_info = client.get_compute_worker_run_info(scheduled_run_id)
>>> print(run_info)
get_compute_worker_run_tags(run_id: str) List[TagData]

Returns all tags from a run on the current dataset.

Only returns tags for runs made with Lightly Worker version >=2.4.2.

Parameters

run_id – Run ID from which to return tags.

Returns

List of tags created by the run. The tags are ordered by creation date from newest to oldest.

Examples

>>> # Get filenames from last run.
>>>
>>> from lightly.api import ApiWorkflowClient
>>> client = ApiWorkflowClient(
>>>     token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID"
>>> )
>>> tags = client.get_compute_worker_run_tags(run_id="MY_LAST_RUN_ID")
>>> filenames = client.export_filenames_by_tag_name(tag_name=tags[0].name)
get_compute_worker_runs(dataset_id: Optional[str] = None) List[DockerRunData]

Fetches all Lightly Worker runs for the user.

Parameters

dataset_id – Target dataset ID. Optional. If set, only runs for the given dataset are returned.

Returns

Runs sorted by creation time from the oldest to the latest.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_compute_worker_runs()
[{'artifacts': [...],
 'config_id': '6470a16461e9ce68180a6530',
 'created_at': 1679479418110,
 'dataset_id': '6470a36361e9ce68180a6531',
 'docker_version': '2.6.0',
 ...
 }]
get_compute_worker_runs_iter(dataset_id: Optional[str] = None) Iterator[DockerRunData]

Returns an iterator over all Lightly Worker runs for the user.

Parameters

dataset_id – Target dataset ID. Optional. If set, only runs for the given dataset are returned.

Returns

Runs iterator.
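
Examples

A minimal sketch; the dataset_id field follows the run data shown in the get_compute_worker_runs example above:

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> for run in client.get_compute_worker_runs_iter():
>>>     print(run.dataset_id)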

get_compute_workers() List[DockerWorkerRegistryEntryData]

Fetches details of all registered Lightly Workers.

Returns

A list of Lightly Worker details.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> workers = client.get_compute_workers()
>>> workers
[{'created_at': 1685102336056,
    'docker_version': '2.6.0',
    'id': '64709eac61e9ce68180a6529',
    'labels': [],
    ...
}]
get_dataset_by_id(dataset_id: str) DatasetData

Fetches a dataset by ID.

Parameters

dataset_id – Dataset ID.

Returns

The dataset with the given dataset id.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> dataset_id = client.dataset_id
>>> client.get_dataset_by_id(dataset_id=dataset_id)
{'created_at': 1685009504596,
 'datasource_processed_until_timestamp': 1685009513,
 'datasources': ['646f346004d77b4e1424e67e', '646f346004d77b4e1424e695'],
 'id': '646f34608a5613b57d8b73c9',
 'img_type': 'full',
 'type': 'Images',
 ...}
get_datasets(shared: Optional[bool] = False) List[DatasetData]

Returns all datasets owned by the current user.

There can be multiple datasets with the same name accessible to the current user. This can happen if either:

  • A dataset has been explicitly shared with the user, or

  • The user has access to team datasets.

The shared flag controls whether these datasets are returned.

Parameters

shared

  • If False (default), returns only datasets owned by the user.

  • If True, returns datasets which have been shared with the user, including team datasets. Excludes the user's own datasets. Can return multiple datasets.

  • If None, returns all datasets the user has access to. Can return multiple datasets.

Returns

A list of datasets owned by the current user.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> client.get_datasets()
[{'created_at': 1685009504596,
 'datasource_processed_until_timestamp': 1685009513,
 'datasources': ['646f346004d77b4e1424e67e', '646f346004d77b4e1424e695'],
 'id': '646f34608a5613b57d8b73c9',
 'img_type': 'full',
 'type': 'Images',
 ...}]
get_datasets_by_name(dataset_name: str, shared: Optional[bool] = False) List[DatasetData]

Fetches datasets by name.

There can be multiple datasets with the same name accessible to the current user. This can happen if either:

  • A dataset has been explicitly shared with the user, or

  • The user has access to team datasets.

The shared flag controls whether these datasets are returned.

Parameters
  • dataset_name – Name of the target dataset.

  • shared

    • If False (default), returns only datasets owned by the user. In this case at most one dataset will be returned.

    • If True, returns datasets which have been shared with the user, including team datasets. Excludes the user's own datasets. Can return multiple datasets.

    • If None, returns all datasets the user has access to. Can return multiple datasets.

Returns

A list of datasets that match the name. If no datasets with the name exist, an empty list is returned.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> client.get_datasets_by_name(dataset_name="your-dataset-name")
[{'created_at': 1685009504596,
 'datasource_processed_until_timestamp': 1685009513,
 'datasources': ['646f346004d77b4e1424e67e', '646f346004d77b4e1424e695'],
 'id': '646f34608a5613b57d8b73c9',
 'img_type': 'full',
 'type': 'Images',
 ...}]
>>>
>>> # Non-existent dataset
>>> client.get_datasets_by_name(dataset_name="random-name")
[]
get_datasets_iter(shared: Optional[bool] = False) Iterator[DatasetData]

Returns an iterator over all datasets owned by the current user.

There can be multiple datasets with the same name accessible to the current user. This can happen if either:

  • A dataset has been explicitly shared with the user, or

  • The user has access to team datasets.

The shared flag controls whether these datasets are returned.

Parameters

shared

  • If False (default), returns only datasets owned by the user.

  • If True, returns datasets which have been shared with the user, including team datasets. Excludes the user's own datasets. Can return multiple datasets.

  • If None, returns all datasets the user has access to. Can return multiple datasets.

Returns

An iterator over datasets owned by the current user.
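
Examples

A minimal sketch; the id field follows the dataset data shown in the get_datasets example above:

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> for dataset in client.get_datasets_iter():
>>>     print(dataset.id)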

get_datasource() DatasourceConfig

Returns the datasource of the current dataset.

Returns

Datasource data of the datasource of the current dataset.

Raises

ApiException – If no datasource has been configured.
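
Examples

A minimal sketch, assuming a datasource has already been configured for the dataset (for example via set_s3_config below):

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("my-dataset")
>>> datasource = client.get_datasource()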

get_embedding_by_name(name: str, ignore_suffix: bool = True) DatasetEmbeddingData

Fetches an embedding in the current dataset by name.

Parameters
  • name – The name of the desired embedding.

  • ignore_suffix – If true, a suffix of the embedding name in the current dataset is ignored.

Returns

The embedding data.

Raises

EmbeddingDoesNotExistError – If the name does not match the name of an embedding on the server.
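
Examples

A minimal sketch; the embedding name is borrowed from the get_all_embedding_data example above:

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("my-dataset")
>>> embedding = client.get_embedding_by_name("default_20230522_10h15m50s")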

get_embedding_data_by_name(name: str) DatasetEmbeddingData

Fetches embedding data with the given name for this dataset.

Parameters

name – Embedding name.

Returns

Embedding data.

Raises

ValueError – If no embedding with this name exists.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_embedding_data_by_name("embedding-data")
{'created_at': 1654756552401,
 'id': '646f346004d77b4e1424e67e',
 'is2d': False,
 'is_processed': True,
 'name': 'embedding-data'}
get_scheduled_compute_worker_runs(state: Optional[str] = None) List[DockerRunScheduledData]

Returns a list of scheduled Lightly Worker runs with the current dataset.

Parameters

state – DockerRunScheduledState value. If specified, then only runs in the given state are returned. If omitted, then runs which have not yet finished (neither ‘DONE’ nor ‘CANCELED’) are returned. Valid states are ‘OPEN’, ‘LOCKED’, ‘DONE’, and ‘CANCELED’.

Returns

A list of scheduled Lightly Worker runs.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_scheduled_compute_worker_runs(state="OPEN")
[{'config_id': '646f34608a5613b57d8b73cc',
 'created_at': 1685009508254,
 'dataset_id': '6470a36361e9ce68180a6531',
 'id': '646f338a8a5613b57d8b73a1',
 'last_modified_at': 1685009542667,
 'owner': '643d050b8bcb91967ded65df',
 'priority': 'MID',
 'runs_on': ['worker-label'],
 'state': 'OPEN'}]
get_shared_users(dataset_id: str) List[str]

Fetches a list of users that have access to the dataset.

Parameters

dataset_id – Dataset ID.

Returns

List of email addresses of users that have write access to the dataset.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_shared_users(dataset_id="MY_DATASET_ID")
['user@something.com']
get_tag_by_id(tag_id: str) TagData

Gets a tag from the current dataset by tag ID.

Parameters

tag_id – ID of the requested tag.

Returns

Tag data for the requested tag.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_tag_by_id("646b40d6c06aae1b91294a9e")
{'created_at': 1684750550014,
 'dataset_id': '646b40a18355e2f54c6d2200',
 'id': '646b40d6c06aae1b91294a9e',
 'last_modified_at': 1684750550014,
 'name': 'cool-tag',
 'preselected_tag_id': None,
 ...}
get_tag_by_name(tag_name: str) TagData

Gets a tag from the current dataset by tag name.

Parameters

tag_name – Name of the requested tag.

Returns

Tag data for the requested tag.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_tag_by_name("cool-tag")
{'created_at': 1684750550014,
 'dataset_id': '646b40a18355e2f54c6d2200',
 'id': '646b40d6c06aae1b91294a9e',
 'last_modified_at': 1684750550014,
 'name': 'cool-tag',
 'preselected_tag_id': None,
 ...}
list_datasource_permissions() Dict[str, Union[bool, Dict[str, str]]]

Lists granted access permissions for the datasource set up with a dataset.

Returns a dictionary mapping each permission name to a boolean value, see the example below. An additional errors key is present if any permission errors have been encountered. Permission errors are stored in a dictionary where permission names are keys and error messages are values.

Examples

>>> from lightly.api import ApiWorkflowClient
>>> client = ApiWorkflowClient(
...    token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID"
... )
>>> client.list_datasource_permissions()
{
    'can_read': True,
    'can_write': True,
    'can_list': False,
    'can_overwrite': True,
    'errors': {'can_list': 'error message'}
}
register_compute_worker(name: str = 'Default', labels: Optional[List[str]] = None) str

Registers a new Lightly Worker.

The ID of the registered worker will be returned. If a worker with the same name already exists, the ID of the existing worker is returned.

Parameters
  • name – The name of the Lightly Worker.

  • labels – The labels of the Lightly Worker. See our docs for more information regarding the labels parameter: https://docs.lightly.ai/docs/assign-scheduled-runs-to-specific-workers

Returns

ID of the registered Lightly Worker.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> worker_id = client.register_compute_worker(name="my-worker", labels=["worker-label"])
>>> worker_id
'64709eac61e9ce68180a6529'
schedule_compute_worker_run(worker_config: Optional[Dict[str, Any]] = None, lightly_config: Optional[Dict[str, Any]] = None, selection_config: Optional[Union[Dict[str, Any], SelectionConfigV4]] = None, priority: str = DockerRunScheduledPriority.MID, runs_on: Optional[List[str]] = None) str

Schedules a run with the given configurations.

See our docs for more information regarding the different configurations: https://docs.lightly.ai/docs/all-configuration-options

Parameters
  • worker_config – Lightly Worker configuration.

  • lightly_config – Lightly configuration.

  • selection_config – Selection configuration.

  • runs_on – The required labels the Lightly Worker must have to take the job. See our docs for more information regarding the runs_on parameter: https://docs.lightly.ai/docs/assign-scheduled-runs-to-specific-workers

Returns

The id of the scheduled run.

Raises
  • ApiException – If the API call returns a status code other than 200:

    • 400: Missing or invalid parameters

    • 402: Insufficient plan

    • 403: Not authorized for this resource or invalid token

    • 404: Resource (dataset or config) not found

    • 422: Missing or invalid file in datasource

  • InvalidConfigError – If one of the configurations is invalid.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> selection_config = {...}
>>> worker_labels = ["worker-label"]
>>> run_id = client.schedule_compute_worker_run(
...     selection_config=selection_config, runs_on=worker_labels
... )
set_azure_config(container_name: str, account_name: str, sas_token: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None

Sets the Azure configuration for the datasource of the current dataset.

See our docs for a detailed explanation on how to setup Lightly with Azure: https://docs.lightly.ai/docs/azure

Parameters
  • container_name – Container name of the dataset, for example: “my-container/path/to/my/data”.

  • account_name – Azure account name.

  • sas_token – Shared Access Signature token.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
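
Examples

A minimal configuration sketch; the container name, account name, and SAS token below are placeholders:

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.set_azure_config(
...     container_name="my-container/path/to/my/data",
...     account_name="ACCOUNT-NAME",
...     sas_token="SAS-TOKEN",
... )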

set_dataset_id_by_name(dataset_name: str, shared: Optional[bool] = False) None

Sets the dataset ID in the API client given the name of the desired dataset.

There can be multiple datasets with the same name accessible to the current user. This can happen if either:

  • A dataset has been explicitly shared with the user, or

  • The user has access to team datasets.

The shared flag controls whether these datasets are also checked. If multiple datasets with the given name are found, the API client uses the ID of the first dataset and prints a warning message.

Parameters
  • dataset_name – The name of the target dataset.

  • shared

    • If False (default), checks only datasets owned by the user.

    • If True, checks only datasets which have been shared with the user, including team datasets. Excludes the user's own datasets. There can be multiple candidate datasets.

    • If None, checks all datasets the user has access to. There can be multiple candidate datasets.

Raises

ValueError – If no dataset with the given name exists.

Examples

>>> # A new session. Dataset "old-dataset" was created before.
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("old-dataset")
set_gcs_config(resource_path: str, project_id: str, credentials: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None

Sets the Google Cloud Storage configuration for the datasource of the current dataset.

See our docs for a detailed explanation on how to setup Lightly with Google Cloud Storage: https://docs.lightly.ai/docs/google-cloud-storage

Parameters
  • resource_path – GCS url of your dataset, for example: “gs://my_bucket/path/to/my/data”

  • project_id – GCS project id.

  • credentials – The stringified content of the credentials JSON file that you download from the Google Cloud Platform.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
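
Examples

A minimal configuration sketch; the bucket path, project ID, and credentials file below are placeholders, and the credentials JSON is passed as a string as required by the credentials parameter:

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("my-dataset")
>>> with open("credentials.json") as f:
...     credentials = f.read()
>>> client.set_gcs_config(
...     resource_path="gs://my_bucket/path/to/my/data",
...     project_id="my-gcp-project-id",
...     credentials=credentials,
... )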

set_local_config(relative_path: str = '', web_server_location: Optional[str] = 'http://localhost:3456', thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None

Sets the local configuration for the datasource of the current dataset.

Find a detailed explanation on how to setup Lightly with a local file server in our docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_local_server.html

Parameters
  • relative_path – Relative path from the mount root, for example: “path/to/my/data”.

  • web_server_location – Location of your local file server. Defaults to “http://localhost:3456”.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
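
Examples

A minimal configuration sketch, assuming a local file server is running at the default location:

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.set_local_config(relative_path="path/to/my/data")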

set_obs_config(resource_path: str, obs_endpoint: str, obs_access_key_id: str, obs_secret_access_key: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None

Sets the Telekom OBS configuration for the datasource of the current dataset.

Parameters
  • resource_path – OBS url of your dataset. For example, “obs://my_bucket/path/to/my/data”.

  • obs_endpoint – OBS endpoint.

  • obs_access_key_id – OBS access key id.

  • obs_secret_access_key – OBS secret access key.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
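
Examples

A minimal configuration sketch; the bucket path, endpoint, and keys below are placeholders:

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.set_obs_config(
...     resource_path="obs://my_bucket/path/to/my/data",
...     obs_endpoint="https://obs.example-endpoint.com",
...     obs_access_key_id="OBS-ACCESS-KEY-ID",
...     obs_secret_access_key="OBS-SECRET-ACCESS-KEY",
... )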

set_s3_config(resource_path: str, region: str, access_key: str, secret_access_key: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None

Sets the S3 configuration for the datasource of the current dataset.

See our docs for a detailed explanation on how to setup Lightly with AWS S3: https://docs.lightly.ai/docs/aws-s3

Parameters
  • resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.

  • region – S3 region where the dataset bucket is located, for example “eu-central-1”.

  • access_key – S3 access key.

  • secret_access_key – Secret for the S3 access key.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
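
Examples

A minimal configuration sketch; the bucket path and credentials below are placeholders:

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.set_s3_config(
...     resource_path="s3://my_bucket/path/to/my/data",
...     region="eu-central-1",
...     access_key="ACCESS-KEY",
...     secret_access_key="SECRET-ACCESS-KEY",
... )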

set_s3_delegated_access_config(resource_path: str, region: str, role_arn: str, external_id: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None

Sets the S3 configuration for the datasource of the current dataset.

See our docs for a detailed explanation on how to setup Lightly with AWS S3 and delegated access: https://docs.lightly.ai/docs/aws-s3#delegated-access

Parameters
  • resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.

  • region – S3 region where the dataset bucket is located, for example “eu-central-1”.

  • role_arn – Unique ARN identifier of the role.

  • external_id – External ID of the role.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
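
Examples

A minimal configuration sketch; the bucket path, role ARN, and external ID below are placeholders:

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.set_s3_delegated_access_config(
...     resource_path="s3://my_bucket/path/to/my/data",
...     region="eu-central-1",
...     role_arn="ROLE-ARN",
...     external_id="EXTERNAL-ID",
... )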

share_dataset_only_with(dataset_id: str, user_emails: List[str]) None

Shares a dataset with a list of users.

This method overwrites the list of users that previously had access to the dataset. If you want to add someone new to the list, first fetch the list of users with access and include them in the user_emails parameter.

Parameters
  • dataset_id – ID of the dataset to be shared.

  • user_emails – List of email addresses of users who will get access to the dataset.

Examples

>>> # share a dataset with a user
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.share_dataset_only_with(dataset_id="MY_DATASET_ID", user_emails=["user@something.com"])
>>>
>>> # share dataset with a user while keeping access for previous users
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> user_emails = client.get_shared_users(dataset_id="MY_DATASET_ID")
>>> user_emails.append("additional_user2@something.com")
>>> client.share_dataset_only_with(dataset_id="MY_DATASET_ID", user_emails=user_emails)
>>>
>>> # revoke access to all users
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.share_dataset_only_with(dataset_id="MY_DATASET_ID", user_emails=[])
update_processed_until_timestamp(timestamp: int) None

Sets the timestamp until which samples have been processed.

Parameters

timestamp – Unix timestamp of last processed sample.

Examples

>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset.
>>> # All samples are processed at this moment.
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_new_raw_samples()
[]
>>>
>>> # Set timestamp to an earlier moment to reprocess samples
>>> client.update_processed_until_timestamp(1684749813)
>>> client.download_new_raw_samples()
[('image-3.png', 'https://......'), ('image-4.png', 'https://......')]
verify_custom_metadata_format(custom_metadata: Dict) None

Verifies that the custom metadata is in the correct format.

Parameters

custom_metadata – Dictionary of custom metadata, see upload_custom_metadata for the required format.

Raises

KeyError – If "images" or "metadata" is not a key of custom_metadata.
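
Examples

A minimal sketch; the check requires the top-level "images" and "metadata" keys, and the inner entries shown here are illustrative assumptions (see upload_custom_metadata for the required format):

>>> custom_metadata = {
...     "images": [{"id": 0, "file_name": "image-1.png"}],
...     "metadata": [{"image_id": 0, "number_of_people": 3}],
... }
>>> client.verify_custom_metadata_format(custom_metadata)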

class lightly.api.api_workflow_compute_worker.ComputeWorkerRunInfo(state: Union[DockerRunState, OPEN, CANCELED_OR_NOT_EXISTING], message: str)

Information about a Lightly Worker run.

state

The state of the Lightly Worker run.

Type

Union[lightly.openapi_generated.swagger_client.models.docker_run_state.DockerRunState, OPEN, CANCELED_OR_NOT_EXISTING]

message

The last message of the Lightly Worker run.

Type

str

ended_successfully() bool

Checks whether the Lightly Worker run ended successfully or failed.

Returns

A boolean value indicating whether the Lightly Worker run was successful: True if it succeeded, False if it failed.

Raises

ValueError – If the Lightly Worker run is still in progress.
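
Examples

A minimal sketch combining in_end_state (documented below) with ended_successfully, since calling ended_successfully on a run that is still in progress raises a ValueError:

>>> run_info = client.get_compute_worker_run_info(scheduled_run_id)
>>> if run_info.in_end_state():
>>>     print(run_info.ended_successfully())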

in_end_state() bool

Checks whether the Lightly Worker run has ended.

class lightly.api.api_workflow_compute_worker.InvalidConfigurationError