lightly.api
The lightly.api module provides access to the Lightly API.
- class lightly.api.api_workflow_client.ApiWorkflowClient(token: Optional[str] = None, dataset_id: Optional[str] = None, embedding_id: Optional[str] = None, creator: str = Creator.USER_PIP)
Provides a uniform interface to communicate with the Lightly API.
The ApiWorkflowClient is used to communicate with the Lightly API. The client can also run more complex workflows which include multiple API calls at once.
The client can be used in combination with the active learning agent.
- Parameters
token – The token of the user. If it is not passed in during initialization, the token will be read from the environment variable LIGHTLY_TOKEN. For further information on how to get a token, see: https://docs.lightly.ai/docs/install-lightly#api-token
dataset_id – The ID of the dataset. If it is not set but is used by a workflow, the last modified dataset is taken by default.
embedding_id – The ID of the embedding to use. If it is not set but is used by a workflow, the newest embedding is taken by default.
creator – Creator passed to API requests.
- compute_worker_run_info_generator(scheduled_run_id: str) Iterator[ComputeWorkerRunInfo]
Pulls information about a Lightly Worker run continuously.
Polls the Lightly Worker status every 30 seconds and yields an update whenever the status changes. The generator stops once the Lightly Worker run has finished.
- Parameters
scheduled_run_id – The id with which the run was scheduled.
- Returns
Generator of information about the Lightly Worker run status.
Examples
>>> # Schedule a Lightly Worker run and monitor its state
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id):
>>>     print(f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'")
- create_dataset(dataset_name: str, dataset_type: str = DatasetType.IMAGES) None
Creates a dataset on the Lightly Platform.
The dataset_id of the created dataset is stored in the client.dataset_id attribute and all further requests with the client will use the created dataset by default.
- Parameters
dataset_name – The name of the dataset to be created.
dataset_type – The type of the dataset. We recommend using the constants provided by the API, DatasetType.IMAGES and DatasetType.VIDEOS.
- Raises
ValueError – If a dataset with dataset_name already exists.
Examples
>>> from lightly.api import ApiWorkflowClient
>>> from lightly.openapi_generated.swagger_client.models import DatasetType
>>>
>>> client = ApiWorkflowClient(token="YOUR_TOKEN")
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.IMAGES)
>>>
>>> # or to work with videos
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.VIDEOS)
>>>
>>> # retrieving dataset_id of the created dataset
>>> dataset_id = client.dataset_id
>>>
>>> # future client requests use the created dataset by default
>>> client.dataset_type
'Videos'
- create_new_dataset_with_unique_name(dataset_basename: str, dataset_type: str = DatasetType.IMAGES) None
Creates a new dataset on the Lightly Platform.
If a dataset with the specified name already exists, the name is suffixed by a counter value.
- Parameters
dataset_basename – The name of the dataset to be created.
dataset_type – The type of the dataset. We recommend using the constants provided by the API, DatasetType.IMAGES and DatasetType.VIDEOS.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Create a dataset with a brand new name.
>>> client.create_new_dataset_with_unique_name("new-dataset")
>>> client.get_dataset_by_id(client.dataset_id)
{'id': '6470abef4f0eb7e635c30954', 'name': 'new-dataset', ...}
>>>
>>> # Create another dataset with the same name. This time, the
>>> # new dataset should have a suffix `_1`.
>>> client.create_new_dataset_with_unique_name("new-dataset")
>>> client.get_dataset_by_id(client.dataset_id)
{'id': '6470ac194f0eb7e635c30990', 'name': 'new-dataset_1', ...}
- create_tag_from_filenames(fnames_new_tag: List[str], new_tag_name: str, parent_tag_id: Optional[str] = None) TagData
Creates a new tag from a list of filenames.
- Parameters
fnames_new_tag – A list of filenames to be included in the new tag.
new_tag_name – The name of the new tag.
parent_tag_id – The tag defining where to sample from, default: None resolves to the initial-tag.
- Returns
The newly created tag.
- Raises
RuntimeError – When a tag with the desired tag name already exists. When initial-tag does not exist. When any of the given files does not exist.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> filenames = ['image-1.png', 'image-2.png']
>>> client.create_tag_from_filenames(fnames_new_tag=filenames, new_tag_name='new-tag')
{'id': '6470c4c1060894655c5a8ed5'}
- dataset_exists(dataset_id: str) bool
Checks if a dataset exists.
- Parameters
dataset_id – Dataset ID.
- Returns
True if the dataset exists and False otherwise.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> dataset_id = client.dataset_id
>>> client.dataset_exists(dataset_id=dataset_id)
True
- property dataset_id: str
The current dataset ID.
Future requests with the client will automatically use this dataset ID. If the dataset ID is set, it is returned. Otherwise, the ID of the last modified dataset is selected.
- dataset_name_exists(dataset_name: str, shared: Optional[bool] = False) bool
Checks if a dataset with the given name exists.
There can be multiple datasets with the same name accessible to the current user. This can happen if either:
- A dataset has been explicitly shared with the user
- The user has access to team datasets
The shared flag controls whether these datasets are checked.
- Parameters
dataset_name – Name of the dataset.
shared –
- If False (default), checks only datasets owned by the user.
- If True, checks datasets which have been shared with the user, including team datasets. Excludes the user's own datasets.
- If None, checks all datasets the user has access to.
- Returns
A boolean value indicating whether any dataset with the given name exists.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> client.dataset_name_exists(dataset_name="your-dataset-name")
True
- property dataset_type: str
Returns the dataset type of the current dataset.
- delete_compute_worker(worker_id: str) None
Removes a Lightly Worker.
- Parameters
worker_id – ID of the worker to be removed.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> worker_ids = client.get_compute_worker_ids()
>>> worker_ids
['64709eac61e9ce68180a6529']
>>> client.delete_compute_worker(worker_id="64709eac61e9ce68180a6529")
>>> client.get_compute_worker_ids()
[]
- delete_dataset_by_id(dataset_id: str) None
Deletes a dataset on the Lightly Platform.
- Parameters
dataset_id – The ID of the dataset to be deleted.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> dataset_id = client.dataset_id
>>> client.dataset_exists(dataset_id=dataset_id)
True
>>>
>>> # Delete the dataset
>>> client.delete_dataset_by_id(dataset_id=dataset_id)
>>> client.dataset_exists(dataset_id=dataset_id)
False
- delete_tag_by_id(tag_id: str) None
Deletes a tag from the current dataset.
- Parameters
tag_id – The id of the tag to be deleted.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> filenames = ['image-1.png', 'image-2.png']
>>> tag_id = client.create_tag_from_filenames(fnames_new_tag=filenames, new_tag_name='new-tag')["id"]
>>> client.delete_tag_by_id(tag_id=tag_id)
- delete_tag_by_name(tag_name: str) None
Deletes a tag from the current dataset.
- Parameters
tag_name – The name of the tag to be deleted.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> filenames = ['image-1.png', 'image-2.png']
>>> client.create_tag_from_filenames(fnames_new_tag=filenames, new_tag_name='new-tag')
>>> client.delete_tag_by_name(tag_name="new-tag")
- download_compute_worker_run_artifacts(run: DockerRunData, output_dir: str, timeout: int = 60) None
Downloads all artifacts from a run.
- Parameters
run – Run from which to download artifacts.
output_dir – Output directory where artifacts will be saved.
timeout – Timeout in seconds after which an artifact download is interrupted.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download artifacts
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_artifacts(run=run, output_dir="my_run/artifacts")
- download_compute_worker_run_checkpoint(run: DockerRunData, output_path: str, timeout: int = 60) None
Downloads the last training checkpoint from a run.
See our docs for more information regarding checkpoints: https://docs.lightly.ai/docs/train-a-self-supervised-model#checkpoints
- Parameters
run – Run from which to download the checkpoint.
output_path – Path where checkpoint will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no checkpoint artifact or the checkpoint has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download checkpoint
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_checkpoint(run=run, output_path="my_checkpoint.ckpt")
- download_compute_worker_run_corruptness_check_information(run: DockerRunData, output_path: str, timeout: int = 60) None
Download the corruptness check information file from a run.
- Parameters
run – Run from which to download the file.
output_path – Path where the file will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no corruptness check information artifact or the file has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download corruptness check information file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_corruptness_check_information(run=run, output_path="corruptness_check_information.json")
>>>
>>> # print all corrupt samples and corruptions
>>> with open("corruptness_check_information.json", 'r') as f:
>>>     corruptness_check_information = json.load(f)
>>> for sample_name, error in corruptness_check_information["corrupt_samples"].items():
>>>     print(f"Sample '{sample_name}' is corrupt because of the error '{error}'.")
- download_compute_worker_run_log(run: DockerRunData, output_path: str, timeout: int = 60) None
Download the log file from a run.
- Parameters
run – Run from which to download the log file.
output_path – Path where log file will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no log artifact or the log file has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download log file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_log(run=run, output_path="log.txt")
- download_compute_worker_run_memory_log(run: DockerRunData, output_path: str, timeout: int = 60) None
Download the memory consumption log file from a run.
- Parameters
run – Run from which to download the memory log file.
output_path – Path where memory log file will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no memory log artifact or the memory log file has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download memory log file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_memory_log(run=run, output_path="memlog.txt")
- download_compute_worker_run_report_json(run: DockerRunData, output_path: str, timeout: int = 60) None
Download the report in json format from a run.
DEPRECATED: This method is deprecated and will be removed in the future. Use download_compute_worker_run_report_v2_json to download the new report_v2.json instead.
- Parameters
run – Run from which to download the report.
output_path – Path where report will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no report artifact or the report has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download report
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_report_json(run=run, output_path="report.json")
- download_compute_worker_run_report_pdf(run: DockerRunData, output_path: str, timeout: int = 60) None
Download the report in pdf format from a run.
- Parameters
run – Run from which to download the report.
output_path – Path where report will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no report artifact or the report has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download report
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_report_pdf(run=run, output_path="report.pdf")
- download_compute_worker_run_report_v2_json(run: DockerRunData, output_path: str, timeout: int = 60) None
Download the report in json format from a run.
- Parameters
run – Run from which to download the report.
output_path – Path where report will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no report artifact or the report has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download report
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_report_v2_json(run=run, output_path="report_v2.json")
- download_compute_worker_run_sequence_information(run: DockerRunData, output_path: str, timeout: int = 60) None
Download the sequence information from a run.
- Parameters
run – Run from which to download the file.
output_path – Path where the file will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no sequence information artifact or the file has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download sequence information file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_sequence_information(run=run, output_path="sequence_information.json")
- download_dataset(output_dir: str, tag_name: str = 'initial-tag', max_workers: int = 8, verbose: bool = True) None
Downloads images from the web-app and stores them in output_dir.
- Parameters
output_dir – Where to store the downloaded images.
tag_name – Name of the tag which should be downloaded.
max_workers – Maximum number of workers downloading images in parallel.
verbose – Whether or not to show the progress bar.
- Raises
ValueError – If the specified tag does not exist on the dataset.
RuntimeError – If the connection to the server failed.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_dataset("/tmp/data")
Downloading 3 images (with 3 workers): 100%|██████████████████████████████████| 3/3 [00:01<00:00, 1.99imgs/s]
- download_embeddings_csv(output_path: str) None
Downloads the latest embeddings from the dataset.
- Parameters
output_path – Where the downloaded embedding data should be stored.
- Raises
RuntimeError – If no embeddings could be found for the dataset.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_embeddings_csv(output_path="/tmp/embeddings.csv")
>>>
>>> # File content:
>>> # filenames,embedding_0,embedding_1,embedding_...,labels
>>> # image-1.png,0.2124302,-0.26934767,...,0
- download_embeddings_csv_by_id(embedding_id: str, output_path: str) None
Downloads embeddings with the given embedding id from the dataset.
- Parameters
embedding_id – ID of the embedding data to be downloaded.
output_path – Where the downloaded embedding data should be stored.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_embeddings_csv_by_id(
...     embedding_id="646f346004d77b4e1424e67e",
...     output_path="/tmp/embeddings.csv"
... )
>>>
>>> # File content:
>>> # filenames,embedding_0,embedding_1,embedding_...,labels
>>> # image-1.png,0.2124302,-0.26934767,...,0
- download_new_raw_samples(use_redirected_read_url: bool = False) List[Tuple[str, str]]
Downloads filenames and read urls of unprocessed samples from the datasource.
All samples after the timestamp of ApiWorkflowClient.get_processed_until_timestamp() are fetched. After downloading the samples, the timestamp is updated to the current time. This function can be repeatedly called to retrieve new samples from the datasource.
- Parameters
use_redirected_read_url – Flag for redirected read urls. When this flag is true, RedirectedReadUrls are returned instead of ReadUrls, meaning that the returned URLs have unlimited access to the file. Defaults to False. When S3DelegatedAccess is configured, this flag has no effect because RedirectedReadUrls are always returned.
- Returns
A list of (filename, url) tuples where each tuple represents a sample.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_new_raw_samples()
[('image-3.png', 'https://......'), ('image-4.png', 'https://......')]
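Because the method returns plain (filename, url) tuples, downstream code often wants the filenames and URLs as separate lists. A minimal sketch, where the `samples` list is a hypothetical stand-in for the real API response:

```python
# Hypothetical stand-in for the list returned by download_new_raw_samples()
samples = [
    ("image-3.png", "https://example.com/read/image-3.png"),
    ("image-4.png", "https://example.com/read/image-4.png"),
]

# zip(*pairs) transposes the list of tuples into two parallel tuples
filenames, urls = zip(*samples) if samples else ((), ())

filenames = list(filenames)
urls = list(urls)
```

The `if samples else ((), ())` guard matters because `zip(*[])` yields nothing to unpack; calling the method on a fully processed datasource returns an empty list.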
- export_filenames_and_read_urls_by_tag_id(tag_id: str) List[Dict[str, str]]
Fetches filenames, read URLs, and datasource URLs from the given tag.
More information: https://docs.lightly.ai/docs/filenames-and-readurls
- Parameters
tag_id – ID of the tag which should be exported.
- Returns
A list of dictionaries with the keys "fileName", "readUrl" and "datasourceUrl". An example:
[
    {
        "fileName": "sample1.jpg",
        "readUrl": "s3://my_datasource/sample1.jpg?read_url_key=EAIFUIENDLFN",
        "datasourceUrl": "s3://my_datasource/sample1.jpg",
    },
    {
        "fileName": "sample2.jpg",
        "readUrl": "s3://my_datasource/sample2.jpg?read_url_key=JSBFIEUHVSJ",
        "datasourceUrl": "s3://my_datasource/sample2.jpg",
    },
]
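A common follow-up is to index the exported entries by filename so read URLs can be looked up quickly. A minimal sketch, where `mappings` is a hypothetical stand-in for the list returned by the export call:

```python
# Hypothetical stand-in for the export_filenames_and_read_urls_by_tag_id() result
mappings = [
    {
        "fileName": "sample1.jpg",
        "readUrl": "s3://my_datasource/sample1.jpg?read_url_key=EAIFUIENDLFN",
        "datasourceUrl": "s3://my_datasource/sample1.jpg",
    },
    {
        "fileName": "sample2.jpg",
        "readUrl": "s3://my_datasource/sample2.jpg?read_url_key=JSBFIEUHVSJ",
        "datasourceUrl": "s3://my_datasource/sample2.jpg",
    },
]

# Build a filename -> read URL lookup table
read_url_by_filename = {entry["fileName"]: entry["readUrl"] for entry in mappings}
```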
- export_filenames_and_read_urls_by_tag_name(tag_name: str) List[Dict[str, str]]
Fetches filenames, read URLs, and datasource URLs from the given tag name.
More information: https://docs.lightly.ai/docs/filenames-and-readurls
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
A list of dictionaries with the keys "fileName", "readUrl" and "datasourceUrl".
Examples
>>> # write json file which can be used to access the actual file contents.
>>> mappings = client.export_filenames_and_read_urls_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-samples.json', 'w') as f:
>>>     json.dump(mappings, f)
- export_filenames_by_tag_id(tag_id: str) str
Fetches sample filenames within a certain tag by tag ID.
More information: https://docs.lightly.ai/docs/filenames-and-readurls
- Parameters
tag_id – ID of the tag which should be exported.
- Returns
The filenames of samples within a certain tag, one filename per line.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.export_filenames_by_tag_id("646b40d6c06aae1b91294a9e")
'image-1.jpg\nimage-2.jpg\nimage-3.jpg'
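Since the export is a single newline-joined string rather than a list, `str.splitlines()` recovers the individual filenames. A minimal sketch, where `exported` is a hypothetical stand-in for the real return value:

```python
# Hypothetical stand-in for the string returned by export_filenames_by_tag_id()
exported = "image-1.jpg\nimage-2.jpg\nimage-3.jpg"

# splitlines() turns the newline-joined string back into a list of filenames
filenames = exported.splitlines()
```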
- export_filenames_by_tag_name(tag_name: str) str
Fetches sample filenames within a certain tag by tag name.
More information: https://docs.lightly.ai/docs/filenames-and-readurls
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
The filenames of samples within a certain tag, one filename per line.
Examples
>>> # write text file which lists the filenames in the tag
>>> filenames = client.export_filenames_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('filenames-of-initial-tag.txt', 'w') as f:
>>>     f.write(filenames)
- export_label_box_data_rows_by_tag_id(tag_id: str) List[Dict]
Fetches samples in a format compatible with Labelbox v3.
The format is documented here: https://docs.labelbox.com/docs/images-json
More information: https://docs.lightly.ai/docs/labelbox
- Parameters
tag_id – ID of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Labelbox v3.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.export_label_box_data_rows_by_tag_id(tag_id="646f34608a5613b57d8b73cc")
[{'externalId': '2218961434_7916358f53_z.jpg', 'imageUrl': ...}]
- export_label_box_data_rows_by_tag_name(tag_name: str) List[Dict]
Fetches samples in a format compatible with Labelbox v3.
The format is documented here: https://docs.labelbox.com/docs/images-json
More information: https://docs.lightly.ai/docs/labelbox
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Labelbox v3.
Examples
>>> # write json file which can be imported in Labelbox
>>> tasks = client.export_label_box_data_rows_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-labelbox-rows.json', 'w') as f:
>>>     json.dump(tasks, f)
- export_label_box_v4_data_rows_by_tag_id(tag_id: str) List[Dict]
Fetches samples in a format compatible with Labelbox v4.
The format is documented here: https://docs.labelbox.com/docs/images-json
More information: https://docs.lightly.ai/docs/labelbox
- Parameters
tag_id – ID of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Labelbox v4.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.export_label_box_v4_data_rows_by_tag_id(tag_id="646f34608a5613b57d8b73cc")
[{'row_data': '...', 'global_key': 'image-1.jpg', 'media_type': 'IMAGE'}]
- export_label_box_v4_data_rows_by_tag_name(tag_name: str) List[Dict]
Fetches samples in a format compatible with Labelbox v4.
The format is documented here: https://docs.labelbox.com/docs/images-json
More information: https://docs.lightly.ai/docs/labelbox
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Labelbox v4.
Examples
>>> # write json file which can be imported in Labelbox
>>> tasks = client.export_label_box_v4_data_rows_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-labelbox-rows.json', 'w') as f:
>>>     json.dump(tasks, f)
- export_label_studio_tasks_by_tag_id(tag_id: str) List[Dict]
Exports samples in a format compatible with Label Studio.
The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format
- Parameters
tag_id – ID of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Label Studio.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.export_label_studio_tasks_by_tag_id(tag_id="646f34608a5613b57d8b73cc")
[{'id': 0, 'data': {'image': '...', ...}}]
- export_label_studio_tasks_by_tag_name(tag_name: str) List[Dict]
Fetches samples in a format compatible with Label Studio.
The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format
More information: https://docs.lightly.ai/docs/labelstudio-integration
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Label Studio.
Examples
>>> # write json file which can be imported in Label Studio
>>> tasks = client.export_label_studio_tasks_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-label-studio-tasks.json', 'w') as f:
>>>     json.dump(tasks, f)
- get_all_datasets() List[DatasetData]
Returns all datasets the user has access to.
DEPRECATED in favour of get_datasets(shared=None) and will be removed in the future.
- get_all_embedding_data() List[DatasetEmbeddingData]
Fetches embedding data of all embeddings for this dataset.
- Returns
A list of embedding data.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_all_embedding_data()
[{'created_at': 1684750552181, 'id': '646b40d88355e2f54c6d2235', 'is2d': False, 'is_processed': True, 'name': 'default_20230522_10h15m50s'}]
- get_all_tags() List[TagData]
Gets all tags in the Lightly Platform from the current dataset.
- Returns
A list of tags.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_all_tags()
[{'created_at': 1684750550014, 'dataset_id': '646b40a18355e2f54c6d2200', 'id': '646b40d6c06aae1b91294a9e', 'last_modified_at': 1684750550014, 'name': 'cool-tag', 'preselected_tag_id': None, ...}]
- get_compute_worker_ids() List[str]
Fetches the IDs of all registered Lightly Workers.
- Returns
A list of worker IDs.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> worker_ids = client.get_compute_worker_ids()
>>> worker_ids
['64709eac61e9ce68180a6529', '64709f8f61e9ce68180a652a']
- get_compute_worker_run(run_id: str) DockerRunData
Fetches a Lightly Worker run.
- Parameters
run_id – Run ID.
- Returns
Details of the Lightly Worker run.
- Raises
ApiException – If no run with the given ID exists.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_compute_worker_run(run_id="6470a20461e9ce68180a6530")
{'artifacts': [...], 'config_id': '6470a16461e9ce68180a6530', 'created_at': 1679479418110, 'dataset_id': '6470a36361e9ce68180a6531', 'docker_version': '2.6.0', ...}
- get_compute_worker_run_checkpoint_url(run: DockerRunData) str
Gets the download url of the last training checkpoint from a run.
See our docs for more information regarding checkpoints: https://docs.lightly.ai/docs/train-a-self-supervised-model#checkpoints
- Parameters
run – Run from which to download the checkpoint.
- Returns
The url from which the checkpoint can be downloaded.
- Raises
ArtifactNotExist – If the run has no checkpoint artifact or the checkpoint has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # get checkpoint read_url
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> checkpoint_read_url = client.get_compute_worker_run_checkpoint_url(run=run)
- get_compute_worker_run_from_scheduled_run(scheduled_run_id: str) DockerRunData
Fetches a Lightly Worker run given its scheduled run ID.
- Parameters
scheduled_run_id – Scheduled run ID.
- Returns
Details of the Lightly Worker run.
- Raises
ApiException – If no run with the given scheduled run ID exists or if the scheduled run is not yet picked up by a worker.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_compute_worker_run_from_scheduled_run(scheduled_run_id="646f338a8a5613b57d8b73a1")
{'artifacts': [...], 'config_id': '6470a16461e9ce68180a6530', 'created_at': 1679479418110, 'dataset_id': '6470a36361e9ce68180a6531', 'docker_version': '2.6.0', ...}
- get_compute_worker_run_info(scheduled_run_id: str) ComputeWorkerRunInfo
Returns information about the Lightly Worker run.
- Parameters
scheduled_run_id – ID of the scheduled run.
- Returns
Details of the Lightly Worker run.
Examples
>>> # Schedule a Lightly Worker run and get its state
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>> run_info = client.get_compute_worker_run_info(scheduled_run_id)
>>> print(run_info)
- get_compute_worker_run_tags(run_id: str) List[TagData]
Returns all tags from a run with the current dataset.
Only returns tags for runs made with Lightly Worker version >=2.4.2.
- Parameters
run_id – Run ID from which to return tags.
- Returns
List of tags created by the run. The tags are ordered by creation date from newest to oldest.
Examples
>>> # Get filenames from last run.
>>>
>>> from lightly.api import ApiWorkflowClient
>>> client = ApiWorkflowClient(
>>>     token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID"
>>> )
>>> tags = client.get_compute_worker_run_tags(run_id="MY_LAST_RUN_ID")
>>> filenames = client.export_filenames_by_tag_name(tag_name=tags[0].name)
- get_compute_worker_runs(dataset_id: Optional[str] = None) List[DockerRunData]
Fetches all Lightly Worker runs for the user.
- Parameters
dataset_id – Target dataset ID. Optional. If set, only runs with the given dataset will be returned.
- Returns
Runs sorted by creation time from the oldest to the latest.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_compute_worker_runs()
[{'artifacts': [...],
  'config_id': '6470a16461e9ce68180a6530',
  'created_at': 1679479418110,
  'dataset_id': '6470a36361e9ce68180a6531',
  'docker_version': '2.6.0',
  ... }]
- get_compute_worker_runs_iter(dataset_id: Optional[str] = None) Iterator[DockerRunData]
Returns an iterator over all Lightly Worker runs for the user.
- Parameters
dataset_id – Target dataset ID. Optional. If set, only runs with the given dataset will be returned.
- Returns
Runs iterator.
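A minimal usage sketch of the iterator (a mock stands in for an authenticated client, since real calls require an API token; the run IDs and states shown are illustrative):

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

# Stand-in for an authenticated ApiWorkflowClient (illustration only;
# a real client is created with ApiWorkflowClient(token=...)).
client = MagicMock()
client.get_compute_worker_runs_iter.return_value = iter([
    SimpleNamespace(id="run-1", state="DONE"),
    SimpleNamespace(id="run-2", state="COMPUTING"),
])

# Unlike get_compute_worker_runs(), the iterator yields runs lazily,
# so a large run history is not materialized into one list.
unfinished = [
    run.id
    for run in client.get_compute_worker_runs_iter(dataset_id="MY_DATASET_ID")
    if run.state != "DONE"
]
print(unfinished)  # ['run-2']
```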
- get_compute_workers() List[DockerWorkerRegistryEntryData]
Fetches details of all registered Lightly Workers.
- Returns
A list of Lightly Worker details.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> workers = client.get_compute_workers()
>>> workers
[{'created_at': 1685102336056,
  'docker_version': '2.6.0',
  'id': '64709eac61e9ce68180a6529',
  'labels': [],
  ... }]
- get_dataset_by_id(dataset_id: str) DatasetData
Fetches a dataset by ID.
- Parameters
dataset_id – Dataset ID.
- Returns
The dataset with the given dataset id.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> dataset_id = client.dataset_id
>>> client.get_dataset_by_id(dataset_id=dataset_id)
{'created_at': 1685009504596,
 'datasource_processed_until_timestamp': 1685009513,
 'datasources': ['646f346004d77b4e1424e67e', '646f346004d77b4e1424e695'],
 'id': '646f34608a5613b57d8b73c9',
 'img_type': 'full',
 'type': 'Images',
 ...}
- get_datasets(shared: Optional[bool] = False) List[DatasetData]
Returns all datasets owned by the current user.
There can be multiple datasets with the same name accessible to the current user. This can happen if either:
* A dataset has been explicitly shared with the user
* The user has access to team datasets
The shared flag controls whether these datasets are returned.
- Parameters
shared –
* If False (default), returns only datasets owned by the user.
* If True, returns datasets which have been shared with the user, including team datasets. Excludes user’s own datasets. Can return multiple datasets.
* If None, returns all datasets the user has access to. Can return multiple datasets.
- Returns
A list of datasets owned by the current user.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> client.get_datasets()
[{'created_at': 1685009504596,
  'datasource_processed_until_timestamp': 1685009513,
  'datasources': ['646f346004d77b4e1424e67e', '646f346004d77b4e1424e695'],
  'id': '646f34608a5613b57d8b73c9',
  'img_type': 'full',
  'type': 'Images',
  ...}]
- get_datasets_by_name(dataset_name: str, shared: Optional[bool] = False) List[DatasetData]
Fetches datasets by name.
There can be multiple datasets with the same name accessible to the current user. This can happen if either:
* A dataset has been explicitly shared with the user
* The user has access to team datasets
The shared flag controls whether these datasets are returned.
- Parameters
dataset_name – Name of the target dataset.
shared –
* If False (default), returns only datasets owned by the user. In this case at most one dataset will be returned.
* If True, returns datasets which have been shared with the user, including team datasets. Excludes user’s own datasets. Can return multiple datasets.
* If None, returns all datasets the user has access to. Can return multiple datasets.
- Returns
A list of datasets that match the name. If no datasets with the name exist, an empty list is returned.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.create_dataset("your-dataset-name", dataset_type=DatasetType.IMAGES)
>>> client.get_datasets_by_name(dataset_name="your-dataset-name")
[{'created_at': 1685009504596,
  'datasource_processed_until_timestamp': 1685009513,
  'datasources': ['646f346004d77b4e1424e67e', '646f346004d77b4e1424e695'],
  'id': '646f34608a5613b57d8b73c9',
  'img_type': 'full',
  'type': 'Images',
  ...}]
>>>
>>> # Non-existent dataset
>>> client.get_datasets_by_name(dataset_name="random-name")
[]
- get_datasets_iter(shared: Optional[bool] = False) Iterator[DatasetData]
Returns an iterator over all datasets owned by the current user.
There can be multiple datasets with the same name accessible to the current user. This can happen if either:
* A dataset has been explicitly shared with the user
* The user has access to team datasets
The shared flag controls whether these datasets are returned.
- Parameters
shared –
* If False (default), returns only datasets owned by the user.
* If True, returns datasets which have been shared with the user, including team datasets. Excludes user’s own datasets. Can return multiple datasets.
* If None, returns all datasets the user has access to. Can return multiple datasets.
- Returns
An iterator over datasets owned by the current user.
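A usage sketch showing the benefit of lazy iteration: the search stops at the first match instead of fetching every dataset up front (a mock stands in for an authenticated client; dataset names and IDs are illustrative):

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

# Stand-in for an authenticated ApiWorkflowClient (illustration only).
client = MagicMock()
client.get_datasets_iter.return_value = iter([
    SimpleNamespace(id="ds-1", name="clouds"),
    SimpleNamespace(id="ds-2", name="cars"),
    SimpleNamespace(id="ds-3", name="trees"),
])

# next() consumes the iterator only until a match is found.
first_match = next(
    ds for ds in client.get_datasets_iter() if ds.name == "cars"
)
print(first_match.id)  # ds-2
```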
- get_datasource() DatasourceConfig
Returns the datasource of the current dataset.
- Returns
Datasource data of the datasource of the current dataset.
- Raises
ApiException – If no datasource has been configured.
- get_embedding_by_name(name: str, ignore_suffix: bool = True) DatasetEmbeddingData
Fetches an embedding in the current dataset by name.
- Parameters
name – The name of the desired embedding.
ignore_suffix – If True, embeddings are matched even if their name on the server carries an additional suffix.
- Returns
The embedding data.
- Raises
EmbeddingDoesNotExistError – If the name does not match the name of an embedding on the server.
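The suffix handling can be pictured with a small stand-alone sketch. Note this is an approximation of the matching rule, not the library's actual implementation; embedding names on the server may carry an auto-generated suffix such as a timestamp:

```python
def name_matches(requested: str, stored: str, ignore_suffix: bool = True) -> bool:
    """Approximate matching rule: with ignore_suffix=True, a stored name
    that merely extends the requested name still counts as a match."""
    if ignore_suffix:
        return stored.startswith(requested)
    return stored == requested

print(name_matches("embedding", "embedding_2023-06-01"))  # True
print(name_matches("embedding", "embedding_2023-06-01", ignore_suffix=False))  # False
```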
- get_embedding_data_by_name(name: str) DatasetEmbeddingData
Fetches embedding data with the given name for this dataset.
- Parameters
name – Embedding name.
- Returns
Embedding data.
- Raises
ValueError – If no embedding with this name exists.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_embedding_data_by_name("embedding-data")
{'created_at': 1654756552401,
 'id': '646f346004d77b4e1424e67e',
 'is2d': False,
 'is_processed': True,
 'name': 'embedding-data'}
- get_scheduled_compute_worker_runs(state: Optional[str] = None) List[DockerRunScheduledData]
Returns a list of scheduled Lightly Worker runs with the current dataset.
- Parameters
state – DockerRunScheduledState value. If specified, then only runs in the given state are returned. If omitted, then runs which have not yet finished (neither ‘DONE’ nor ‘CANCELED’) are returned. Valid states are ‘OPEN’, ‘LOCKED’, ‘DONE’, and ‘CANCELED’.
- Returns
A list of scheduled Lightly Worker runs.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_scheduled_compute_worker_runs(state="OPEN")
[{'config_id': '646f34608a5613b57d8b73cc',
  'created_at': 1685009508254,
  'dataset_id': '6470a36361e9ce68180a6531',
  'id': '646f338a8a5613b57d8b73a1',
  'last_modified_at': 1685009542667,
  'owner': '643d050b8bcb91967ded65df',
  'priority': 'MID',
  'runs_on': ['worker-label'],
  'state': 'OPEN'}]
- get_shared_users(dataset_id: str) List[str]
Fetches a list of users that have access to the dataset.
- Parameters
dataset_id – Dataset ID.
- Returns
List of email addresses of users that have write access to the dataset.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_shared_users(dataset_id="MY_DATASET_ID")
['user@something.com']
- get_tag_by_id(tag_id: str) TagData
Gets a tag from the current dataset by tag ID.
- Parameters
tag_id – ID of the requested tag.
- Returns
Tag data for the requested tag.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_tag_by_id("646b40d6c06aae1b91294a9e")
{'created_at': 1684750550014,
 'dataset_id': '646b40a18355e2f54c6d2200',
 'id': '646b40d6c06aae1b91294a9e',
 'last_modified_at': 1684750550014,
 'name': 'cool-tag',
 'preselected_tag_id': None,
 ...}
- get_tag_by_name(tag_name: str) TagData
Gets a tag from the current dataset by tag name.
- Parameters
tag_name – Name of the requested tag.
- Returns
Tag data for the requested tag.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.get_tag_by_name("cool-tag")
{'created_at': 1684750550014,
 'dataset_id': '646b40a18355e2f54c6d2200',
 'id': '646b40d6c06aae1b91294a9e',
 'last_modified_at': 1684750550014,
 'name': 'cool-tag',
 'preselected_tag_id': None,
 ...}
- list_datasource_permissions() Dict[str, Union[bool, Dict[str, str]]]
Lists granted access permissions for the datasource set up with a dataset.
Returns a dictionary mapping each permission name to a boolean value, see the example below. An additional errors key is present if any permission errors have been encountered. Permission errors are stored in a dictionary where permission names are keys and error messages are values.
Examples
>>> from lightly.api import ApiWorkflowClient
>>> client = ApiWorkflowClient(
...     token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID"
... )
>>> client.list_datasource_permissions()
{'can_read': True,
 'can_write': True,
 'can_list': False,
 'can_overwrite': True,
 'errors': {'can_list': 'error message'}}
- register_compute_worker(name: str = 'Default', labels: Optional[List[str]] = None) str
Registers a new Lightly Worker.
The ID of the registered worker will be returned. If a worker with the same name already exists, the ID of the existing worker is returned.
- Parameters
name – The name of the Lightly Worker.
labels – The labels of the Lightly Worker. See our docs for more information regarding the labels parameter: https://docs.lightly.ai/docs/assign-scheduled-runs-to-specific-workers
- Returns
ID of the registered Lightly Worker.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> worker_id = client.register_compute_worker(name="my-worker", labels=["worker-label"])
>>> worker_id
'64709eac61e9ce68180a6529'
- schedule_compute_worker_run(worker_config: Optional[Dict[str, Any]] = None, lightly_config: Optional[Dict[str, Any]] = None, selection_config: Optional[Union[Dict[str, Any], SelectionConfigV4]] = None, priority: str = DockerRunScheduledPriority.MID, runs_on: Optional[List[str]] = None) str
Schedules a run with the given configurations.
See our docs for more information regarding the different configurations: https://docs.lightly.ai/docs/all-configuration-options
- Parameters
worker_config – Lightly Worker configuration.
lightly_config – Lightly configuration.
selection_config – Selection configuration.
runs_on – The required labels the Lightly Worker must have to take the job. See our docs for more information regarding the runs_on parameter: https://docs.lightly.ai/docs/assign-scheduled-runs-to-specific-workers
- Returns
The id of the scheduled run.
- Raises
ApiException – If the API call returns a status code other than 200:
* 400: Missing or invalid parameters
* 402: Insufficient plan
* 403: Not authorized for this resource or invalid token
* 404: Resource (dataset or config) not found
* 422: Missing or invalid file in datasource
InvalidConfigError – If one of the configurations is invalid.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> selection_config = {...}
>>> worker_labels = ["worker-label"]
>>> run_id = client.schedule_compute_worker_run(
...     selection_config=selection_config, runs_on=worker_labels
... )
- set_azure_config(container_name: str, account_name: str, sas_token: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None
Sets the Azure configuration for the datasource of the current dataset.
See our docs for a detailed explanation on how to set up Lightly with Azure: https://docs.lightly.ai/docs/azure
- Parameters
container_name – Container name of the dataset, for example: “my-container/path/to/my/data”.
account_name – Azure account name.
sas_token – Secure Access Signature token.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
- set_dataset_id_by_name(dataset_name: str, shared: Optional[bool] = False) None
Sets the dataset ID in the API client given the name of the desired dataset.
There can be multiple datasets with the same name accessible to the current user. This can happen if either:
* A dataset has been explicitly shared with the user
* The user has access to team datasets
The shared flag controls whether these datasets are also checked. If multiple datasets with the given name are found, the API client uses the ID of the first dataset and prints a warning message.
- Parameters
dataset_name – The name of the target dataset.
shared –
* If False (default), checks only datasets owned by the user.
* If True, checks datasets which have been shared with the user, including team datasets. Excludes user’s own datasets. There can be multiple candidate datasets.
* If None, checks all datasets the user has access to. There can be multiple candidate datasets.
- Raises
ValueError – If no dataset with the given name exists.
Examples
>>> # A new session. Dataset "old-dataset" was created before.
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.set_dataset_id_by_name("old-dataset")
- set_gcs_config(resource_path: str, project_id: str, credentials: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None
Sets the Google Cloud Storage configuration for the datasource of the current dataset.
See our docs for a detailed explanation on how to set up Lightly with Google Cloud Storage: https://docs.lightly.ai/docs/google-cloud-storage
- Parameters
resource_path – GCS url of your dataset, for example: “gs://my_bucket/path/to/my/data”
project_id – GCS project id.
credentials – Content of the credentials JSON file downloaded from Google Cloud Platform, passed as a string.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
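A common pitfall: credentials expects the stringified content of the service-account JSON file, not a path to it. A sketch (a mock stands in for an authenticated client; the key fields shown are illustrative):

```python
import json
from unittest.mock import MagicMock

client = MagicMock()  # stand-in for an authenticated ApiWorkflowClient

# Pass the *content* of the service-account JSON as a string, e.g.
# Path("service_account.json").read_text(), never the file path itself.
credentials = json.dumps({"type": "service_account", "project_id": "my-project"})

client.set_gcs_config(
    resource_path="gs://my_bucket/path/to/my/data",
    project_id="my-project",
    credentials=credentials,
)
```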
- set_local_config(relative_path: str = '', web_server_location: Optional[str] = 'http://localhost:3456', thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None
Sets the local configuration for the datasource of the current dataset.
Find a detailed explanation on how to setup Lightly with a local file server in our docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_local_server.html
- Parameters
relative_path – Relative path from the mount root, for example: “path/to/my/data”.
web_server_location – Location of your local file server. Defaults to “http://localhost:3456”.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
- set_obs_config(resource_path: str, obs_endpoint: str, obs_access_key_id: str, obs_secret_access_key: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None
Sets the Telekom OBS configuration for the datasource of the current dataset.
- Parameters
resource_path – OBS url of your dataset. For example, “obs://my_bucket/path/to/my/data”.
obs_endpoint – OBS endpoint.
obs_access_key_id – OBS access key id.
obs_secret_access_key – OBS secret access key.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
- set_s3_config(resource_path: str, region: str, access_key: str, secret_access_key: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None
Sets the S3 configuration for the datasource of the current dataset.
See our docs for a detailed explanation on how to set up Lightly with AWS S3: https://docs.lightly.ai/docs/aws-s3
- Parameters
resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.
region – S3 region where the dataset bucket is located, for example “eu-central-1”.
access_key – S3 access key.
secret_access_key – Secret for the S3 access key.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
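The purpose parameter makes it possible to keep the raw data read-only while giving Lightly a separate writable location. A sketch (a mock stands in for an authenticated client; bucket paths and keys are placeholders, and the string values mirror the DatasourcePurpose members named above):

```python
from unittest.mock import MagicMock

client = MagicMock()  # stand-in for an authenticated ApiWorkflowClient

# Read-only datasource holding the raw images.
client.set_s3_config(
    resource_path="s3://my_bucket/raw-images/",
    region="eu-central-1",
    access_key="MY_ACCESS_KEY",
    secret_access_key="MY_SECRET_ACCESS_KEY",
    purpose="INPUT",
)

# Writable datasource where Lightly stores thumbnails and other outputs.
client.set_s3_config(
    resource_path="s3://my_bucket/lightly-outputs/",
    region="eu-central-1",
    access_key="MY_ACCESS_KEY",
    secret_access_key="MY_SECRET_ACCESS_KEY",
    purpose="LIGHTLY",
)
```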
- set_s3_delegated_access_config(resource_path: str, region: str, role_arn: str, external_id: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = DatasourcePurpose.INPUT_OUTPUT) None
Sets the S3 delegated access configuration for the datasource of the current dataset.
See our docs for a detailed explanation on how to set up Lightly with AWS S3 and delegated access: https://docs.lightly.ai/docs/aws-s3#delegated-access
- Parameters
resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.
region – S3 region where the dataset bucket is located, for example “eu-central-1”.
role_arn – Unique ARN identifier of the role.
external_id – External ID of the role.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT).
- share_dataset_only_with(dataset_id: str, user_emails: List[str]) None
Shares a dataset with a list of users.
This method overwrites the list of users that have had access to the dataset before. If you want to add someone new to the list, make sure you first fetch the list of users with access and include them in the user_emails parameter.
- Parameters
dataset_id – ID of the dataset to be shared.
user_emails – List of email addresses of users who will get access to the dataset.
Examples
>>> # Share a dataset with a user
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.share_dataset_only_with(dataset_id="MY_DATASET_ID", user_emails=["user@something.com"])
>>>
>>> # Share the dataset with an additional user while keeping access for previous users
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> user_emails = client.get_shared_users(dataset_id="MY_DATASET_ID")
>>> user_emails.append("additional_user2@something.com")
>>> client.share_dataset_only_with(dataset_id="MY_DATASET_ID", user_emails=user_emails)
>>>
>>> # Revoke access for all users
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.share_dataset_only_with(dataset_id="MY_DATASET_ID", user_emails=[])
- update_processed_until_timestamp(timestamp: int) None
Sets the timestamp until which samples have been processed.
- Parameters
timestamp – Unix timestamp of last processed sample.
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>>
>>> # Already created some Lightly Worker runs with this dataset.
>>> # All samples are processed at this moment.
>>> client.set_dataset_id_by_name("my-dataset")
>>> client.download_new_raw_samples()
[]
>>>
>>> # Set timestamp to an earlier moment to reprocess samples
>>> client.update_processed_until_timestamp(1684749813)
>>> client.download_new_raw_samples()
[('image-3.png', 'https://......'), ('image-4.png', 'https://......')]
- verify_custom_metadata_format(custom_metadata: Dict) None
Verifies that the custom metadata is in the correct format.
- Parameters
custom_metadata – Dictionary of custom metadata, see upload_custom_metadata for the required format.
- Raises
KeyError – If “images” or “metadata” aren’t a key of custom_metadata.
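A sketch of a well-formed dictionary. The top-level layout follows the COCO-style format expected by upload_custom_metadata; the inner fields ("file_name", "weather") are illustrative, and see upload_custom_metadata for the authoritative format:

```python
custom_metadata = {
    "images": [{"id": 0, "file_name": "image-1.png"}],
    "metadata": [{"image_id": 0, "custom_metadata": {"weather": "sunny"}}],
}

# verify_custom_metadata_format raises KeyError unless both top-level
# keys are present; the same check, written stand-alone:
for key in ("images", "metadata"):
    if key not in custom_metadata:
        raise KeyError(key)
print(sorted(custom_metadata))  # ['images', 'metadata']
```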
- class lightly.api.api_workflow_compute_worker.ComputeWorkerRunInfo(state: Union[DockerRunState, OPEN, CANCELED_OR_NOT_EXISTING], message: str)
Information about a Lightly Worker run.
- state
The state of the Lightly Worker run.
- Type
Union[lightly.openapi_generated.swagger_client.models.docker_run_state.DockerRunState, OPEN, CANCELED_OR_NOT_EXISTING]
- message
The last message of the Lightly Worker run.
- Type
str
- ended_successfully() bool
Checks whether the Lightly Worker run ended successfully or failed.
- Returns
True if the Lightly Worker run completed successfully, False if it failed.
- Raises
ValueError – If the Lightly Worker run is still in progress.
- in_end_state() bool
Checks whether the Lightly Worker run has ended.
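The two methods combine into a simple polling loop. A sketch with mocked ComputeWorkerRunInfo snapshots (a real loop would call client.get_compute_worker_run_info against the API and sleep between polls, as compute_worker_run_info_generator does internally):

```python
from unittest.mock import MagicMock

# Two fake ComputeWorkerRunInfo snapshots: in progress, then finished.
running = MagicMock()
running.in_end_state.return_value = False
done = MagicMock()
done.in_end_state.return_value = True
done.ended_successfully.return_value = True

client = MagicMock()  # stand-in for an authenticated ApiWorkflowClient
client.get_compute_worker_run_info.side_effect = [running, done]

# Poll until the run reaches an end state, then inspect the outcome.
# A real loop would also time.sleep() between iterations.
while True:
    run_info = client.get_compute_worker_run_info("SCHEDULED_RUN_ID")
    if run_info.in_end_state():
        break
print(run_info.ended_successfully())  # True
```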
- class lightly.api.api_workflow_compute_worker.InvalidConfigurationError