lightly.api
The lightly.api module provides access to the Lightly web-app.
.api_workflow_client
- class lightly.api.api_workflow_client.ApiWorkflowClient(token: Optional[str] = None, dataset_id: Optional[str] = None, embedding_id: Optional[str] = None, creator: str = 'USER_PIP')
Provides a uniform interface to communicate with the Lightly API.
The ApiWorkflowClient is used to communicate with the Lightly API. The client can also run more complex workflows which include multiple API calls at once.
The client can be used in combination with the active learning agent.
- Parameters
token – The token of the user. For further information on how to get a token, see: https://docs.lightly.ai/docs/install-lightly#api-token
dataset_id – The id of the dataset. If it is not set but used by a workflow, the last modified dataset is taken by default.
embedding_id – The id of the embedding to use. If it is not set but used by a workflow, the newest embedding is taken by default.
creator – Creator passed to API requests.
- append_embeddings(path_to_embeddings_csv: str, embedding_id: str)
Concatenates the embeddings from the server to the local ones.
Loads the embedding csv file belonging to the embedding_id, and appends all of its rows to the local embeddings file located at ‘path_to_embeddings_csv’.
- Parameters
path_to_embeddings_csv – The path to the csv containing the local embeddings.
embedding_id – Id of the embedding summary of the embeddings on the server.
- Raises
RuntimeError – If the number of columns in the local and the remote embeddings file mismatch.
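Examples
A minimal usage sketch, assuming a configured client; the token, dataset id, csv path, and the choice of embedding are placeholders:

```python
from lightly.api import ApiWorkflowClient

client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# Pick an embedding summary on the server, e.g. one returned by get_all_embedding_data().
embedding_id = client.get_all_embedding_data()[-1].id

# Appends the remote rows to the local csv; raises RuntimeError if the
# column counts of the local and remote files differ.
client.append_embeddings(
    path_to_embeddings_csv="lightly_outputs/embeddings.csv",
    embedding_id=embedding_id,
)
```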
- compute_worker_run_info_generator(scheduled_run_id: str) Iterator[lightly.api.api_workflow_compute_worker.ComputeWorkerRunInfo]
Yields information about a compute worker run.
Polls the compute worker status every 30s. If the status changed, it will yield a new ComputeWorkerRunInfo. If the compute worker run finished, the generator stops.
- Parameters
scheduled_run_id – The id with which the run was scheduled.
- Returns
Generator of information about the compute worker run status.
Examples
>>> # Schedule a compute worker run and monitor its state
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id):
>>>     print(f"Compute worker run is now in state='{run_info.state}' with message='{run_info.message}'")
- create_compute_worker_config(worker_config: Optional[Dict[str, Any]] = None, lightly_config: Optional[Dict[str, Any]] = None, selection_config: Optional[Union[Dict[str, Any], lightly.openapi_generated.swagger_client.models.selection_config.SelectionConfig]] = None) str
Creates a new configuration for a compute worker run.
See our docs for more information regarding the different configurations: https://docs.lightly.ai/docs/all-configuration-options
- Parameters
worker_config – Compute worker configuration.
lightly_config – Lightly configuration.
selection_config – Selection configuration.
- Returns
The id of the created config.
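Examples
A hedged sketch; the configuration keys shown (shutdown_when_job_finished, loader.batch_size, the selection strategy entries) are examples of commonly used options, not an exhaustive or guaranteed set — see the linked configuration docs for the authoritative list:

```python
from lightly.api import ApiWorkflowClient

client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

config_id = client.create_compute_worker_config(
    worker_config={"shutdown_when_job_finished": True},
    lightly_config={"loader": {"batch_size": 128}},
    selection_config={
        "n_samples": 100,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}},
        ],
    },
)
# The returned id can be passed on when scheduling a compute worker run.
```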
- create_custom_metadata_config(name: str, configs: List[lightly.openapi_generated.swagger_client.models.configuration_entry.ConfigurationEntry])
Creates custom metadata config from a list of configurations.
- Parameters
name – The name of the custom metadata configuration.
configs – List of configuration entries, each specifying a custom metadata field.
- Returns
The API response.
Examples
>>> from lightly.openapi_generated.swagger_client.models.configuration_entry import ConfigurationEntry
>>> entry = ConfigurationEntry(
>>>     name='Weather',
>>>     path='weather',
>>>     default_value='unknown',
>>>     value_data_type='CATEGORICAL_STRING',
>>> )
>>>
>>> client.create_custom_metadata_config(
>>>     'My Custom Metadata',
>>>     [entry],
>>> )
- create_dataset(dataset_name: str, dataset_type: str = 'Images')
Creates a dataset on the Lightly Platform.
The dataset_id of the created dataset is stored in the client.dataset_id attribute and all further requests with the client will use the created dataset by default.
- Parameters
dataset_name – The name of the dataset to be created.
dataset_type – The type of the dataset. We recommend using the API-provided constants DatasetType.IMAGES and DatasetType.VIDEOS.
- Raises
ValueError – If a dataset with dataset_name already exists.
Examples
>>> from lightly.api import ApiWorkflowClient
>>> from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
>>>
>>> client = ApiWorkflowClient(token="YOUR_TOKEN")
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.IMAGES)
>>>
>>> # or to work with videos
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.VIDEOS)
>>>
>>> # retrieving dataset_id of the created dataset
>>> dataset_id = client.dataset_id
>>>
>>> # future client requests use the created dataset by default
>>> client.dataset_type
'Videos'
- create_new_dataset_with_unique_name(dataset_basename: str, dataset_type: str = 'Images')
Creates a new dataset on the Lightly Platform.
If a dataset with the specified name already exists, a counter is appended to the name so that the dataset can still be created.
- Parameters
dataset_basename – The name of the dataset to be created.
dataset_type – The type of the dataset. We recommend using the API-provided constants DatasetType.IMAGES and DatasetType.VIDEOS.
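Examples
A short sketch; 'my-dataset' is a placeholder name, and the exact counter format appended to a clashing name is an implementation detail:

```python
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType

client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Never raises because of a name clash; a counter is appended instead.
client.create_new_dataset_with_unique_name("my-dataset", dataset_type=DatasetType.IMAGES)

# The new dataset is used by all further client requests.
print(client.dataset_id)
```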
- create_or_update_prediction(sample_id: str, prediction_singletons: Sequence[lightly.api.prediction_singletons.PredictionSingletonRepr], prediction_version_id: int = - 1) None
Creates or updates the predictions for one specific sample.
- Parameters
sample_id – The id of the sample.
prediction_singletons – The predictions to upload for that sample.
prediction_version_id – An id to distinguish different predictions for the same sample.
- create_or_update_prediction_task_schema(schema: lightly.openapi_generated.swagger_client.models.prediction_task_schema.PredictionTaskSchema, prediction_version_id: int = - 1) None
Creates or updates the prediction task schema.
- Parameters
schema – The prediction task schema.
prediction_version_id – A numerical id (e.g. timestamp) to distinguish different predictions of different model versions. Use the same id if you don’t require versioning or if you wish to overwrite the previous schema.
Example
>>> import time
>>> from lightly.api import ApiWorkflowClient
>>> from lightly.openapi_generated.swagger_client import (
>>>     PredictionTaskSchema,
>>>     TaskType,
>>>     PredictionTaskSchemaCategory,
>>> )
>>>
>>> client = ApiWorkflowClient(
>>>     token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID"
>>> )
>>>
>>> schema = PredictionTaskSchema(
>>>     name="my-object-detection",
>>>     type=TaskType.OBJECT_DETECTION,
>>>     categories=[
>>>         PredictionTaskSchemaCategory(id=0, name="dog"),
>>>         PredictionTaskSchemaCategory(id=1, name="cat"),
>>>     ],
>>> )
>>> client.create_or_update_prediction_task_schema(schema=schema)
- create_or_update_predictions(sample_id_to_prediction_singletons: Mapping[str, Sequence[lightly.api.prediction_singletons.PredictionSingletonRepr]], prediction_version_id: int = - 1, progress_bar: Optional[tqdm.std.tqdm] = None, max_workers: int = 8) None
Creates or updates the predictions for specific samples.
- Parameters
sample_id_to_prediction_singletons – A mapping from the sample_id of the sample to its corresponding prediction singletons. The singletons can be from different tasks and different types.
prediction_version_id – A numerical id (e.g. timestamp) to distinguish different predictions of different model versions. Use the same id if you don’t require versioning or if you wish to overwrite the previous schema. This id must match the id of a prediction task schema.
progress_bar – Tqdm progress bar to show how many prediction files have already been uploaded.
max_workers – Maximum number of workers uploading predictions in parallel.
Example
>>> import time
>>> from tqdm import tqdm
>>> from lightly.api import ApiWorkflowClient
>>> from lightly.openapi_generated.swagger_client import (
>>>     PredictionTaskSchema,
>>>     SamplePartialMode,
>>>     TaskType,
>>>     PredictionTaskSchemaCategory,
>>> )
>>> from lightly.api.prediction_singletons import PredictionSingletonClassificationRepr
>>>
>>> client = ApiWorkflowClient(
>>>     token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID"
>>> )
>>>
>>> samples = client._samples_api.get_samples_partial_by_dataset_id(
>>>     dataset_id=client.dataset_id, mode=SamplePartialMode.FILENAMES
>>> )
>>> sample_id_to_prediction_singletons_dummy = {
>>>     sample.id: [
>>>         PredictionSingletonClassificationRepr(
>>>             taskName="my-task",
>>>             categoryId=i % 4,
>>>             score=0.9,
>>>             probabilities=[0.1, 0.2, 0.3, 0.4],
>>>         )
>>>     ]
>>>     for i, sample in enumerate(samples)
>>> }
>>> client.create_or_update_predictions(
>>>     sample_id_to_prediction_singletons=sample_id_to_prediction_singletons_dummy,
>>>     progress_bar=tqdm(desc="Uploading predictions", total=len(samples), unit=" predictions"),
>>> )
- create_tag_from_filenames(fnames_new_tag: List[str], new_tag_name: str, parent_tag_id: Optional[str] = None) lightly.openapi_generated.swagger_client.models.tag_data.TagData
Creates a new tag from a list of filenames.
- Parameters
fnames_new_tag – A list of filenames to be included in the new tag.
new_tag_name – The name of the new tag.
parent_tag_id – The tag defining where to sample from, default: None resolves to the initial-tag.
- Returns
The newly created tag.
- Raises
RuntimeError –
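Examples
A sketch assuming a configured client and that the placeholder filenames exist in the parent tag (the initial-tag by default):

```python
tag = client.create_tag_from_filenames(
    fnames_new_tag=["image_0.png", "image_1.png"],
    new_tag_name="my-subset",
)
print(tag.name, tag.id)
```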
- dataset_exists(dataset_id: str) bool
Returns True if a dataset with dataset_id exists.
- property dataset_id: str
The current dataset_id.
If the dataset_id is set, it is returned. If it is not set, then the dataset_id of the last modified dataset is selected.
- dataset_name_exists(dataset_name: str, shared: Optional[bool] = False) bool
Returns True if a dataset with dataset_name exists and False otherwise.
- Parameters
dataset_name – Name of the dataset.
shared – If False, considers only datasets owned by the user. If True, considers only datasets which have been shared with the user. If None, considers all datasets the user has access to.
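Examples
A sketch with a placeholder dataset name:

```python
# Only datasets owned by the user are considered by default.
if not client.dataset_name_exists("my-dataset"):
    client.create_dataset("my-dataset")

# Pass shared=None to also consider datasets shared with the user.
exists_anywhere = client.dataset_name_exists("my-dataset", shared=None)
```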
- property dataset_type: str
Returns the dataset type of the current dataset.
- delete_compute_worker(worker_id: str)
Removes a compute worker.
- Parameters
worker_id – The id of the worker to remove.
- delete_dataset_by_id(dataset_id: str)
Deletes a dataset on the Lightly Platform.
- Parameters
dataset_id – The id of the dataset to be deleted.
- delete_tag_by_id(tag_id: str) None
Deletes a tag from the current dataset on the Lightly Platform.
- Parameters
tag_id – The id of the tag to be deleted.
- delete_tag_by_name(tag_name: str) None
Deletes a tag from the current dataset on the Lightly Platform.
- Parameters
tag_name – The name of the tag to be deleted.
- download_compute_worker_run_artifacts(run: lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData, output_dir: str, timeout: int = 60) None
Downloads all artifacts from a run.
- Parameters
run – Run from which to download artifacts.
output_dir – Output directory where artifacts will be saved.
timeout – Timeout in seconds after which an artifact download is interrupted.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download artifacts
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_artifacts(run=run, output_dir="my_run/artifacts")
- download_compute_worker_run_checkpoint(run: lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData, output_path: str, timeout: int = 60) None
Downloads the last training checkpoint from a run.
See our docs for more information regarding checkpoints: https://docs.lightly.ai/docs/train-a-self-supervised-model#checkpoints
- Parameters
run – Run from which to download the checkpoint.
output_path – Path where checkpoint will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no checkpoint artifact or the checkpoint has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download checkpoint
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_checkpoint(run=run, output_path="my_checkpoint.ckpt")
- download_compute_worker_run_corruptness_check_information(run: lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData, output_path: str, timeout: int = 60) None
Download the corruptness check information file from a run.
- Parameters
run – Run from which to download the file.
output_path – Path where the file will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no corruptness check information artifact or the file has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download corruptness check information file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_corruptness_check_information(run=run, output_path="corruptness_check_information.json")
>>>
>>> # print all corrupt samples and corruptions
>>> with open("corruptness_check_information.json", 'r') as f:
>>>     corruptness_check_information = json.load(f)
>>> for sample_name, error in corruptness_check_information["corrupt_samples"].items():
>>>     print(f"Sample '{sample_name}' is corrupt because of the error '{error}'.")
- download_compute_worker_run_log(run: lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData, output_path: str, timeout: int = 60) None
Download the log file from a run.
- Parameters
run – Run from which to download the log file.
output_path – Path where log file will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no log artifact or the log file has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download log file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_log(run=run, output_path="log.txt")
- download_compute_worker_run_memory_log(run: lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData, output_path: str, timeout: int = 60) None
Download the memory consumption log file from a run.
- Parameters
run – Run from which to download the memory log file.
output_path – Path where memory log file will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no memory log artifact or the memory log file has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download memory log file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_memory_log(run=run, output_path="memlog.txt")
- download_compute_worker_run_report_json(run: lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData, output_path: str, timeout: int = 60) None
Download the report in json format from a run.
- Parameters
run – Run from which to download the report.
output_path – Path where report will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no report artifact or the report has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download report
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_report_json(run=run, output_path="report.json")
- download_compute_worker_run_report_pdf(run: lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData, output_path: str, timeout: int = 60) None
Download the report in pdf format from a run.
- Parameters
run – Run from which to download the report.
output_path – Path where report will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no report artifact or the report has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download report
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_report_pdf(run=run, output_path="report.pdf")
- download_compute_worker_run_sequence_information(run: lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData, output_path: str, timeout: int = 60) None
Download the sequence information from a run.
- Parameters
run – Run from which to download the file.
output_path – Path where the file will be saved.
timeout – Timeout in seconds after which download is interrupted.
- Raises
ArtifactNotExist – If the run has no sequence information artifact or the file has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # download sequence information file
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> client.download_compute_worker_run_sequence_information(run=run, output_path="sequence_information.json")
- download_dataset(output_dir: str, tag_name: str = 'initial-tag', max_workers: int = 8, verbose: bool = True)
Downloads images from the web-app and stores them in output_dir.
- Parameters
output_dir – Where to store the downloaded images.
tag_name – Name of the tag which should be downloaded.
max_workers – Maximum number of workers downloading images in parallel.
verbose – Whether or not to show the progress bar.
- Raises
ValueError – If the specified tag does not exist on the dataset.
RuntimeError – If the connection to the server failed.
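Examples
A sketch with a placeholder output directory:

```python
# Download all images of the initial tag into a local directory.
client.download_dataset(
    output_dir="downloaded_images/",
    tag_name="initial-tag",
)
```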
- download_embeddings_csv(output_path: str) None
Downloads the latest embeddings from the dataset and saves them to the output path.
- Raises
RuntimeError – If no embeddings could be found for the dataset.
- download_embeddings_csv_by_id(embedding_id: str, output_path: str) None
Downloads embeddings with the given embedding id from the dataset and saves them to the output path.
- download_new_raw_samples(use_redirected_read_url: Optional[bool] = False) List[Tuple[str, str]]
Downloads filenames and read urls of unprocessed samples from the datasource.
All samples after the timestamp of ApiWorkflowClient.get_processed_until_timestamp() are fetched. After downloading the samples the timestamp is updated to the current time. This function can be repeatedly called to retrieve new samples from the datasource.
- Parameters
use_redirected_read_url – By default, this is False unless S3DelegatedAccess is configured, in which case it is always True and this parameter has no effect. When True, RedirectedReadUrls are returned instead of ReadUrls, meaning the returned URLs allow unlimited access to the file.
- Returns
A list of (filename, url) tuples, where each tuple represents a sample.
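Examples
A sketch of the polling pattern this method supports:

```python
# The first call fetches everything after the stored timestamp; repeated
# calls only return samples added to the datasource since the previous call.
for filename, read_url in client.download_new_raw_samples():
    print(filename, read_url)
```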
- download_raw_metadata(from_: int = 0, to: Optional[int] = None, relevant_filenames_file_name: Optional[str] = None, use_redirected_read_url: Optional[bool] = False, progress_bar: Optional[tqdm.std.tqdm] = None) List[Tuple[str, str]]
Downloads all metadata filenames and read urls from the datasource between from_ and to.
Samples which have timestamp == from_ or timestamp == to will also be included.
- Parameters
from_ – Unix timestamp from which on samples are downloaded.
to – Unix timestamp up to and including which samples are downloaded.
relevant_filenames_file_name – The path to the relevant filenames text file in the cloud bucket. The path is relative to the datasource root.
use_redirected_read_url – By default, this is False unless S3DelegatedAccess is configured, in which case it is always True and this parameter has no effect. When True, RedirectedReadUrls are returned instead of ReadUrls, meaning the returned URLs allow unlimited access to the file.
progress_bar – Tqdm progress bar to show how many metadata files have already been retrieved.
- Returns
A list of (filename, url) tuples, where each tuple represents a sample.
- download_raw_predictions(task_name: str, from_: int = 0, to: Optional[int] = None, relevant_filenames_file_name: Optional[str] = None, use_redirected_read_url: Optional[bool] = False, progress_bar: Optional[tqdm.std.tqdm] = None) List[Tuple[str, str]]
Downloads all prediction filenames and read urls from the datasource between from_ and to.
Samples which have timestamp == from_ or timestamp == to will also be included.
- Parameters
task_name – Name of the prediction task.
from_ – Unix timestamp from which on samples are downloaded.
to – Unix timestamp up to and including which samples are downloaded.
relevant_filenames_file_name – The path to the relevant filenames text file in the cloud bucket. The path is relative to the datasource root.
use_redirected_read_url – By default, this is False unless S3DelegatedAccess is configured, in which case it is always True and this parameter has no effect. When True, RedirectedReadUrls are returned instead of ReadUrls, meaning the returned URLs allow unlimited access to the file.
progress_bar – Tqdm progress bar to show how many prediction files have already been retrieved.
- Returns
A list of (filename, url) tuples, where each tuple represents a sample.
- download_raw_samples(from_: int = 0, to: Optional[int] = None, relevant_filenames_file_name: Optional[str] = None, use_redirected_read_url: Optional[bool] = False, progress_bar: Optional[tqdm.std.tqdm] = None) List[Tuple[str, str]]
Downloads all filenames and read urls from the datasource between from_ and to.
Samples which have timestamp == from_ or timestamp == to will also be included.
- Parameters
from_ – Unix timestamp from which on samples are downloaded.
to – Unix timestamp up to and including which samples are downloaded.
relevant_filenames_file_name – The path to the relevant filenames text file in the cloud bucket. The path is relative to the datasource root.
use_redirected_read_url – By default, this is False unless S3DelegatedAccess is configured, in which case it is always True and this parameter has no effect. When True, RedirectedReadUrls are returned instead of ReadUrls, meaning the returned URLs allow unlimited access to the file.
progress_bar – Tqdm progress bar to show how many samples have already been retrieved.
- Returns
A list of (filename, url) tuples, where each tuple represents a sample.
- export_filenames_and_read_urls_by_tag_id(tag_id: str) List[Dict]
Exports a mapping from each sample's filename to its readURL.
- Parameters
tag_id – Id of the tag which should be exported.
- Returns
A list of mappings of the samples filenames and readURLs within a certain tag.
- export_filenames_and_read_urls_by_tag_name(tag_name: str) List[Dict]
Exports a mapping from each sample's filename to its readURL.
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
A list of mappings of the samples filenames and readURLs within a certain tag.
Examples
>>> # write json file which can be used to access the actual file contents.
>>> mappings = client.export_filenames_and_read_urls_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-readURL-mappings.json', 'w') as f:
>>>     json.dump(mappings, f)
- export_filenames_by_tag_id(tag_id: str) str
Exports a list of the samples filenames within a certain tag.
- Parameters
tag_id – Id of the tag which should be exported.
- Returns
A list of the samples filenames within a certain tag.
- export_filenames_by_tag_name(tag_name: str) str
Exports a list of the samples filenames within a certain tag.
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
A list of the samples filenames within a certain tag.
Examples
>>> # write text file with the filenames of the initial tag
>>> filenames = client.export_filenames_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('filenames-of-initial-tag.txt', 'w') as f:
>>>     f.write(filenames)
- export_label_box_data_rows_by_tag_id(tag_id: str) List[Dict]
Exports samples in a format compatible with Labelbox v3.
The format is documented here: https://docs.labelbox.com/docs/images-json
- Parameters
tag_id – Id of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Labelbox v3.
- export_label_box_data_rows_by_tag_name(tag_name: str) List[Dict]
Exports samples in a format compatible with Labelbox v3.
The format is documented here: https://docs.labelbox.com/docs/images-json
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Labelbox v3.
Examples
>>> # write json file which can be imported in Labelbox
>>> tasks = client.export_label_box_data_rows_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-labelbox-rows.json', 'w') as f:
>>>     json.dump(tasks, f)
- export_label_box_v4_data_rows_by_tag_id(tag_id: str) List[Dict]
Exports samples in a format compatible with Labelbox v4.
The format is documented here: https://docs.labelbox.com/docs/images-json
- Parameters
tag_id – Id of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Labelbox v4.
- export_label_box_v4_data_rows_by_tag_name(tag_name: str) List[Dict]
Exports samples in a format compatible with Labelbox v4.
The format is documented here: https://docs.labelbox.com/docs/images-json
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Labelbox.
Examples
>>> # write json file which can be imported in Labelbox
>>> tasks = client.export_label_box_v4_data_rows_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-labelbox-rows.json', 'w') as f:
>>>     json.dump(tasks, f)
- export_label_studio_tasks_by_tag_id(tag_id: str) List[Dict]
Exports samples in a format compatible with Label Studio.
The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format
- Parameters
tag_id – Id of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Label Studio.
- export_label_studio_tasks_by_tag_name(tag_name: str) List[Dict]
Exports samples in a format compatible with Label Studio.
The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format
- Parameters
tag_name – Name of the tag which should be exported.
- Returns
A list of dictionaries in a format compatible with Label Studio.
Examples
>>> # write json file which can be imported in Label Studio
>>> tasks = client.export_label_studio_tasks_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-label-studio-tasks.json', 'w') as f:
>>>     json.dump(tasks, f)
- get_all_datasets() List[lightly.openapi_generated.swagger_client.models.dataset_data.DatasetData]
Returns all datasets the user has access to.
DEPRECATED in favour of get_datasets(shared=None) and will be removed in the future.
- get_all_embedding_data() List[lightly.openapi_generated.swagger_client.models.dataset_embedding_data.DatasetEmbeddingData]
Returns embedding data of all embeddings for this dataset.
- get_all_tags() List[lightly.openapi_generated.swagger_client.models.tag_data.TagData]
Gets all tags in the Lightly Platform for the current dataset id.
- Returns
A list of TagData entries for each tag on the server.
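Examples
A sketch that looks up a tag by name from the returned list:

```python
tags = client.get_all_tags()
for tag in tags:
    print(tag.name, tag.id)

# e.g. find the initial tag
initial_tag = next(tag for tag in tags if tag.name == "initial-tag")
```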
- get_compute_worker_ids() List[str]
Returns the ids of all registered compute workers.
- get_compute_worker_run(run_id: str) lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData
Returns a run given its id.
- Raises
ApiException – If no run with the given id exists.
- get_compute_worker_run_checkpoint_url(run: lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData) str
Gets the download url of the last training checkpoint from a run.
See our docs for more information regarding checkpoints: https://docs.lightly.ai/docs/train-a-self-supervised-model#checkpoints
- Parameters
run – Run from which to download the checkpoint.
- Returns
The url from which the checkpoint can be downloaded.
- Raises
ArtifactNotExist – If the run has no checkpoint artifact or the checkpoint has not yet been uploaded.
Examples
>>> # schedule run
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>>
>>> # wait until run completed
>>> for run_info in client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id):
>>>     pass
>>>
>>> # get checkpoint read_url
>>> run = client.get_compute_worker_run_from_scheduled_run(scheduled_run_id=scheduled_run_id)
>>> checkpoint_read_url = client.get_compute_worker_run_checkpoint_url(run=run)
- get_compute_worker_run_from_scheduled_run(scheduled_run_id: str) lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData
Returns a run given its scheduled run id.
- Raises
ApiException – If no run with the given scheduled run id exists or if the scheduled run has not yet started being processed by a worker.
- get_compute_worker_run_info(scheduled_run_id: str) lightly.api.api_workflow_compute_worker.ComputeWorkerRunInfo
Returns information about the compute worker run.
- Parameters
scheduled_run_id – The id with which the run was scheduled.
- Returns
Data about the compute worker run.
Examples
>>> # Schedule a compute worker run and get its state
>>> scheduled_run_id = client.schedule_compute_worker_run(...)
>>> run_info = client.get_compute_worker_run_info(scheduled_run_id)
>>> print(run_info)
- get_compute_worker_run_tags(run_id: str) List[lightly.openapi_generated.swagger_client.models.tag_data.TagData]
Returns all tags from a run for the current dataset.
Only returns tags for runs made with Lightly Worker version >=2.4.2.
- Parameters
run_id – Run id from which to return tags.
- Returns
List of tags created by the run. The tags are ordered by creation date from newest to oldest.
Examples
>>> # Get filenames from last run.
>>>
>>> from lightly.api import ApiWorkflowClient
>>> client = ApiWorkflowClient(
>>>     token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID"
>>> )
>>> tags = client.get_compute_worker_run_tags(run_id="MY_LAST_RUN_ID")
>>> filenames = client.export_filenames_by_tag_name(tag_name=tags[0].name)
- get_compute_worker_runs(dataset_id: Optional[str] = None) List[lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData]
Get all compute worker runs for the user.
- Parameters
dataset_id – If set, then only runs for the given dataset are returned.
- Returns
Runs sorted by creation time from old to new.
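Examples
A sketch; the state attribute on DockerRunData is assumed here for illustration:

```python
runs = client.get_compute_worker_runs(dataset_id=client.dataset_id)
if runs:
    latest_run = runs[-1]  # runs are sorted from old to new
    print(latest_run.id, latest_run.state)
```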
- get_custom_embedding_read_url(filename: str) str
Returns a read-url for .lightly/embeddings/{filename}.
- Parameters
filename – Filename for which to get the read-url.
- Returns
The read-url. A read-url is returned even if the file does not exist.
- get_dataset_by_id(dataset_id: str) lightly.openapi_generated.swagger_client.models.dataset_data.DatasetData
Returns the dataset for the given dataset id.
- get_datasets(shared: Optional[bool] = False) List[lightly.openapi_generated.swagger_client.models.dataset_data.DatasetData]
Returns all datasets the user owns.
- Parameters
shared – If False, returns only datasets owned by the user. If True, returns only the datasets which have been shared with the user. If None, returns all datasets the user has access to (owned and shared).
- get_datasets_by_name(dataset_name: str, shared: Optional[bool] = False) List[lightly.openapi_generated.swagger_client.models.dataset_data.DatasetData]
Returns datasets by name.
An empty list is returned if no datasets with the name exist.
- Parameters
dataset_name – Name of the dataset.
shared – If False, returns only datasets owned by the user. In this case at most one dataset will be returned. If True, returns only datasets which have been shared with the user. Can return multiple datasets. If None, returns all datasets the user has access to (owned and shared). Can return multiple datasets.
- get_datasource() lightly.openapi_generated.swagger_client.models.datasource_config.DatasourceConfig
Calls the api to return the datasource of the current dataset.
- Returns
Datasource data of the datasource of the current dataset.
- Raises
ApiException – If no datasource was configured.
- get_embedding_by_name(name: str, ignore_suffix: bool = True) lightly.openapi_generated.swagger_client.models.dataset_embedding_data.DatasetEmbeddingData
Gets an embedding from the server by name.
- Parameters
name – The name of the embedding to get.
ignore_suffix – If true, a suffix of the embedding name on the server is ignored.
- Returns
The embedding data.
- Raises
EmbeddingDoesNotExistError – If the name does not match the name of an embedding on the server.
- get_embedding_data_by_name(name: str) lightly.openapi_generated.swagger_client.models.dataset_embedding_data.DatasetEmbeddingData
Returns embedding data with the given name for this dataset.
- Raises
ValueError – If no embedding with this name exists.
- get_filenames() List[str]
Downloads the list of filenames from the server.
This is an expensive operation, especially for large datasets.
- get_filenames_in_tag(tag_data: lightly.openapi_generated.swagger_client.models.tag_data.TagData, filenames_on_server: Optional[List[str]] = None, exclude_parent_tag: bool = False) List[str]
Gets the filenames of a tag.
- Parameters
tag_data – The data of the tag.
filenames_on_server – List of all filenames on the server. If they are not given, they need to be downloaded, which is quite expensive.
exclude_parent_tag – Excludes the parent tag in the returned filenames.
- Returns
filenames_tag – The filenames of all samples in the tag.
- get_metadata_read_url(filename: str)
Returns a read-url for .lightly/metadata/{filename}.
- Parameters
filename – Filename for which to get the read-url.
- Returns
The read-url. A read-url is returned even if the file does not exist.
- get_prediction_read_url(filename: str)
Returns a read-url for .lightly/predictions/{filename}.
- Parameters
filename – Filename for which to get the read-url.
- Returns
The read-url. A read-url is returned even if the file does not exist.
- get_processed_until_timestamp() int
Returns the timestamp until which samples have been processed.
- Returns
Unix timestamp of last processed sample
- get_scheduled_compute_worker_runs(state: Optional[str] = None) List[lightly.openapi_generated.swagger_client.models.docker_run_scheduled_data.DockerRunScheduledData]
Returns a list of all scheduled compute worker runs for the current dataset.
- Parameters
state – DockerRunScheduledState value. If specified, then only runs in the given state are returned. If omitted, then runs which have not yet finished (neither ‘DONE’ nor ‘CANCELED’) are returned. Valid states are ‘OPEN’, ‘LOCKED’, ‘DONE’, and ‘CANCELED’.
- get_shared_users(dataset_id: str) List[str]
Gets the list of users that have access to the dataset.
- Parameters
dataset_id – Identifier of dataset
- Returns
List of email addresses of users that have write access to the dataset
Examples
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.get_shared_users(dataset_id="MY_DATASET_ID")
>>> ["user@something.com"]
- get_tag_by_id(tag_id: str) lightly.openapi_generated.swagger_client.models.tag_data.TagData
Gets a tag from the current dataset by tag id.
- Parameters
tag_id – The id of the requested tag.
- Returns
Tag data for the requested tag.
- get_tag_by_name(tag_name: str) lightly.openapi_generated.swagger_client.models.tag_data.TagData
Gets a tag from the current dataset by tag name.
- Parameters
tag_name – The name of the requested tag.
- Returns
Tag data for the requested tag.
- index_custom_metadata_by_filename(custom_metadata: Dict) Dict[str, Optional[Dict]]
Creates an index to lookup custom metadata by filename.
- Parameters
custom_metadata – Dictionary of custom metadata, see upload_custom_metadata for the required format.
- Returns
A dictionary mapping from filenames to custom metadata. If there are no annotations for a filename, the custom metadata is None instead.
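A minimal sketch of the lookup this produces, assuming the COCO-style format described under upload_custom_metadata. The helper below is illustrative only, not the library implementation:

```python
from typing import Dict, Optional


def index_by_filename(custom_metadata: Dict) -> Dict[str, Optional[Dict]]:
    """Illustrative re-implementation: maps file_name -> metadata entry."""
    # Map image id -> file name from the "images" list.
    id_to_filename = {img["id"]: img["file_name"] for img in custom_metadata["images"]}
    # Start with None for every image so files without metadata are included.
    index: Dict[str, Optional[Dict]] = {name: None for name in id_to_filename.values()}
    for entry in custom_metadata["metadata"]:
        index[id_to_filename[entry["image_id"]]] = entry
    return index


custom_metadata = {
    "images": [
        {"file_name": "image0.jpg", "id": 0},
        {"file_name": "image1.jpg", "id": 1},
    ],
    "metadata": [
        {"image_id": 0, "number_of_people": 3},
    ],
}
index = index_by_filename(custom_metadata)
# image0.jpg maps to its metadata entry; image1.jpg has none, so it maps to None
```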
- list_datasource_permissions() Dict[str, Optional[Union[bool, lightly.openapi_generated.swagger_client.models.datasource_config_verify_data_errors.DatasourceConfigVerifyDataErrors]]]
List granted access permissions for the datasource set up with a dataset.
Returns a dictionary with the permission names as keys and booleans as values, see the example below. Additionally, there is an 'errors' key: if there are permission errors, it maps from permission name to the error message; otherwise its value is None.
Examples
>>> from lightly.api import ApiWorkflowClient
>>> client = ApiWorkflowClient(
...     token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID"
... )
>>> client.list_datasource_permissions()
{'can_list': True, 'can_overwrite': True, 'can_read': True, 'can_write': True, 'errors': None}
- register_compute_worker(name: str = 'Default', labels: Optional[List[str]] = None) str
Registers a new compute worker.
If a worker with the same name already exists, the worker id of the existing worker is returned instead of registering a new worker.
- Parameters
name – The name of the Lightly Worker.
labels – The labels of the Lightly Worker. See our docs for more information regarding the labels parameter: https://docs.lightly.ai/docs/assign-scheduled-runs-to-specific-workers
- Returns
The id of the newly registered compute worker.
- schedule_compute_worker_run(worker_config: Optional[Dict[str, Any]] = None, lightly_config: Optional[Dict[str, Any]] = None, selection_config: Optional[Union[Dict[str, Any], lightly.openapi_generated.swagger_client.models.selection_config.SelectionConfig]] = None, priority: str = 'MID', runs_on: Optional[List[str]] = None) str
Schedules a run with the given configurations.
See our docs for more information regarding the different configurations: https://docs.lightly.ai/docs/all-configuration-options
- Parameters
worker_config – Compute worker configuration.
lightly_config – Lightly configuration.
selection_config – Selection configuration.
runs_on – The required labels the Lightly Worker must have to take the job. See our docs for more information regarding the runs_on parameter: https://docs.lightly.ai/docs/assign-scheduled-runs-to-specific-workers
- Returns
The id of the scheduled run.
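A hedged sketch of scheduling a run with a minimal selection configuration. The exact keys valid for your worker version are listed in the configuration docs linked above, and the "gpu-worker" label is a hypothetical example:

```python
# Minimal, illustrative configuration dictionaries; consult the
# configuration docs for the options valid for your worker version.
selection_config = {
    "n_samples": 50,
    "strategies": [
        {
            "input": {"type": "EMBEDDINGS"},
            "strategy": {"type": "DIVERSITY"},
        }
    ],
}

# scheduled_run_id = client.schedule_compute_worker_run(
#     selection_config=selection_config,
#     runs_on=["gpu-worker"],  # only workers registered with this label take the job
# )
```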
- selection(selection_config: lightly.active_learning.config.selection_config.SelectionConfig, preselected_tag_id: Optional[str] = None, query_tag_id: Optional[str] = None) lightly.openapi_generated.swagger_client.models.tag_data.TagData
Performs a selection given the arguments.
- Parameters
selection_config – The configuration of the selection.
preselected_tag_id – The tag defining the already chosen samples (e.g. already labelled ones), default: None.
query_tag_id – The tag defining where to sample from, default: None resolves to the initial-tag.
- Returns
The newly created tag of the selection.
- Raises
ApiException –
ValueError –
RuntimeError –
- set_azure_config(container_name: str, account_name: str, sas_token: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None
Sets the Azure configuration for the datasource of the current dataset.
See our docs for a detailed explanation on how to set up Lightly with Azure: https://docs.lightly.ai/docs/azure
- Parameters
container_name – Container name of the dataset, for example: “my-container/path/to/my/data”.
account_name – Azure account name.
sas_token – Secure Access Signature token.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.
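The thumbnail_suffix value is a template in which [filename] and [extension] are substituted per image. A sketch of the substitution, under the assumption that [filename] is the file name without its extension and [extension] is the extension without the dot:

```python
def expand_thumbnail_suffix(image_path: str, suffix_template: str) -> str:
    """Illustrative expansion of the thumbnail_suffix template."""
    # Split "image01.jpg" into stem "image01" and extension "jpg".
    stem, _, extension = image_path.rpartition(".")
    return suffix_template.replace("[filename]", stem).replace("[extension]", extension)


path = expand_thumbnail_suffix(
    "image01.jpg", ".lightly/thumbnails/[filename]_thumb.[extension]"
)
# → ".lightly/thumbnails/image01_thumb.jpg"
```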
- set_dataset_id_by_name(dataset_name: str, shared: Optional[bool] = False)
Sets the dataset id given the name of the dataset.
- Parameters
dataset_name – The name of the dataset for which the dataset_id should be set as attribute.
shared – If False, considers only datasets owned by the user. If True, considers only the datasets which have been shared with the user. If None, consider all datasets the user has access to (owned and shared).
- Raises
ValueError – If no dataset with the given name exists.
- set_embedding_id_to_latest()
Sets self.embedding_id to the id of the latest embedding on the server.
- set_gcs_config(resource_path: str, project_id: str, credentials: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None
Sets the Google Cloud Storage configuration for the datasource of the current dataset.
See our docs for a detailed explanation on how to set up Lightly with Google Cloud Storage: https://docs.lightly.ai/docs/google-cloud-storage
- Parameters
resource_path – GCS url of your dataset, for example: “gs://my_bucket/path/to/my/data”.
project_id – GCS project id.
credentials – Content of the credentials JSON file stringified which you download from Google Cloud Platform.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.
- set_local_config(resource_path: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]') None
Sets the local configuration for the datasource of the current dataset.
Find a detailed explanation on how to set up Lightly with a local file server in our docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_local_server.html
- Parameters
resource_path – Url to your local file server, for example: “http://localhost:1234/path/to/my/data”.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
- set_obs_config(resource_path: str, obs_endpoint: str, obs_access_key_id: str, obs_secret_access_key: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None
Sets the Telekom OBS configuration for the datasource of the current dataset.
- Parameters
resource_path – OBS url of your dataset. For example, “obs://my_bucket/path/to/my/data”.
obs_endpoint – OBS endpoint.
obs_access_key_id – OBS access key id.
obs_secret_access_key – OBS secret access key.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.
- set_s3_config(resource_path: str, region: str, access_key: str, secret_access_key: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None
Sets the S3 configuration for the datasource of the current dataset.
See our docs for a detailed explanation on how to set up Lightly with AWS S3: https://docs.lightly.ai/docs/aws-s3
- Parameters
resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.
region – S3 region where the dataset bucket is located, for example “eu-central-1”.
access_key – S3 access key.
secret_access_key – Secret for the S3 access key.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.
- set_s3_delegated_access_config(resource_path: str, region: str, role_arn: str, external_id: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None
Sets the S3 configuration for the datasource of the current dataset.
See our docs for a detailed explanation on how to set up Lightly with AWS S3 and delegated access: https://docs.lightly.ai/docs/aws-s3#delegated-access
- Parameters
resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.
region – S3 region where the dataset bucket is located, for example “eu-central-1”.
role_arn – Unique ARN identifier of the role.
external_id – External ID of the role.
thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.
purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.
- share_dataset_only_with(dataset_id: str, user_emails: List[str]) None
Shares the dataset with a list of users.
This method overwrites the list of users that previously had access to the dataset. If you want to add someone new to the list, first get the current list of users with access via get_shared_users and include them as well.
- Parameters
dataset_id – Identifier of dataset
user_emails – List of email addresses of users to grant write permission
Examples
>>> # share a dataset with a user
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.share_dataset_only_with(dataset_id="MY_DATASET_ID", user_emails=["user@something.com"])
>>>
>>> # share the dataset with an additional user while keeping access for previous users
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> user_emails = client.get_shared_users(dataset_id="MY_DATASET_ID")
>>> user_emails.append("additional_user2@something.com")
>>> client.share_dataset_only_with(dataset_id="MY_DATASET_ID", user_emails=user_emails)
>>>
>>> # revoke access for all users
>>> client = ApiWorkflowClient(token="MY_AWESOME_TOKEN")
>>> client.share_dataset_only_with(dataset_id="MY_DATASET_ID", user_emails=[])
- update_processed_until_timestamp(timestamp: int) None
Sets the timestamp until which samples have been processed.
- Parameters
timestamp – Unix timestamp of last processed sample
- upload_custom_metadata(custom_metadata: Dict, verbose: bool = False, max_workers: int = 8)
Uploads custom metadata to the Lightly platform.
The custom metadata is expected in a format similar to the COCO annotations: Under the key “images” there should be a list of dictionaries, each with a file_name and id. Under the key “metadata” the custom metadata is stored as a list of dictionaries, each with a image_id to match it to the image.
Example
>>> custom_metadata = {
>>>     "images": [
>>>         {
>>>             "file_name": "image0.jpg",
>>>             "id": 0,
>>>         },
>>>         {
>>>             "file_name": "image1.jpg",
>>>             "id": 1,
>>>         }
>>>     ],
>>>     "metadata": [
>>>         {
>>>             "image_id": 0,
>>>             "number_of_people": 3,
>>>             "weather": {
>>>                 "scenario": "cloudy",
>>>                 "temperature": 20.3
>>>             }
>>>         },
>>>         {
>>>             "image_id": 1,
>>>             "number_of_people": 1,
>>>             "weather": {
>>>                 "scenario": "rainy",
>>>                 "temperature": 15.0
>>>             }
>>>         }
>>>     ]
>>> }
- Parameters
custom_metadata – Custom metadata as described above.
verbose – If True displays a progress bar during the upload.
max_workers – Maximum number of concurrent threads during upload.
- upload_dataset(input: Union[str, lightly.data.dataset.LightlyDataset], max_workers: int = 8, mode: str = 'thumbnails', custom_metadata: Optional[Dict] = None)
Uploads a dataset to the Lightly cloud solution.
- Parameters
input – Either the path to the dataset, e.g. “path/to/dataset”, or the dataset in the form of a LightlyDataset.
max_workers – Maximum number of workers uploading images in parallel.
mode – One of [full, thumbnails, metadata]. Whether to upload thumbnails, full images, or metadata only.
custom_metadata – COCO-style dictionary of custom metadata to be uploaded.
- Raises
ValueError – If the dataset is too large or the input has the wrong type.
RuntimeError – If the connection to the server failed.
- upload_embeddings(path_to_embeddings_csv: str, name: str)
Uploads embeddings to the server.
First checks that the specified embedding name does not already exist on the server; if it does, the upload is aborted. Then creates a new csv with the embeddings in the order specified on the server and uploads it. The received embedding_id is saved as a property of self.
- Parameters
path_to_embeddings_csv – The path to the .csv containing the embeddings, e.g. “path/to/embeddings.csv”
name – The name of the embedding. If an embedding with such a name already exists on the server, the upload is aborted.
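The reordering step described above can be sketched as follows, assuming csv rows (header excluded) whose first column is the filename. This is illustrative only; the client handles the reordering internally:

```python
from typing import List


def reorder_embedding_rows(
    rows: List[List[str]], filenames_on_server: List[str]
) -> List[List[str]]:
    """Reorders embedding rows to match the filename order on the server."""
    by_filename = {row[0]: row for row in rows}
    return [by_filename[filename] for filename in filenames_on_server]


# Local csv rows in arbitrary order (columns: filename, embedding dims, label).
rows = [
    ["b.jpg", "0.1", "0.2", "0"],
    ["a.jpg", "0.3", "0.4", "1"],
]
reordered = reorder_embedding_rows(rows, ["a.jpg", "b.jpg"])
# reordered[0] is now the row for a.jpg
```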
- upload_file_with_signed_url(file: io.IOBase, signed_write_url: str, headers: Optional[Dict] = None, session: Optional[requests.sessions.Session] = None) requests.models.Response
Uploads a file to a url via a PUT request.
- Parameters
file – The file to upload.
signed_write_url – The url to upload the file to. As no authorization is used, the url must be a signed write url.
headers – Specific headers for the request.
session – Optional requests session used to upload the file.
- Returns
The response of the PUT request, usually status code 200 on success.
- verify_custom_metadata_format(custom_metadata: Dict)
Verifies that the custom metadata is in the correct format.
- Parameters
custom_metadata – Dictionary of custom metadata, see upload_custom_metadata for the required format.
- Raises
KeyError – If “images” or “metadata” aren’t a key of custom_metadata.
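A minimal sketch of the check this performs, illustrative only; the library may validate more than the two top-level keys:

```python
from typing import Dict


def verify_format(custom_metadata: Dict) -> None:
    """Raises KeyError if a required top-level key is missing."""
    for key in ("images", "metadata"):
        if key not in custom_metadata:
            raise KeyError(f"Custom metadata has no key '{key}'.")


verify_format({"images": [], "metadata": []})  # passes silently
```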
- class lightly.api.api_workflow_compute_worker.ComputeWorkerRunInfo(state: Union[lightly.openapi_generated.swagger_client.models.docker_run_state.DockerRunState, OPEN, CANCELED_OR_NOT_EXISTING], message: str)
Contains information about a compute worker run that is useful for monitoring it.
- state
The state of the compute worker run.
- Type
Union[lightly.openapi_generated.swagger_client.models.docker_run_state.DockerRunState, OPEN, CANCELED_OR_NOT_EXISTING]
- message
The last message of the compute worker run.
- Type
str
- ended_successfully() bool
Returns whether the compute worker run ended successfully or failed. Raises a ValueError if the run is still in progress.
- in_end_state() bool
Returns whether the compute worker run has ended.
- exception lightly.api.api_workflow_compute_worker.InvalidConfigurationError
- lightly.api.api_workflow_compute_worker.selection_config_from_dict(cfg: Dict[str, Any]) lightly.openapi_generated.swagger_client.models.selection_config.SelectionConfig
Recursively converts selection config from dict to a SelectionConfig instance.
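A sketch of the kind of recursive conversion this performs, using simplified stand-in dataclasses rather than the generated SelectionConfig model (field names below mirror the config dict but are assumptions):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Simplified stand-ins for the generated models (illustrative only).
@dataclass
class StrategyEntry:
    input: Dict[str, Any]
    strategy: Dict[str, Any]


@dataclass
class SimpleSelectionConfig:
    n_samples: Optional[int] = None
    strategies: List[StrategyEntry] = field(default_factory=list)


def simple_selection_config_from_dict(cfg: Dict[str, Any]) -> SimpleSelectionConfig:
    """Converts the nested dict into typed config objects, one level per model."""
    strategies = [StrategyEntry(**entry) for entry in cfg.get("strategies", [])]
    return SimpleSelectionConfig(n_samples=cfg.get("n_samples"), strategies=strategies)


config = simple_selection_config_from_dict(
    {
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    }
)
```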