lightly.api

The lightly.api module provides access to the Lightly web-app.

.api_workflow_client

class lightly.api.api_workflow_client.ApiWorkflowClient(token: Optional[str] = None, dataset_id: Optional[str] = None, embedding_id: Optional[str] = None)

Provides a uniform interface to communicate with the API.

The ApiWorkflowClient is used to communicate with the Lightly API. Besides single API calls, the client can also run more complex workflows which combine multiple API calls.

The client can be used in combination with the active learning agent.

Parameters
  • token – the token of the user, provided in the webapp.

  • dataset_id – the id of the dataset, provided in the webapp. If it is not set but needed by a workflow, the last modified dataset is used by default.

  • embedding_id – the id of the embedding to use. If it is not set but needed by a workflow, the newest embedding is used by default.

append_embeddings(path_to_embeddings_csv: str, embedding_id: str)

Concatenates the embeddings from the server to the local ones.

Loads the embedding csv file belonging to the embedding_id, and appends all of its rows to the local embeddings file located at ‘path_to_embeddings_csv’.

Parameters
  • path_to_embeddings_csv – The path to the csv containing the local embeddings.

  • embedding_id – Id of the embedding summary of the embeddings on the server.

Raises

RuntimeError – If the number of columns in the local and the remote embeddings file mismatch.
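The column-count check can be illustrated with a local sketch. The helper below is hypothetical, not the library implementation: it appends the data rows of a remote embeddings CSV to a local one and raises a RuntimeError on a header mismatch, mirroring the documented behaviour.

```python
import csv
import io


def append_embedding_rows(local_csv: str, remote_csv: str) -> str:
    """Append the data rows of remote_csv to local_csv.

    Hypothetical helper: both CSVs must share the same number of
    columns, otherwise a RuntimeError is raised, mirroring the
    documented behaviour of append_embeddings.
    """
    local_rows = list(csv.reader(io.StringIO(local_csv)))
    remote_rows = list(csv.reader(io.StringIO(remote_csv)))
    if len(local_rows[0]) != len(remote_rows[0]):
        raise RuntimeError(
            "Number of columns in local and remote embeddings mismatch."
        )
    merged = local_rows + remote_rows[1:]  # drop the remote header row
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(merged)
    return out.getvalue()


local = "filenames,embedding_0,embedding_1,labels\na.jpg,0.1,0.2,0\n"
remote = "filenames,embedding_0,embedding_1,labels\nb.jpg,0.3,0.4,1\n"
merged = append_embedding_rows(local, remote)
```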

create_compute_worker_config(worker_config: Optional[Dict[str, Any]] = None, lightly_config: Optional[Dict[str, Any]] = None) str

Creates a new configuration for a compute worker run.

Parameters
  • worker_config – Compute worker configuration.

  • lightly_config – Lightly configuration.

Returns

The id of the created config.

create_custom_metadata_config(name: str, configs: List[lightly.openapi_generated.swagger_client.models.configuration_entry.ConfigurationEntry])

Creates custom metadata config from a list of configurations.

Parameters
  • name – The name of the custom metadata configuration.

  • configs – List of configuration entries, each specifying a custom metadata field.

Returns

The API response.

Examples

>>> from lightly.openapi_generated.swagger_client.models.configuration_entry import ConfigurationEntry
>>> entry = ConfigurationEntry(
>>>     name='Weather',
>>>     path='weather',
>>>     default_value='unknown',
>>>     value_data_type='CATEGORICAL_STRING',
>>> )
>>>
>>> client.create_custom_metadata_config(
>>>     'My Custom Metadata',
>>>     [entry],
>>> )
create_dataset(dataset_name: str, dataset_type: Optional[str] = None)

Creates a dataset on the Lightly Platform.

If a dataset with that name already exists, the dataset_id is set to the id of the existing dataset instead.

Parameters
  • dataset_name – The name of the dataset to be created.

  • dataset_type – The type of the dataset. We recommend using the API-provided constants DatasetType.IMAGES and DatasetType.VIDEOS.

Examples

>>> from lightly.api import ApiWorkflowClient
>>> from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
>>>
>>> client = lightly.api.ApiWorkflowClient(token="YOUR_TOKEN")
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.IMAGES)
>>>
>>> # or to work with videos
>>> client.create_dataset('your-dataset-name', dataset_type=DatasetType.VIDEOS)
create_new_dataset_with_unique_name(dataset_basename: str, dataset_type: Optional[str] = None)

Creates a new dataset on the Lightly Platform.

If a dataset with the specified name already exists, a counter is appended to the name so that the dataset can still be created.

Parameters
  • dataset_basename – The name of the dataset to be created.

  • dataset_type – The type of the dataset. We recommend using the API-provided constants DatasetType.IMAGES and DatasetType.VIDEOS.
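The counter scheme can be sketched locally. This is illustrative only: the exact suffix format used by the Lightly Platform may differ.

```python
def unique_dataset_name(basename: str, existing: list) -> str:
    """Return basename, or basename plus a numeric counter if taken.

    Illustrative sketch: the exact suffix format used by the Lightly
    Platform may differ.
    """
    if basename not in existing:
        return basename
    counter = 1
    while f"{basename}_{counter}" in existing:
        counter += 1
    return f"{basename}_{counter}"


existing = ["clothing-dataset", "clothing-dataset_1"]
name = unique_dataset_name("clothing-dataset", existing)
```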

create_tag_from_filenames(fnames_new_tag: List[str], new_tag_name: str, parent_tag_id: Optional[str] = None) lightly.openapi_generated.swagger_client.models.tag_data.TagData

Creates a new tag from a list of filenames.

Parameters
  • fnames_new_tag – A list of filenames to be included in the new tag.

  • new_tag_name – The name of the new tag.

  • parent_tag_id – The tag defining where to sample from, default: None resolves to the initial-tag.

Returns

The newly created tag.

Raises

RuntimeError

dataset_exists(dataset_id: str)

Returns True if a dataset with dataset_id exists.

property dataset_id: str

The current dataset_id.

If the dataset_id is set, it is returned. If it is not set, then the dataset_id of the last modified dataset is selected.

property dataset_type: str

Returns the dataset type of the current dataset.

delete_compute_worker(worker_id: str)

Removes a compute worker.

Parameters

worker_id – The id of the worker to remove.

delete_dataset_by_id(dataset_id: str)

Deletes a dataset on the Lightly Platform.

Parameters

dataset_id – The id of the dataset to be deleted.

delete_tag_by_id(tag_id: str)

Deletes a tag on the web platform.

Parameters

tag_id – The id of the tag to be deleted.

download_dataset(output_dir: str, tag_name: str = 'initial-tag', verbose: bool = True)

Downloads images from the web-app and stores them in output_dir.

Parameters
  • output_dir – Where to store the downloaded images.

  • tag_name – Name of the tag which should be downloaded.

  • verbose – Whether or not to show the progress bar.

Raises
  • ValueError – If the specified tag does not exist on the dataset.

  • RuntimeError – If the connection to the server failed.

download_new_raw_samples() List[Tuple[str, str]]

Downloads filenames and read urls of unprocessed samples from the datasource.

All samples after the timestamp of ApiWorkflowClient.get_processed_until_timestamp() are fetched. After downloading the samples the timestamp is updated to the current time. This function can be repeatedly called to retrieve new samples from the datasource.

Returns

A list of (filename, url) tuples, where each tuple represents a sample.

download_raw_metadata(from_: int = 0, to: Optional[int] = None, relevant_filenames_file_name: Optional[str] = None) List[Tuple[str, str]]

Downloads all metadata filenames and read urls from the datasource between from_ and to.

Samples which have timestamp == from_ or timestamp == to will also be included.

Parameters
  • from_ – Unix timestamp from which on samples are downloaded.

  • to – Unix timestamp up to and including which samples are downloaded.

  • relevant_filenames_file_name – The path to the relevant filenames text file in the cloud bucket. The path is relative to the datasource root.

Returns

A list of (filename, url) tuples, where each tuple represents a sample.

download_raw_predictions(task_name: str, from_: int = 0, to: Optional[int] = None, relevant_filenames_file_name: Optional[str] = None) List[Tuple[str, str]]

Downloads all prediction filenames and read urls from the datasource between from_ and to.

Samples which have timestamp == from_ or timestamp == to will also be included.

Parameters
  • task_name – Name of the prediction task.

  • from_ – Unix timestamp from which on samples are downloaded.

  • to – Unix timestamp up to and including which samples are downloaded.

  • relevant_filenames_file_name – The path to the relevant filenames text file in the cloud bucket. The path is relative to the datasource root.

Returns

A list of (filename, url) tuples, where each tuple represents a sample.

download_raw_samples(from_: int = 0, to: Optional[int] = None, relevant_filenames_file_name: Optional[str] = None) List[Tuple[str, str]]

Downloads all filenames and read urls from the datasource between from_ and to.

Samples which have timestamp == from_ or timestamp == to will also be included.

Parameters
  • from_ – Unix timestamp from which on samples are downloaded.

  • to – Unix timestamp up to and including which samples are downloaded.

  • relevant_filenames_file_name – The path to the relevant filenames text file in the cloud bucket. The path is relative to the datasource root.

Returns

A list of (filename, url) tuples, where each tuple represents a sample.
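The inclusive bounds (samples with timestamp == from_ or timestamp == to are included) can be sketched with a local filter over a hypothetical sample list:

```python
def filter_by_timestamp(samples, from_=0, to=None):
    """Keep (filename, timestamp) pairs with from_ <= timestamp <= to.

    Both bounds are inclusive, matching the documented behaviour; a
    `to` of None means no upper bound. The sample list is hypothetical.
    """
    return [
        (fname, ts)
        for fname, ts in samples
        if ts >= from_ and (to is None or ts <= to)
    ]


samples = [("a.jpg", 100), ("b.jpg", 200), ("c.jpg", 300)]
selected = filter_by_timestamp(samples, from_=100, to=200)
```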

export_label_box_data_rows_by_tag_id(tag_id: str) List[Dict]

Exports samples in a format compatible with Labelbox.

The format is documented here: https://docs.labelbox.com/docs/images-json

Parameters

tag_id – Id of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Labelbox.

export_label_box_data_rows_by_tag_name(tag_name: str) List[Dict]

Exports samples in a format compatible with Labelbox.

The format is documented here: https://docs.labelbox.com/docs/images-json

Parameters

tag_name – Name of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Labelbox.

Examples

>>> # write json file which can be imported in Label Studio
>>> tasks = client.export_label_box_data_rows_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-labelbox-rows.json', 'w') as f:
>>>     json.dump(tasks, f)
export_label_studio_tasks_by_tag_id(tag_id: str) List[Dict]

Exports samples in a format compatible with Label Studio.

The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format

Parameters

tag_id – Id of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Label Studio.

export_label_studio_tasks_by_tag_name(tag_name: str) List[Dict]

Exports samples in a format compatible with Label Studio.

The format is documented here: https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format

Parameters

tag_name – Name of the tag which should be exported.

Returns

A list of dictionaries in a format compatible with Label Studio.

Examples

>>> # write json file which can be imported in Label Studio
>>> tasks = client.export_label_studio_tasks_by_tag_name(
>>>     'initial-tag'
>>> )
>>>
>>> with open('my-label-studio-tasks.json', 'w') as f:
>>>     json.dump(tasks, f)
get_all_datasets() List[lightly.openapi_generated.swagger_client.models.dataset_data.DatasetData]

Returns all datasets the user has access to.

get_all_tags() List[lightly.openapi_generated.swagger_client.models.tag_data.TagData]

Gets all tags on the server.

Returns

One TagData entry for each tag on the server.

get_compute_worker_ids() List[str]

Returns the ids of all registered compute workers.

get_compute_worker_runs() List[lightly.openapi_generated.swagger_client.models.docker_run_data.DockerRunData]

Returns all compute worker runs for the user.

get_dataset_by_id(dataset_id: str)

Returns the dataset for the given dataset id.

get_datasets(shared: bool = False) List[lightly.openapi_generated.swagger_client.models.dataset_data.DatasetData]

Returns all datasets the user owns.

Parameters

shared – If True, only returns the datasets which have been shared with the user.

get_datasource() lightly.openapi_generated.swagger_client.models.datasource_config.DatasourceConfig

Calls the API to return the datasource of the current dataset.

Returns

Datasource data of the datasource of the current dataset.

Raises

ApiException – If no datasource was configured.

get_embedding_by_name(name: str, ignore_suffix: bool = True) lightly.openapi_generated.swagger_client.models.dataset_embedding_data.DatasetEmbeddingData

Gets an embedding from the server by name.

Parameters
  • name – The name of the embedding to get.

  • ignore_suffix – If True, a suffix of the embedding name on the server is ignored.

Returns

The embedding data.

Raises

EmbeddingDoesNotExistError – If the name does not match the name of an embedding on the server.

get_filenames() List[str]

Downloads the list of filenames from the server.

This is an expensive operation, especially for large datasets.

get_filenames_in_tag(tag_data: lightly.openapi_generated.swagger_client.models.tag_data.TagData, filenames_on_server: Optional[List[str]] = None, exclude_parent_tag: bool = False) List[str]

Gets the filenames of a tag.

Parameters
  • tag_data – The data of the tag.

  • filenames_on_server – List of all filenames on the server. If they are not given, they need to be downloaded, which is quite expensive.

  • exclude_parent_tag – Excludes the parent tag in the returned filenames.

Returns

filenames_tag – The filenames of all samples in the tag.

get_metadata_read_url(filename: str)

Returns a read-url for .lightly/metadata/{filename}.

Parameters

filename – Filename for which to get the read-url.

Returns the read-url. If the file does not exist, a read-url is returned anyway.

get_prediction_read_url(filename: str)

Returns a read-url for .lightly/predictions/{filename}.

Parameters

filename – Filename for which to get the read-url.

Returns the read-url. If the file does not exist, a read-url is returned anyway.

get_processed_until_timestamp() int

Returns the timestamp until which samples have been processed.

Returns

Unix timestamp of the last processed sample.

get_scheduled_compute_worker_runs() List[lightly.openapi_generated.swagger_client.models.docker_run_scheduled_data.DockerRunScheduledData]

Returns a list of all scheduled compute worker runs for the current dataset.

index_custom_metadata_by_filename(custom_metadata: Dict) Dict[str, Optional[Dict]]

Creates an index to lookup custom metadata by filename.

Parameters

custom_metadata – Dictionary of custom metadata, see upload_custom_metadata for the required format.

Returns

A dictionary mapping from filenames to custom metadata. If there are no annotations for a filename, the custom metadata is None instead.
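The documented mapping can be sketched locally, using the COCO-style format described under upload_custom_metadata. This is an illustrative re-implementation, not the library code:

```python
def index_metadata_by_filename(custom_metadata):
    """Map file_name to its metadata dict, or None if no metadata exists.

    Local sketch of the documented behaviour of
    index_custom_metadata_by_filename, not the library implementation.
    """
    by_image_id = {
        entry["image_id"]: entry for entry in custom_metadata["metadata"]
    }
    return {
        image["file_name"]: by_image_id.get(image["id"])
        for image in custom_metadata["images"]
    }


custom_metadata = {
    "images": [
        {"file_name": "image0.jpg", "id": 0},
        {"file_name": "image1.jpg", "id": 1},
    ],
    "metadata": [{"image_id": 0, "weather": "cloudy"}],
}
index = index_metadata_by_filename(custom_metadata)
```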

register_compute_worker(name: str = 'Default') str

Registers a new compute worker.

Parameters

name – The name of the compute worker.

Returns

The id of the newly registered compute worker.

schedule_compute_worker_run(worker_config: Optional[Dict[str, Any]] = None, lightly_config: Optional[Dict[str, Any]] = None, priority: str = 'MID') str

Schedules a run with the given configurations.

Parameters
  • worker_config – Compute worker configuration.

  • lightly_config – Lightly configuration.

  • priority – The priority of the scheduled run.

Returns

The id of the scheduled run.

selection(selection_config: lightly.active_learning.config.selection_config.SelectionConfig, preselected_tag_id: Optional[str] = None, query_tag_id: Optional[str] = None) lightly.openapi_generated.swagger_client.models.tag_data.TagData

Performs a selection given the arguments.

Parameters
  • selection_config – The configuration of the selection.

  • preselected_tag_id – The tag defining the already chosen samples (e.g. already labelled ones), default: None.

  • query_tag_id – The tag defining where to sample from, default: None resolves to the initial-tag.

Returns

The newly created tag of the selection.

Raises
  • ApiException

  • ValueError

  • RuntimeError

set_azure_config(container_name: str, account_name: str, sas_token: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None

Sets the Azure configuration for the datasource of the current dataset.

Find a detailed explanation on how to setup Lightly with Azure Blob Storage in our docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_azure_storage.html#

Parameters
  • container_name – Container name of the dataset, for example: “my-container/path/to/my/data”.

  • account_name – Azure account name.

  • sas_token – Secure Access Signature token.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.
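The [filename] and [extension] placeholders in thumbnail_suffix resolve per file. The helper below is a sketch of the documented placeholder scheme; the actual substitution is performed by the Lightly Platform, not by client code.

```python
import os


def resolve_thumbnail_path(filename: str, thumbnail_suffix: str) -> str:
    """Substitute [filename] and [extension] in a thumbnail_suffix template.

    Illustrative sketch of the documented placeholder scheme; the real
    substitution happens server-side on the Lightly Platform.
    """
    stem, ext = os.path.splitext(filename)
    return (
        thumbnail_suffix
        .replace("[filename]", stem)
        .replace("[extension]", ext.lstrip("."))
    )


path = resolve_thumbnail_path(
    "image.png", ".lightly/thumbnails/[filename]_thumb.[extension]"
)
```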

set_dataset_id_by_name(dataset_name: str)

Sets the dataset id given the name of the dataset.

Parameters

dataset_name – The name of the dataset for which the dataset_id should be set as attribute.

Raises

ValueError

set_embedding_id_to_latest()

Sets self.embedding_id to the id of the latest embedding on the server.

set_gcs_config(resource_path: str, project_id: str, credentials: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None

Sets the Google Cloud Storage configuration for the datasource of the current dataset.

Find a detailed explanation on how to setup Lightly with Google Cloud Storage in our docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_gcloud_bucket.html

Parameters
  • resource_path – GCS url of your dataset, for example: “gs://my_bucket/path/to/my/data”

  • project_id – GCS project id.

  • credentials – The stringified content of the credentials JSON file that you download from Google Cloud Platform.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.

set_local_config(resource_path: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]') None

Sets the local configuration for the datasource of the current dataset.

Find a detailed explanation on how to setup Lightly with a local file server in our docs: https://docs.lightly.ai/getting_started/dataset_creation/dataset_creation_local_server.html

Parameters
  • resource_path – Url to your local file server, for example: “http://localhost:1234/path/to/my/data”.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

set_s3_config(resource_path: str, region: str, access_key: str, secret_access_key: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None

Sets the S3 configuration for the datasource of the current dataset.

Parameters
  • resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.

  • region – S3 region where the dataset bucket is located, for example “eu-central-1”.

  • access_key – S3 access key.

  • secret_access_key – Secret for the S3 access key.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.

set_s3_delegated_access_config(resource_path: str, region: str, role_arn: str, external_id: str, thumbnail_suffix: Optional[str] = '.lightly/thumbnails/[filename]_thumb.[extension]', purpose: str = 'INPUT_OUTPUT') None

Sets the S3 configuration for the datasource of the current dataset.

Parameters
  • resource_path – S3 url of your dataset, for example “s3://my_bucket/path/to/my/data”.

  • region – S3 region where the dataset bucket is located, for example “eu-central-1”.

  • role_arn – Unique ARN identifier of the role.

  • external_id – External ID of the role.

  • thumbnail_suffix – Where to save thumbnails of the images in the dataset, for example “.lightly/thumbnails/[filename]_thumb.[extension]”. Set to None to disable thumbnails and use the full images from the datasource instead.

  • purpose – Datasource purpose, determines if datasource is read only (INPUT) or can be written to as well (LIGHTLY, INPUT_OUTPUT). The latter is required when Lightly extracts frames from input videos.

update_processed_until_timestamp(timestamp: int) None

Sets the timestamp until which samples have been processed.

Parameters

timestamp – Unix timestamp of the last processed sample.

upload_custom_metadata(custom_metadata: Dict, verbose: bool = False, max_workers: int = 8)

Uploads custom metadata to the Lightly platform.

The custom metadata is expected in a format similar to the COCO annotations: under the key “images” there should be a list of dictionaries, each with a file_name and an id. Under the key “metadata” the custom metadata is stored as a list of dictionaries, each with an image_id matching it to an image.

Example

>>> custom_metadata = {
>>>     "images": [
>>>         {
>>>             "file_name": "image0.jpg",
>>>             "id": 0,
>>>         },
>>>         {
>>>             "file_name": "image1.jpg",
>>>             "id": 1,
>>>         }
>>>     ],
>>>     "metadata": [
>>>         {
>>>             "image_id": 0,
>>>             "number_of_people": 3,
>>>             "weather": {
>>>                 "scenario": "cloudy",
>>>                 "temperature": 20.3
>>>             }
>>>         },
>>>         {
>>>             "image_id": 1,
>>>             "number_of_people": 1,
>>>             "weather": {
>>>                 "scenario": "rainy",
>>>                 "temperature": 15.0
>>>             }
>>>         }
>>>     ]
>>> }
Parameters
  • custom_metadata – Custom metadata as described above.

  • verbose – If True displays a progress bar during the upload.

  • max_workers – Maximum number of concurrent threads during upload.

upload_dataset(input: Union[str, lightly.data.dataset.LightlyDataset], max_workers: int = 8, mode: str = 'thumbnails', custom_metadata: Optional[Dict] = None)

Uploads a dataset to the Lightly cloud solution.

Parameters
  • input – Either the path to the dataset, e.g. “path/to/dataset”, or the dataset in the form of a LightlyDataset.

  • max_workers – Maximum number of workers uploading images in parallel.

  • mode – One of [full, thumbnails, metadata]. Whether to upload full images, thumbnails only, or metadata only.

  • custom_metadata – COCO-style dictionary of custom metadata to be uploaded.

Raises
  • ValueError – If the dataset is too large or the input has the wrong type.

  • RuntimeError – If the connection to the server failed.

upload_embeddings(path_to_embeddings_csv: str, name: str)

Uploads embeddings to the server.

First checks that the specified embedding name does not already exist on the server; if it does, the upload is aborted. Otherwise, creates a new csv with the embeddings in the order specified on the server and uploads it. The received embedding_id is saved as a property of self.

Parameters
  • path_to_embeddings_csv – The path to the .csv containing the embeddings, e.g. “path/to/embeddings.csv”

  • name – The name of the embedding. If an embedding with such a name already exists on the server, the upload is aborted.

upload_file_with_signed_url(file: io.IOBase, signed_write_url: str, headers: Optional[Dict] = None) requests.models.Response

Uploads a file to a url via a put request.

Parameters
  • file – The file to upload.

  • signed_write_url – The url to upload the file to. As no authorization is used, the url must be a signed write url.

  • headers – Specific headers for the request.

Returns

The response of the put request, usually a 200 for the success case.

verify_custom_metadata_format(custom_metadata: Dict)

Verifies that the custom metadata is in the correct format.

Parameters

custom_metadata – Dictionary of custom metadata, see upload_custom_metadata for the required format.

Raises

KeyError – If “images” or “metadata” aren’t a key of custom_metadata.
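A minimal sketch of the check described above (the library's own validation may verify more than the two top-level keys):

```python
def verify_format(custom_metadata: dict) -> None:
    """Raise KeyError if "images" or "metadata" is missing.

    Minimal sketch of the documented check; the library's own
    validation may check more than the two top-level keys.
    """
    for key in ("images", "metadata"):
        if key not in custom_metadata:
            raise KeyError(f"Custom metadata has no key '{key}'.")


verify_format({"images": [], "metadata": []})  # valid: passes silently
```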

Upload Dataset Mixin

exception lightly.api.api_workflow_upload_embeddings.EmbeddingDoesNotExistError
exception lightly.api.api_workflow_upload_metadata.InvalidCustomMetadataWarning