Cloud Storage

Datasources

Datasources are Lightly's way of accessing data in your cloud storage. They are always associated with a dataset and need to be configured with credentials from your cloud provider. Currently, Lightly integrates with the following cloud providers:

  • AWS S3
  • Google Cloud Storage
  • Azure

To create a datasource, you must specify a dataset, the credentials, and a resource_path. The resource_path must point to an existing directory within your storage bucket; the directory may be empty.

Lightly requires you to configure an Input and a Lightly datasource. They are explained in detail below.

Dataset

As shown in Set Up Your First Dataset, you can easily set up a dataset from Python. The dataset stores the results from your Lightly Worker runs and provides access to the selected images.

You can choose the input type for your dataset (images or videos).

Images:

from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset(dataset_name="dataset-name", dataset_type=DatasetType.IMAGES)
dataset_id = client.dataset_id

Videos:

from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset(dataset_name="dataset-name", dataset_type=DatasetType.VIDEOS)
dataset_id = client.dataset_id

If you already have a dataset, you can connect to it by its ID:

from lightly.api import ApiWorkflowClient

client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

Supported File Types

Lightly supports different file types within your cloud storage:

Images

  • png
  • jpg/jpeg
  • bmp
  • gif
  • tiff

Videos

  • mov
  • mp4
  • avi

See Video as Input for a detailed list of supported video containers and codecs.
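
If you want to check ahead of time whether your data uses supported file types, a quick local scan against the extensions listed above can help. The sketch below is only an illustration; the directory path and helper function are hypothetical and not part of the Lightly SDK.

from pathlib import Path

# Extensions listed above; matching is done case-insensitively here.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff"}
VIDEO_EXTENSIONS = {".mov", ".mp4", ".avi"}


def find_unsupported_files(directory: str) -> list:
    """Return all files whose extension is not a supported image or video type."""
    supported = IMAGE_EXTENSIONS | VIDEO_EXTENSIONS
    return [
        path
        for path in Path(directory).rglob("*")
        if path.is_file() and path.suffix.lower() not in supported
    ]


# Example usage with a hypothetical local copy of the input data.
print(find_unsupported_files("data/input/"))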

Input Datasource

The Input datasource is where Lightly reads your raw input data from. Lightly requires list and read access to it. Please refer to the documentation of the cloud storage provider you use for the specific permissions needed.

You can configure your Input datasource from Python as follows:

AWS S3 (delegated access):

from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Configure the Input datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.INPUT,
)

AWS S3 (access keys):

from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Configure the Input datasource.
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.INPUT,
)

Azure:

from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Configure the Input datasource.
client.set_azure_config(
    container_name="my-container/input/",
    account_name="ACCOUNT-NAME",
    sas_token="SAS-TOKEN",
    purpose=DatasourcePurpose.INPUT,
)

Google Cloud Storage:

import json
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Configure the Input datasource.
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open("credentials_read.json"))),
    purpose=DatasourcePurpose.INPUT,
)

📘 Input Structure

Lightly is agnostic to nested directories, so there are no requirements on the input data structure within the input datasource. However, Lightly can only access data in the path of the input datasource, so make sure all the data you want to process is in the right place.
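
If you want to double-check what Lightly will be able to see, you can list the objects below the input path yourself. The sketch below assumes AWS S3 and uses boto3; the bucket name and prefix are placeholders, and the other providers offer equivalent listing APIs.

import boto3

# Placeholders; use the bucket and prefix from your Input datasource configuration.
BUCKET = "bucket"
PREFIX = "input/"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Everything printed here lies below the input path and is therefore visible to Lightly;
# objects outside this prefix will not be processed.
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        print(obj["Key"])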

📘 Datatypes

Lightly currently works on images and videos. You can specify which type of input data you want to process when creating a dataset.

Lightly Datasource

The Lightly bucket serves as an interactive bucket that Lightly both reads from and writes output data to. Lightly therefore requires list, read, and write access to it. Please refer to the documentation of the cloud storage provider you are using for the specific permissions needed. You can use separate credentials or the same ones as for the Input bucket. The Lightly bucket can point to a different directory in the same bucket or to another bucket, even at a different cloud storage provider.

Here is an overview of what the Lightly bucket is used for:

  • Saving thumbnails of images for a more responsive experience in the Lightly Platform.
  • Saving images of cropped-out objects if you use the object-level workflow.
  • Saving frames of videos if your input consists of videos.
  • Providing the relevant filenames file if you want to run the Lightly Worker only on a subset of input files (see the sketch after this list).
  • Providing predictions for running the object-level workflow or as additional information for the selection process.
    See also Prediction Format.
  • Providing metadata as additional information for the selection process. See also Metadata Format.

You can configure your Lightly datasource from Python as follows:

AWS S3 (delegated access):

from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Configure the Lightly datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.LIGHTLY,
)

AWS S3 (access keys):

from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.LIGHTLY,
)

Azure:

from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Configure the Lightly datasource.
client.set_azure_config(
    container_name="my-container/lightly/",
    account_name="ACCOUNT-NAME",
    sas_token="SAS-TOKEN",
    purpose=DatasourcePurpose.LIGHTLY,
)

Google Cloud Storage:

import json
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Configure the Lightly datasource.
client.set_gcs_config(
    resource_path="gs://bucket/lightly/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open("credentials_write.json"))),
    purpose=DatasourcePurpose.LIGHTLY,
)

Verify Datasource

Once you have set up and configured the datasources, it is crucial to ensure that Lightly has the proper permissions. This can be done via code or in the Lightly Platform. The Lightly Worker will also make this check when scheduling a run.

verify = client._datasources_api.verify_datasource_by_dataset_id(
    dataset_id=client.dataset_id
)
# assert Lightly access permissions
try:
    assert verify.can_list
    assert verify.can_read
    assert verify.can_write
    assert verify.can_overwrite
except AssertionError:
    print("Datasources are missing permissions. Potential errors are:", verify.errors)

Visual Feedback

In the Lightly Platform, when editing your dataset from the datasets home view, you can configure and verify your datasource.

If Lightly cannot verify the datasource, we will show you red lightbulbs. Hovering over them will give you more insights into why things could fail. In those cases, please read the error messages displayed and ensure that the resource path, the credentials, and the permissions of those credentials comply with Lightly's requirements as outlined in the respective AWS S3, Google Cloud Storage, and Azure documentation.

Error details can be obtained by hovering over a red lightbulb.

When everything works as expected, you will see green lightbulbs, as shown below. If this is the case, you are all set and can go ahead and run a selection!

Green lightbulbs indicate you have set up everything correctly.