Local Storage

This page explains how to provide data to Lightly from your local disk.

Version requirement

This feature requires Lightly Worker 2.9 or higher and Lightly Python API Client 1.4.16 or higher.

Visit the Installation page to see how you can get the latest versions.

Datasources

A datasource is Lightly's name for an input or output location for data. To run Lightly, you need to provide two datasources.

The input datasource is the location of your input images or videos. It can be read-only.

The Lightly datasource is the location of additional Lightly-specific data such as custom metadata or predictions. This datasource is also used for output artifacts, such as the PDF report, and for data used by Lightly Platform features, such as image thumbnails. It must be writable.

For more information about datasources, please refer to the Cloud Storage page.
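Because the input datasource only needs read access while the Lightly datasource must be writable, it can help to check both folders on the host before starting the worker. The following is a minimal sketch using only the Python standard library; the helper name `check_datasource_permissions` is hypothetical and not part of the Lightly API. Note that `os.access` reflects the permissions of the user running the check, which may differ from the user inside the Docker container.

```python
import os
from pathlib import Path
from typing import List


def check_datasource_permissions(input_dir: str, lightly_dir: str) -> List[str]:
    """Return a list of problems; an empty list means both folders look usable.

    Hypothetical helper: it only checks host-side permissions and does not contact Lightly.
    """
    problems = []
    # The input datasource only has to be readable (X_OK lets us traverse the directory).
    if not Path(input_dir).is_dir():
        problems.append(f"input datasource {input_dir!r} is not a directory")
    elif not os.access(input_dir, os.R_OK | os.X_OK):
        problems.append(f"input datasource {input_dir!r} is not readable")
    # The Lightly datasource also receives output artifacts, so it must be writable.
    if not Path(lightly_dir).is_dir():
        problems.append(f"Lightly datasource {lightly_dir!r} is not a directory")
    elif not os.access(lightly_dir, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"Lightly datasource {lightly_dir!r} is not writable")
    return problems
```

For the folders used below, you would call `check_datasource_permissions("/home/username/project_xyz/input", "/home/username/project_xyz/lightly")` and print any returned problems.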

Overview

Local data are provided to Lightly in three steps:

  1. When starting the Lightly Worker, mount your data folders into the Docker container.
  2. When scheduling a worker run, specify the location of your data relative to the mounted folders.
  3. Optional: To display your data in the Lightly Platform, run a local web server.

Mount your data

Lightly recognizes two kinds of local paths. The mount path is a local directory mounted into the Docker container. The datasource path is a path within this directory that contains the data for your current run. The two can be the same. Keeping them different is useful, for example, if you have multiple datasets in a common folder that you would like to process without restarting the Docker container.

The following shows an example folder structure in your local environment.

/home/username/project_xyz/input
├── 0000_0085e9e41513078a_2018-08-19--13-26-08_11_864.png
├── ...
└── 0999_e8e95b54ed6116a6_2018-10-22--11-26-21_3_339.png

/home/username/project_xyz/lightly
└── .lightly/predictions/
    ├── tasks.json
    └── yolov7_prediction_task/
        ├── schema.json
        ├── 0000_0085e9e41513078a_2018-08-19--13-26-08_11_864.json
        ├── ...
        └── 0999_e8e95b54ed6116a6_2018-10-22--11-26-21_3_339.json
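In the layout above, every prediction file in `yolov7_prediction_task/` shares its file name stem with an image in the input datasource. Assuming that naming convention, a quick host-side check for images without predictions could look like the following sketch. The helper `find_missing_predictions` is hypothetical; it only compares file names and does not validate the JSON content against `schema.json`.

```python
from pathlib import Path
from typing import List


def find_missing_predictions(
    input_dir: str, task_dir: str, image_suffix: str = ".png"
) -> List[str]:
    """Return names of images in input_dir that have no matching <stem>.json in task_dir."""
    # Stems of all prediction files; schema.json describes the task and is not a prediction.
    prediction_stems = {
        p.stem for p in Path(task_dir).glob("*.json") if p.name != "schema.json"
    }
    missing = []
    for image in sorted(Path(input_dir).glob(f"*{image_suffix}")):
        if image.stem not in prediction_stems:
            missing.append(image.name)
    return missing


# Example call for the folder structure above:
# find_missing_predictions(
#     "/home/username/project_xyz/input",
#     "/home/username/project_xyz/lightly/.lightly/predictions/yolov7_prediction_task",
# )
```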

When starting the Docker container, mount the mount path of the input datasource to /input_mount and the mount path of the Lightly datasource to /lightly_mount:

Docker requires mount paths to be absolute!

Volume-mounting folders into a Docker container can change the file permissions of all mounted files. We therefore recommend the following best practice: choose a mount path that is as specific as possible while still giving access to all data of interest. A complete example is given at the end of this page.

docker run ... \
  -v /home/username/project_xyz/input:/input_mount:ro \
  -v /home/username/project_xyz/lightly:/lightly_mount \
  ...
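Since Docker requires absolute mount paths and the input datasource only needs read access, the `-v` arguments can be assembled programmatically if you start the worker from a script. This is a minimal sketch; the function name `build_mount_args` is hypothetical, and it only produces the two volume flags shown above, leaving the remaining `docker run` options to you.

```python
import os
from typing import List


def build_mount_args(input_mount: str, lightly_mount: str) -> List[str]:
    """Return the docker `-v` flags for the input and Lightly mount paths."""
    for path in (input_mount, lightly_mount):
        # Docker requires absolute mount paths, so fail early on relative ones.
        if not os.path.isabs(path):
            raise ValueError(f"mount path must be absolute: {path!r}")
    return [
        # The input datasource can be mounted read-only.
        "-v", f"{input_mount}:/input_mount:ro",
        # The Lightly datasource must stay writable.
        "-v", f"{lightly_mount}:/lightly_mount",
    ]
```

The returned list can be spliced into a command such as `["docker", "run", *build_mount_args(...), ...]` before passing it to `subprocess.run`.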

Set the datasource

Provide your datasource paths before scheduling your run. The specified paths must be relative to the corresponding mount paths. If your mount path and datasource path coincide, pass an empty string.
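The `relative_path` value is the datasource path expressed relative to its mount path, with an empty string when the two coincide. A minimal sketch for deriving it from two host paths follows; `to_relative_path` is a hypothetical helper, not a Lightly function. Note that `os.path.relpath` returns `"."` for identical paths, which is why that case is mapped to `""` explicitly.

```python
import os


def to_relative_path(mount_path: str, datasource_path: str) -> str:
    """Return datasource_path relative to mount_path, or "" if they coincide."""
    mount = os.path.abspath(mount_path)
    datasource = os.path.abspath(datasource_path)
    # The datasource must lie inside the mounted folder, otherwise the worker cannot see it.
    if os.path.commonpath([mount, datasource]) != mount:
        raise ValueError(f"{datasource!r} is not inside the mount path {mount!r}")
    rel = os.path.relpath(datasource, mount)
    # os.path.relpath returns "." for identical paths; an empty string is passed instead.
    return "" if rel == "." else rel
```

For instance, `to_relative_path("/home/user/datasets", "/home/user/datasets/cam1")` returns `"cam1"`.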

Optionally, you can provide a web server location; otherwise, it defaults to http://localhost:3456. This is the location from which the Lightly Platform expects your images to be served.

from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.IMAGES,  # Use DatasetType.VIDEOS when working with videos
)
client.set_local_config(
    relative_path="",  # Relative path in the input mount folder (project_xyz/input/)
    web_server_location="http://localhost:3456",  # Optional
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="",  # Relative path in the Lightly mount folder (project_xyz/lightly/)
    web_server_location="http://localhost:3456",  # Optional
    purpose=DatasourcePurpose.LIGHTLY,
)

When providing data with the Lightly datasource, note that you need to specify the folder that contains the .lightly subfolder.
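A common slip is to pass the `.lightly` folder itself instead of its parent. The sketch below, built around a hypothetical helper `lightly_datasource_root`, accepts either and returns the folder that contains `.lightly`. It is meant for the case where you already provide data such as predictions or metadata in `.lightly`, so it raises an error if no such subfolder exists; it only inspects the local file system.

```python
from pathlib import Path


def lightly_datasource_root(path: str) -> Path:
    """Return the folder that contains the .lightly subfolder."""
    p = Path(path)
    # If the .lightly folder itself was given, step up to its parent.
    if p.name == ".lightly":
        p = p.parent
    if not (p / ".lightly").is_dir():
        raise FileNotFoundError(f"no .lightly subfolder found in {str(p)!r}")
    return p
```

The path returned here, expressed relative to the Lightly mount path, is what you pass as `relative_path` for `DatasourcePurpose.LIGHTLY`.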

Schedule your run

Scheduling the run works the same for local and cloud storage. If you encounter permission errors when accessing the local storage, see our FAQ.

# Uses the `client` created when setting the datasource

scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)

Optional: View local data in the Lightly Platform after the run

You can inspect the results of your run in the Lightly Platform. A Lightly dataset configured with a local datasource always tries to access images at the web server location specified when configuring the datasource. The default is localhost:3456. Furthermore, all read URLs generated by any of the export features point to this location.

You therefore need to serve the images and other resources on your disk at this localhost:port location. This can easily be done using the lightly-serve CLI command provided by the lightly Python SDK.

# host and port are optional
lightly-serve \
  input_mount=/home/username/project_xyz/input \
  lightly_mount=/home/username/project_xyz/lightly \
  host=localhost \
  port=3456

The host and port are optional; you only need to change them if you did not use the default web server location when configuring the datasource.
To test whether it is working, browse your dataset in the Lightly Platform or open localhost:3456 directly in your browser.
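Besides opening localhost:3456 in a browser, you can check from a script that something answers at the web server location. The following minimal sketch uses only `urllib` from the standard library; `is_web_server_reachable` is a hypothetical helper, and a successful connection only shows that a server is listening, not that it serves the expected files.

```python
import urllib.error
import urllib.request


def is_web_server_reachable(url: str = "http://localhost:3456", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at url (any HTTP status counts as an answer)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (e.g. 404 for the root path).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, unknown host, ...
        return False
```

HTTP error statuses are treated as reachable on purpose, because it is not assumed here what the server returns for the root path.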

Data Privacy

The lightly-serve command makes the images and resources on your local disk available only on localhost, i.e., within your local system. Thus, they are not accessible to any application outside your local computer.

View local data from a remote machine in the Lightly Platform

If your browser runs on your laptop but the local storage is on a remote machine, the setup is slightly different:

First, run lightly-serve on the remote machine where the actual images/resources are located. Second, forward the remote port to your local port. For example, if you use ssh, this can be done with the -L option: ssh -L 3456:localhost:3456 [username]@[remote-machine-address]. If you use VS Code, you can forward the port in the Ports tab.
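If you open the tunnel from a script, the `ssh -L` invocation can be assembled as an argument list, for example to pass to `subprocess.run`. A minimal sketch; `ssh_forward_command` is a hypothetical helper, and the user and host names are placeholders.

```python
from typing import List


def ssh_forward_command(username: str, remote_host: str, port: int = 3456) -> List[str]:
    """Build the ssh command that forwards local `port` to `port` on the remote machine."""
    # Same form as above: ssh -L <local_port>:localhost:<remote_port> user@host
    return ["ssh", "-L", f"{port}:localhost:{port}", f"{username}@{remote_host}"]
```

Running `subprocess.run(ssh_forward_command("username", "remote-machine-address"))` then opens the tunnel; the call blocks for as long as the SSH session stays open.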

Complete example

This example puts together all the steps needed to run Lightly with local data. It assumes that the user wants to process videos stored in /home/user/datasets/cam1 and has a Lightly datasource in /home/user/lightly/cam1.

The local data structure is as follows:

/home/user/datasets/
└── cam1/
    ├── 95b54ed6116a.mp4
    ├── ...
    └── 6116a6e8easd.mp4

/home/user/lightly/
└── cam1/
    └── .lightly/

To process this dataset, we can start the Lightly Worker and schedule a run using the following example:

docker run --shm-size "1024m" --gpus all --rm -it \
  -v /home/user/datasets:/input_mount:ro \
  -v /home/user/lightly:/lightly_mount \
  -e LIGHTLY_TOKEN={MY_LIGHTLY_TOKEN} \
  -e LIGHTLY_WORKER_ID={MY_WORKER_ID} \
  lightly/worker:latest

Then create the dataset, set up the datasources, and schedule the run with the Python client:

# Imports
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the Lightly client and a dataset
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.VIDEOS  # Use DatasetType.IMAGES when working with images
)

# Set up your datasources
client.set_local_config(
    relative_path="cam1",  # Relative path in the input mount folder (/home/user/datasets/)
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="cam1",  # Relative path in the lightly mount folder (/home/user/lightly/)
    purpose=DatasourcePurpose.LIGHTLY,
)

# Schedule a run with a basic diversity selection strategy
scheduled_run_id = client.schedule_compute_worker_run(
    selection_config={
        "n_samples": 10,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(scheduled_run_id)

After the run, serve your local data to view it in the Lightly Platform:

lightly-serve input_mount=/home/user/datasets lightly_mount=/home/user/lightly