Local Storage

This page explains how to provide data to Lightly from your local disk.

📘

Version requirement

This feature requires Lightly Worker 2.9 or higher and Lightly Python API Client 1.4.16 or higher.

Visit the Installation page to see how you can get the latest versions.

Datasources

A datasource is Lightly's name for an input or output location for data. To run Lightly, you need to provide two datasources.

The input datasource is the location of your input images or videos. It can be read-only.

The Lightly datasource is the location of additional Lightly-specific data such as custom metadata or predictions. This datasource is also used for output artifacts such as the PDF report, and for data that supports Lightly Platform functionality such as image thumbnails. It must be writeable.

For more information about datasources, please refer to the Cloud Storage page.

Overview

Local data are provided to Lightly in three steps:

  1. When starting the Lightly Worker, mount your data folders to the Docker container
  2. When scheduling a worker run, specify the location of your data relative to the mounted folders
  3. Optional: To display your data in Lightly Platform, run a local web server

Mount your data

Lightly recognizes two kinds of local paths. The mount path is a local directory mounted to the Docker container. The datasource path is a path within this directory that contains the data for your current run. The two can be the same. Keeping them different is useful, for example, if you have multiple datasets in a common folder that you would like to process without restarting the Docker container.

The following shows an example folder structure in your local environment.

/home/username/project_xyz/input
├── 0000_0085e9e41513078a_2018-08-19--13-26-08_11_864.png
├── ...
└── 0999_e8e95b54ed6116a6_2018-10-22--11-26-21_3_339.png

/home/username/project_xyz/lightly
└── .lightly/predictions/
    ├── tasks.json
    └── yolov7_prediction_task/
        ├── schema.json
        ├── 0000_0085e9e41513078a_2018-08-19--13-26-08_11_864.json
        ├── ...
        └── 0999_e8e95b54ed6116a6_2018-10-22--11-26-21_3_339.json
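
The tree above pairs every input image with a prediction file of the same name. A minimal sketch of a consistency check follows, assuming that one-JSON-per-image naming holds for your task. The helper name `images_without_prediction` and the `IMAGE_SUFFIXES` set are hypothetical and not part of the Lightly package.

```python
from pathlib import Path

# Assumption: extend this set to match the image formats in your input folder.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def images_without_prediction(input_dir, lightly_dir, task_name):
    """Return sorted image filenames that lack a same-named prediction JSON."""
    task_dir = Path(lightly_dir) / ".lightly" / "predictions" / task_name
    missing = []
    for image in sorted(Path(input_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        if not (task_dir / f"{image.stem}.json").is_file():
            missing.append(image.name)
    return missing
```

For the example layout, calling `images_without_prediction("/home/username/project_xyz/input", "/home/username/project_xyz/lightly", "yolov7_prediction_task")` would list the images that have no prediction file yet.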

When starting the docker, the mount path for the input datasource must be mounted to /input_mount and the mount path for the Lightly datasource must be mounted to /lightly_mount:

📘

Docker requires mount paths to be absolute!

📘

Volume mounting folders to a Docker container can change the file permissions of all mounted files. We therefore recommend the following best practice: choose a mount path that is as specific as possible while keeping enough flexibility to access all data of interest. At the end of this page you can find a complete example.

docker run ... \
  -v /home/username/project_xyz/input:/input_mount:ro \
  -v /home/username/project_xyz/lightly:/lightly_mount \
  ...
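
If you assemble the `docker run` command from a script, the two volume flags can be generated and checked for absolute paths up front. The following is a sketch only; `build_mount_args` is a hypothetical helper, not something the Lightly package provides.

```python
from pathlib import PurePosixPath


def build_mount_args(input_mount, lightly_mount):
    """Return the `docker run` volume flags for the two Lightly mount points."""
    for path in (input_mount, lightly_mount):
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"Docker requires an absolute mount path, got: {path!r}")
    return [
        "-v", f"{input_mount}:/input_mount:ro",   # input datasource can be read-only
        "-v", f"{lightly_mount}:/lightly_mount",  # Lightly datasource must stay writeable
    ]


print(" ".join(build_mount_args(
    "/home/username/project_xyz/input",
    "/home/username/project_xyz/lightly",
)))
```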

Schedule your run

Provide your datasource paths when scheduling your run. The specified paths must be relative to the corresponding mount paths. If your mount path and datasource path coincide, pass an empty string.

Optionally, you can provide a web server location. If you omit it, it defaults to http://localhost:3456. This is the location where Lightly Platform expects your images to be served.

from lightly.openapi_generated.swagger_client import DatasourcePurpose

# `client` is an ApiWorkflowClient created as in the complete example on this page.
client.set_local_config(
    relative_path="",  # Relative path in the input mount folder (project_xyz/input/)
    web_server_location="http://localhost:3456",  # Optional
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="",  # Relative path in the lightly mount folder (project_xyz/lightly/)
    web_server_location="http://localhost:3456",  # Optional
    purpose=DatasourcePurpose.LIGHTLY,
)
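
The value passed as `relative_path` can be derived from the two absolute paths instead of being typed by hand. The sketch below uses only the standard library; the function name `relative_datasource_path` is hypothetical and chosen for illustration.

```python
from pathlib import PurePosixPath


def relative_datasource_path(mount_path, datasource_path):
    """Return the datasource path relative to the mount path ("" if they coincide)."""
    # relative_to raises ValueError when the datasource is not inside the mount path.
    rel = PurePosixPath(datasource_path).relative_to(PurePosixPath(mount_path))
    return rel.as_posix() if rel.parts else ""


print(repr(relative_datasource_path("/home/user/datasets", "/home/user/datasets/cam1")))
print(repr(relative_datasource_path("/home/user/datasets", "/home/user/datasets")))
```

A `ValueError` here is a useful early signal that the datasource folder would not be visible inside the container.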

📘

When providing data with the Lightly datasource, note that you need to specify the folder that contains the .lightly subfolder.
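
A quick local check can catch a Lightly datasource path that points one level too deep or too shallow. This is a minimal sketch, assuming you run it on the host with the same mount path and relative path you pass to Lightly; `has_lightly_subfolder` is a hypothetical helper.

```python
from pathlib import Path


def has_lightly_subfolder(mount_path, relative_path=""):
    """Return True if <mount_path>/<relative_path> directly contains a .lightly folder."""
    return (Path(mount_path) / relative_path / ".lightly").is_dir()
```

For the folder structure shown earlier, `has_lightly_subfolder("/home/username/project_xyz/lightly")` should be true, while pointing it at the `.lightly` folder itself would not be.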

Optional: View local data in Lightly Platform

You can inspect the results of your run in Lightly Platform. Lightly Platform generates URLs for images and other resources that point to the web server location specified when setting up the datasource. This is secure because the resources can only be accessed from where the web server is reachable, for example only from your local machine.

The Lightly Python package provides a helper script to serve the datasource mount paths. Host and port are optional parameters that default to localhost:3456.

# host and port are optional
lightly-serve \
  input_mount=/home/username/project_xyz/input \
  lightly_mount=/home/username/project_xyz/lightly \
  host=localhost \
  port=3456
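
If images do not show up in Lightly Platform, one thing to rule out is that nothing is listening at the configured web server location. The sketch below only tests whether a TCP connection can be opened; it does not verify that files are actually served. The function name `is_web_server_reachable` is hypothetical.

```python
import socket
from urllib.parse import urlsplit


def is_web_server_reachable(web_server_location="http://localhost:3456", timeout=1.0):
    """Return True if a TCP connection to the web server location can be opened."""
    parts = urlsplit(web_server_location)
    # Fall back to the scheme's standard port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it on the machine where you open Lightly Platform in the browser, since that is where the generated URLs are resolved.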

Complete example

This example puts together all the steps needed to run Lightly with local data. It assumes that the user wants to process videos stored in /home/user/datasets/cam1 and has a Lightly datasource in /home/user/lightly/cam1.

The local data structure is the following:

/home/user/datasets/
└── cam1/
    ├── 95b54ed6116a.mp4
    ├── ...
    └── 6116a6e8easd.mp4

/home/user/lightly/
└── cam1/
    └── .lightly/

To process this dataset we can configure the Lightly Worker and schedule a run using the following example:

docker run --shm-size "1024m" --gpus all --rm -it \
  -v /home/user/datasets:/input_mount:ro \
  -v /home/user/lightly:/lightly_mount \
  -e LIGHTLY_TOKEN={MY_LIGHTLY_TOKEN} \
  -e LIGHTLY_WORKER_ID={MY_WORKER_ID} \
  lightly/worker:latest

# Imports
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the Lightly client and a dataset
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.VIDEOS  # Use DatasetType.IMAGES when working with images
)

# Set up your datasources
client.set_local_config(
    relative_path="cam1",  # Relative path in the input mount folder (/home/user/datasets/)
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="cam1",  # Relative path in the lightly mount folder (/home/user/lightly/)
    purpose=DatasourcePurpose.LIGHTLY,
)

# Schedule a run with a basic diversity selection strategy
scheduled_run_id = client.schedule_compute_worker_run(
    selection_config={
        "n_samples": 10,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(scheduled_run_id)

lightly-serve input_mount=/home/user/datasets lightly_mount=/home/user/lightly