Local Storage
This page explains how to provide data to Lightly from your local disk.
Version requirement
This feature requires Lightly Worker 2.9 or higher and Lightly Python API Client 1.4.16 or higher.
Visit the Installation page to see how you can get the latest versions.
Datasources
Datasource is Lightly's name for the input or output location for data. To run Lightly, you need to provide two datasources.
Input datasource is the location of your input images or videos. Can be read only.
Lightly datasource is the location of additional Lightly-specific data such as custom metadata or predictions. This datasource is also used for output artifacts such as the PDF report, and for data that supports Lightly Platform functionality, such as image thumbnails. Must be writeable.
For more information about datasources, please refer to the Cloud Storage page.
Overview
Local data are provided to Lightly in three steps:
- When starting Lightly Worker, mount your data folders to the docker container
- When scheduling a worker run, specify the location of your data relative to the mounted folders
- Optional: To display your data in Lightly Platform, run a local web server
Mount Your Data
Lightly recognizes two kinds of local paths. The mount path is a local directory mounted into the Docker container. The datasource path is a path within this directory that contains the data for your current run. The two can be the same; keeping them different is useful, for example, if you have multiple datasets in a common folder that you would like to process without restarting the Docker container.
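The relationship between the two paths can be sketched in a few lines of Python. The helper name `datasource_relative_path` is hypothetical and not part of the Lightly API; it only derives the relative path you would pass when scheduling a run, and returns an empty string when the mount path and datasource path coincide.

```python
from pathlib import PurePosixPath


def datasource_relative_path(mount_path: str, datasource_path: str) -> str:
    """Return datasource_path relative to mount_path ("" if they coincide).

    Hypothetical helper for illustration; not part of the Lightly API.
    """
    mount = PurePosixPath(mount_path)
    datasource = PurePosixPath(datasource_path)
    if not mount.is_absolute():
        raise ValueError(f"mount path must be absolute: {mount_path}")
    # relative_to raises ValueError if datasource is not inside mount.
    relative = datasource.relative_to(mount)
    # An empty relative path is rendered as ".", so map it to "".
    return "" if relative == PurePosixPath(".") else str(relative)


# Mount path and datasource path coincide.
print(repr(datasource_relative_path(
    "/home/user/project_xyz/input", "/home/user/project_xyz/input")))
# One dataset inside a shared mount folder.
print(repr(datasource_relative_path(
    "/home/user/project_xyz/input", "/home/user/project_xyz/input/cam1")))
```

Mounting the common parent folder once and varying only the relative path is what lets several datasets be processed without restarting the container.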
The following shows an example folder structure in your local environment.
/home/user/project_xyz/input
├── 0000_0085e9e41513078a_2018-08-19--13-26-08_11_864.png
├── ...
└── 0999_e8e95b54ed6116a6_2018-10-22--11-26-21_3_339.png
/home/user/project_xyz/lightly
└── .lightly/predictions/
    ├── tasks.json
    └── yolov7_prediction_task/
        ├── schema.json
        ├── 0000_0085e9e41513078a_2018-08-19--13-26-08_11_864.json
        ├── ...
        └── 0999_e8e95b54ed6116a6_2018-10-22--11-26-21_3_339.json
When starting the Docker container, the mount path for the input datasource must be mounted to /input_mount and the mount path for the Lightly datasource must be mounted to /lightly_mount.
Docker requires mount paths to be absolute!
Volume mounting folders to a Docker container can change file permissions of all mounted files. We therefore recommend the following best practice: Choose the mount path as specific as possible while maintaining enough flexibility to access all data of interest. At the end of this chapter you can find a complete example. See mount permissions for more information regarding permissions.
docker run ... \
-v "/home/user/project_xyz/input":/input_mount:ro \
-v "/home/user/project_xyz/lightly":/lightly_mount \
...
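If you start the worker from a script, the two volume flags can be assembled programmatically. The following is a minimal sketch, assuming you build the `docker run` argument list yourself; `volume_flags` is a hypothetical helper, not a Lightly or Docker API. It rejects relative paths up front, since Docker requires mount paths to be absolute.

```python
from pathlib import PurePosixPath


def volume_flags(input_mount: str, lightly_mount: str) -> list[str]:
    """Build the two -v flags for `docker run`.

    Hypothetical helper for illustration; not part of Lightly or Docker.
    """
    for path in (input_mount, lightly_mount):
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"Docker mount paths must be absolute: {path!r}")
    return [
        "-v", f"{input_mount}:/input_mount:ro",   # input can be read only
        "-v", f"{lightly_mount}:/lightly_mount",  # must stay writeable
    ]


flags = volume_flags(
    "/home/user/project_xyz/input",
    "/home/user/project_xyz/lightly",
)
print(" ".join(flags))
```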
Set the Datasource
Provide your datasource paths before scheduling your run. The specified paths must be relative to the corresponding mount paths. If your mount path and datasource path coincide, pass an empty string.
Optionally, you can provide a web server location; otherwise, it defaults to http://localhost:3456. This is the location where Lightly Platform will expect your images to be served.
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.IMAGES,  # can be DatasetType.VIDEOS when working with videos
)
client.set_local_config(
    relative_path="",  # Relative path in the input mount folder (/home/user/project_xyz/input/)
    web_server_location="http://localhost:3456",  # Optional
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="",  # Relative path in the lightly mount folder (/home/user/project_xyz/lightly/)
    web_server_location="http://localhost:3456",  # Optional
    purpose=DatasourcePurpose.LIGHTLY,
)
When providing data with the Lightly datasource, note that you need to specify the folder that contains the .lightly subfolder.
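A quick local check can catch a wrong relative path before you schedule a run. This is a minimal sketch that only inspects your local disk; `contains_lightly_folder` is a hypothetical helper and not part of the Lightly API.

```python
from pathlib import Path


def contains_lightly_folder(mount_path: str, relative_path: str = "") -> bool:
    """Check that <mount_path>/<relative_path> holds a .lightly subfolder.

    Hypothetical local check for illustration; not part of the Lightly API.
    """
    datasource_dir = Path(mount_path) / relative_path
    return (datasource_dir / ".lightly").is_dir()


# Returns False if the folder or its .lightly subfolder does not exist.
print(contains_lightly_folder("/home/user/project_xyz/lightly", ""))
```

If the check fails, the relative path most likely points at the .lightly folder itself or at one of its subfolders rather than at its parent.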
Schedule Your Run
Scheduling the run works the same for local and cloud storage. If you encounter any permission errors in accessing the local storage, head to our FAQ.
from lightly.api import ApiWorkflowClient
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
Optional After Run: View Local Data in Lightly Platform
You can inspect the results of your run in Lightly Platform. A Lightly dataset configured with a local datasource will always try to access images at the web server location specified when configuring the datasource. The default is localhost:3456. Furthermore, all read-URLs generated by any of the export features will point to this location.
Thus, you need to point localhost:port to the images or resources on your disk. This can be done easily using the lightly-serve CLI command provided by the lightly Python SDK.
# host and port are optional
lightly-serve \
input_mount="/home/user/project_xyz/input" \
lightly_mount="/home/user/project_xyz/lightly" \
host="localhost" \
port=3456
The host and port are optional; you only need to change them if you did not use the default web server location when configuring the datasource.
To test if it is working, browse your dataset in Lightly Platform, or directly access localhost:3456 in your browser.
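The same check can be scripted. The sketch below only tests whether something accepts TCP connections at the web server location; it does not verify that the correct files are served. `web_server_is_listening` is a hypothetical helper built on the Python standard library, not part of the lightly SDK.

```python
import socket
from urllib.parse import urlsplit


def web_server_is_listening(
    web_server_location: str = "http://localhost:3456",
    timeout: float = 2.0,
) -> bool:
    """Return True if something accepts TCP connections at the location.

    Hypothetical helper for illustration; it does not inspect served content.
    """
    parts = urlsplit(web_server_location)
    host = parts.hostname or "localhost"
    # Fall back to the scheme's standard port if none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


print(web_server_is_listening("http://localhost:3456"))
```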
Data Privacy
The lightly-serve command makes the images and resources on your local disk available only to your localhost, i.e. within your local system. Thus, they are not accessible to any application outside your local computer.
View Local Data on a Remote Machine in Lightly Platform
If your browser runs on your local machine (for example, your laptop), but you used the local storage on a remote machine, the setup is slightly different:
First, you need to run lightly-serve on the remote machine where the actual images or resources are located. Second, you need to forward the remote port to the local port. For example, if you use ssh, this can be done with the -L option: ssh -N -L 3456:localhost:3456 [username]@[remote-machine-address]. If you use VS Code, you can forward the port in the Ports tab.
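If you prefer to start the tunnel from a script, the ssh command can be assembled as an argument list. This is a minimal sketch; `ssh_forward_command` is a hypothetical helper, and `user@remote.example.com` is a placeholder for your own [username]@[remote-machine-address].

```python
import shlex


def ssh_forward_command(remote: str, port: int = 3456) -> list[str]:
    """Build the ssh port-forwarding command as an argument list.

    Hypothetical helper for illustration. `remote` is your own
    "[username]@[remote-machine-address]" value.
    """
    # -N: do not run a remote command; -L: forward local port to remote port.
    return ["ssh", "-N", "-L", f"{port}:localhost:{port}", remote]


# Placeholder remote; replace it with your own user and host.
command = ssh_forward_command("user@remote.example.com", port=3456)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` to open the tunnel; use the same port that lightly-serve uses on the remote machine.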
Complete Example
This example puts together all the steps needed to run Lightly with local data. It assumes that the user wants to process videos stored in /home/user/project_xyz/input/cam1 and has a Lightly datasource in /home/user/project_xyz/lightly/cam1.
The local data structure is the following:
/home/user/project_xyz/input/
└── cam1/
    ├── 95b54ed6116a.mp4
    ├── ...
    └── 6116a6e8easd.mp4
/home/user/project_xyz/lightly/
└── cam1/
    └── .lightly/
To process this dataset we can configure the Lightly Worker and schedule a run using the following example:
docker run --shm-size "1024m" --gpus all --rm -it \
-v "/home/user/project_xyz/input":/input_mount:ro \
-v "/home/user/project_xyz/lightly":/lightly_mount \
-e LIGHTLY_TOKEN="MY_LIGHTLY_TOKEN" \
-e LIGHTLY_WORKER_ID="MY_WORKER_ID" \
lightly/worker:latest
# Imports
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose
# Create the Lightly client and a dataset
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.VIDEOS,  # Use DatasetType.IMAGES when working with images
)
# Set up your datasources
client.set_local_config(
    relative_path="cam1",  # Relative path in the input mount folder (/home/user/project_xyz/input/)
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="cam1",  # Relative path in the lightly mount folder (/home/user/project_xyz/lightly/)
    purpose=DatasourcePurpose.LIGHTLY,
)
# Schedule a run with a basic diversity selection strategy
scheduled_run_id = client.schedule_compute_worker_run(
    selection_config={
        "n_samples": 10,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(scheduled_run_id)
lightly-serve input_mount="/home/user/project_xyz/input" lightly_mount="/home/user/project_xyz/lightly"