Multiple Workers

The Lightly Worker is designed to schedule multiple runs simultaneously and process them either sequentially or in parallel, depending on your requirements.

Starting Up a Specific Worker

By default, when you start the Lightly Worker without specifying the LIGHTLY_WORKER_ID environment variable, a default worker is registered and used. This allows you to easily start multiple workers and to schedule and process runs concurrently. Because of label matching, every default worker can pick up any of your scheduled runs.

However, if you want to control which worker is started, pass LIGHTLY_WORKER_ID as an environment variable when starting the worker. This lets you assign scheduled runs to specific workers, for example workers running in different data centers with different CPU, memory, or GPU constraints, or workers that represent different priority queues.
The Lightly Platform's My Workers page provides an overview of all your registered workers.

# Variant with local input and Lightly directories mounted
docker run --shm-size="1024m" --gpus all --rm -it \
	-v "MY_PATH_TO_INPUT_DIRECTORY":/input_mount:ro \
	-v "MY_PATH_TO_LIGHTLY_DIRECTORY":/lightly_mount \
	-e LIGHTLY_TOKEN="MY_LIGHTLY_TOKEN" \
	-e LIGHTLY_WORKER_ID="MY_WORKER_ID" \
	lightly/worker:latest

# Variant without local directory mounts
docker run --shm-size="1024m" --gpus all --rm -it \
	-e LIGHTLY_TOKEN="MY_LIGHTLY_TOKEN" \
	-e LIGHTLY_WORKER_ID="MY_WORKER_ID" \
	lightly/worker:latest

Assign Scheduled Runs to Specific Workers

When multiple Lightly Workers process multiple scheduled runs, it can be very useful to assign scheduled runs to specific workers. Lightly supports this through labels:

  • Each worker can have a set of labels, e.g. ["gpu-A100", "gpu", "machine1", "team_worker", "aws_1"]. Multiple workers can have some labels in common (e.g. multiple gpu workers), but also have labels that only they have.
  • When scheduling a run, you can specify a set of labels the worker picking up the run must have. E.g. specify ["gpu-A100"] to let the run be picked up by any worker with the label gpu-A100.
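The two points above can be illustrated with a minimal sketch. The worker names, the label sets, and the helper eligible_workers are hypothetical and are not part of the Lightly API. The sketch only models the rule that a worker must carry every label requested by the run, and it ignores the legacy behavior for unlabeled workers and runs described under Label Matching.

```python
# Hypothetical fleet: worker names mapped to their labels (illustrative only).
WORKERS = {
    "worker-a": {"gpu-A100", "gpu", "machine1"},
    "worker-b": {"gpu", "aws_1"},
    "worker-c": {"cpu", "team_worker"},
}


def eligible_workers(runs_on, workers):
    """Return the sorted names of workers that carry every label in runs_on."""
    required = set(runs_on)
    return sorted(name for name, labels in workers.items() if required <= labels)


print(eligible_workers(["gpu-A100"], WORKERS))  # ['worker-a']
print(eligible_workers(["gpu"], WORKERS))       # ['worker-a', 'worker-b']
```

A shared label such as gpu spreads runs across several workers, while a label held by only one worker pins the run to that machine.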

Specifying Worker Labels

When registering a worker as outlined below, you can optionally specify labels by passing the labels argument to client.register_compute_worker(). The labels of a Lightly Worker must be assigned when registering it and cannot be changed later. Lightly Workers do not have any default labels.

# execute the following code once to get a worker_id
from lightly.api import ApiWorkflowClient

client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN") # replace this with your token
worker_id = client.register_compute_worker(
  name="worker-with-labels",
  labels=["gpu-A100", "gpu", "bobs_worker", "team_worker"] # optional
)
print(f"worker_id: {worker_id}")

Next, start the Lightly Worker on your machine with the newly created worker_id. To get an overview of all of your workers, see the Lightly Platform's My Workers page.

Specifying Labels when Scheduling a Run

Follow the usual steps for scheduling a run, with one change: also specify the runs_on argument when calling client.schedule_compute_worker_run():

from lightly.api import ApiWorkflowClient

# Create a client with your token and configure it to use your dataset ID.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# Configure and schedule a run.
scheduled_run_id = client.schedule_compute_worker_run(
	runs_on=["gpu-A100"],  # optional: only workers with this label pick up the run
	worker_config={...},
	selection_config={...},
)
print(f"scheduled_run_id: {scheduled_run_id}")

This scheduled run will be picked up only by workers whose labels include gpu-A100.

Label Matching

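A worker only picks up runs whose runs_on labels are a subset of its own labels. For example, a worker with the labels ["gpu-A100", "gpu"] picks up a run with runs_on=["gpu"], but not a run with runs_on=["gpu", "aws_1"].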

For legacy reasons, workers without labels pick up all runs, including those that specify runs_on.
For the same reason, runs that do not specify runs_on are picked up by all workers, including those that have labels.
We therefore recommend that you assign labels to all of your workers and specify runs_on for all of your scheduled runs.
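The matching rules in this section, including the two legacy cases, can be summarized as a small predicate. This is a sketch of the behavior as described here, not Lightly's actual implementation. The function name picks_up and its parameters are hypothetical.

```python
def picks_up(worker_labels, runs_on):
    """Model whether a worker with worker_labels would pick up a run with runs_on.

    Both arguments are lists of label strings; an empty list or None means
    that no labels were specified.
    """
    if not worker_labels:
        # Legacy case: a worker without labels picks up every run.
        return True
    if not runs_on:
        # Legacy case: a run without runs_on is picked up by every worker.
        return True
    # Regular case: the run's labels must be a subset of the worker's labels.
    return set(runs_on) <= set(worker_labels)


print(picks_up(["gpu-A100", "gpu"], ["gpu"]))           # True
print(picks_up(["gpu-A100", "gpu"], ["gpu", "aws_1"]))  # False
print(picks_up([], ["gpu-A100"]))                       # True (unlabeled worker)
print(picks_up(["cpu"], []))                            # True (run without runs_on)
```

The last two calls show why mixing labeled and unlabeled workers or runs can send a run to a machine it was not intended for.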