Multiple Workers
The LightlyOne Worker is designed to schedule multiple runs simultaneously and process them either sequentially or in parallel, depending on your requirements.
Starting Up a Specific Worker
By default, when you start the LightlyOne Worker without specifying the LIGHTLY_WORKER_ID environment variable, we register and assume a default worker. This allows you to easily start multiple workers and to schedule and process runs concurrently. All default workers will process all of your scheduled runs because of how label matching works (see Label Matching below).
However, if you want to start a specific worker, pass LIGHTLY_WORKER_ID as an environment variable when starting it. This lets you assign scheduled runs to specific workers, for example workers running in different data centers with different CPU, memory, or GPU constraints, or workers that represent different priority queues.
The LightlyOne Platform's My Workers page provides an overview of all your registered workers.
docker run --shm-size="1024m" --gpus all --rm -it \
-v "MY_PATH_TO_INPUT_DIRECTORY":/input_mount:ro \
-v "MY_PATH_TO_LIGHTLY_DIRECTORY":/lightly_mount \
-e LIGHTLY_TOKEN="MY_LIGHTLY_TOKEN" \
-e LIGHTLY_WORKER_ID="MY_WORKER_ID" \
lightly/worker:latest
docker run --shm-size="1024m" --gpus all --rm -it \
-e LIGHTLY_TOKEN="MY_LIGHTLY_TOKEN" \
-e LIGHTLY_WORKER_ID="MY_WORKER_ID" \
lightly/worker:latest
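To start several workers, the same command can be repeated with a different LIGHTLY_WORKER_ID for each one. The following is a minimal Python sketch that only assembles and prints such commands; it does not run them. The helper build_worker_command and the placeholder worker IDs are hypothetical, and the flags are copied from the second command above.

```python
import shlex


def build_worker_command(token, worker_id=None, image="lightly/worker:latest"):
    """Return the docker argv for one worker (hypothetical helper)."""
    cmd = [
        "docker", "run", "--shm-size=1024m", "--gpus", "all", "--rm", "-it",
        "-e", f"LIGHTLY_TOKEN={token}",
    ]
    # Without LIGHTLY_WORKER_ID the worker starts as a default worker.
    if worker_id is not None:
        cmd += ["-e", f"LIGHTLY_WORKER_ID={worker_id}"]
    cmd.append(image)
    return cmd


# Placeholder IDs; real IDs come from registering each worker.
for worker_id in ["MY_WORKER_ID_1", "MY_WORKER_ID_2"]:
    print(shlex.join(build_worker_command("MY_LIGHTLY_TOKEN", worker_id)))
```

Each printed line can be pasted into a separate terminal or machine to start that worker.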
Assign Scheduled Runs to Specific Workers
When several LightlyOne Workers process multiple scheduled runs, it can be useful to assign scheduled runs to specific workers. LightlyOne supports this through labels:
- Each worker can have a set of labels, e.g. ["gpu-A100", "gpu", "machine1", "team_worker", "aws_1"]. Multiple workers can have some labels in common (e.g. multiple gpu workers), but also have labels that only they have.
- When scheduling a run, you can specify a set of labels the worker picking up the run must have. E.g. specify ["gpu-A100"] to let the run be picked up by any worker with the label gpu-A100.
Specifying Worker Labels
When registering a worker as outlined below, you can optionally specify labels by passing the labels argument when calling client.register_compute_worker(). The labels of a LightlyOne Worker must be assigned when registering it and cannot be changed later. LightlyOne Workers do not have any default labels.
# execute the following code once to get a worker_id
from lightly.api import ApiWorkflowClient
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN") # replace this with your token
worker_id = client.register_compute_worker(
    name="worker-with-labels",
    labels=["gpu-A100", "gpu", "bobs_worker", "team_worker"],  # optional
)
print(f"worker_id: {worker_id}")
Next, start the LightlyOne Worker on your machine with the newly created worker_id. To get an overview of all of your workers, see the LightlyOne Platform's My Workers page.
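When several machines each need their own worker and label set, the registration call can be wrapped in a loop. The sketch below assumes only the register_compute_worker(name=..., labels=...) call shown above; the helper register_workers and the worker names and labels are hypothetical. The client is passed in as a parameter so the helper itself has no dependency on a token.

```python
def register_workers(client, label_sets):
    """Register one worker per entry and return {name: worker_id}."""
    worker_ids = {}
    for name, labels in label_sets.items():
        # Labels are fixed at registration time and cannot be changed later.
        worker_ids[name] = client.register_compute_worker(name=name, labels=labels)
    return worker_ids


# Hypothetical worker names and labels for two machines.
label_sets = {
    "worker-a100": ["gpu-A100", "gpu", "team_worker"],
    "worker-cpu": ["cpu", "team_worker"],
}

# Usage (requires a valid token and network access):
# from lightly.api import ApiWorkflowClient
# client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")
# for name, worker_id in register_workers(client, label_sets).items():
#     print(f"{name}: {worker_id}")
```

Each returned worker_id can then be passed as LIGHTLY_WORKER_ID when starting the corresponding worker.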
Specifying Labels when Scheduling a Run
Follow the usual steps for scheduling a run with one change: additionally specify the runs_on argument when calling client.schedule_compute_worker_run():
from lightly.api import ApiWorkflowClient
# Create a client with your token and configure it to use your dataset ID.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
# Configure and schedule a run.
scheduled_run_id = client.schedule_compute_worker_run(
    runs_on=["gpu-A100"],  # optional: only workers with this label pick up the run
    worker_config={...},
    selection_config={...},
)
print(f"scheduled_run_id: {scheduled_run_id}")
This scheduled run will be picked up only by workers that have the label gpu-A100 among their labels.
Label Matching
The Worker only picks up runs whose runs_on labels are a subset of its own labels.
For legacy reasons, workers without labels will pick up all runs, including those that have runs_on specified.
For the same reason, runs without runs_on specified will be picked up by all workers, including those that have labels.
Thus we recommend that you fully switch to using labels for both your workers and your scheduled runs.
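The matching rules above can be sketched as a small predicate. This is an illustration of the described behavior, not LightlyOne's actual implementation; the function name worker_picks_up_run is hypothetical.

```python
def worker_picks_up_run(worker_labels, runs_on):
    """Return True if a worker with worker_labels would pick up a run
    scheduled with runs_on, following the rules described above."""
    worker_labels = set(worker_labels or [])
    runs_on = set(runs_on or [])
    # Legacy rule: a worker without labels picks up every run.
    if not worker_labels:
        return True
    # Legacy rule: a run without runs_on is picked up by every worker.
    if not runs_on:
        return True
    # General rule: runs_on must be a subset of the worker's labels.
    return runs_on <= worker_labels


a100_worker = ["gpu-A100", "gpu", "team_worker"]
print(worker_picks_up_run(a100_worker, ["gpu-A100"]))         # subset: picked up
print(worker_picks_up_run(a100_worker, ["gpu-A100", "aws_1"]))  # aws_1 missing: skipped
print(worker_picks_up_run([], ["gpu-A100"]))                  # unlabeled worker: picked up
print(worker_picks_up_run(a100_worker, None))                 # no runs_on: picked up
```

The last two calls show why mixing labeled and unlabeled workers or runs can route a run to a machine you did not intend.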