Run Your First Selection
Requirements
- You have started the Lightly Worker (see Install Lightly).
- You have created a dataset and configured a datasource.
Scheduling a Run
Now that everything is in place, let’s configure a run:
from lightly.api import ApiWorkflowClient
# Create a client with your token and configure it to use your dataset ID.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
# Configure and schedule a run.
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(scheduled_run_id)
The worker_config allows you to configure the Lightly Worker in many ways; see Configuration Options.
The selection_config makes the Lightly Worker choose 50 samples from the initial dataset that are as diverse as possible. This is done using the embeddings, which are created automatically during the run. For more options and details on the selection config, see our Customize a Selection page.
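To build intuition for what "as diverse as possible" can mean on embeddings, here is a minimal sketch of greedy farthest-point sampling in plain Python. It is an illustration only: this page does not describe how the Lightly Worker implements the DIVERSITY strategy, and the function name `select_diverse` and the toy data are assumptions made for this example.

```python
import math
from typing import List, Sequence


def select_diverse(embeddings: Sequence[Sequence[float]], n_samples: int) -> List[int]:
    """Greedy farthest-point sampling (illustrative, not Lightly's implementation).

    Returns the indices of n_samples points chosen so that each new point is
    as far as possible from the points already selected.
    """
    if not 0 < n_samples <= len(embeddings):
        raise ValueError("n_samples must be between 1 and the number of embeddings")

    selected = [0]  # Arbitrary starting point: the first embedding.
    remaining = set(range(1, len(embeddings)))
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]

    while len(selected) < n_samples:
        # Pick the unselected point farthest from everything selected so far.
        next_idx = max(sorted(remaining), key=min_dist.__getitem__)
        selected.append(next_idx)
        remaining.remove(next_idx)
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(embeddings[i], embeddings[next_idx]))
    return selected


# Toy 2-D "embeddings": two tight clusters plus one outlier.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
picked = select_diverse(toy_embeddings, n_samples=3)
print(picked)
```

On the toy data, the sketch picks one point from each cluster plus the outlier rather than two near-duplicates from the same cluster, which is the behavior a diversity-based selection aims for.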
Monitoring a Run
The Lightly Worker will pick up the run and start working on it within a few seconds. The status of the current run and other scheduled runs can be seen in the runs view of the Lightly Platform. Alternatively, you can also monitor it from Python:
from lightly.api import ApiWorkflowClient
# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")
# You can use this code to track and print the state of the Lightly Worker.
# The loop will end once the run has finished, was canceled, or failed.
print(scheduled_run_id)
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )
if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")
You should see an output similar to this one:
63722b94c5def52068655308
Lightly Worker run is now in state='INITIALIZING' with message='State set to INITIALIZING'
Lightly Worker run is now in state='CHECKING_CORRUPTNESS' with message='State set to CHECKING_CORRUPTNESS'
Lightly Worker run is now in state='EMBEDDING' with message='State set to EMBEDDING'
Lightly Worker run is now in state='SAMPLING' with message='State set to SAMPLING'
Lightly Worker run is now in state='GENERATING_REPORT' with message='State set to GENERATING_REPORT'
Lightly Worker run is now in state='COMPLETED' with message='State set to COMPLETED'
SUCCESS
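If the monitoring loop runs inside an automated pipeline, it can be useful to fail loudly instead of printing FAILURE. The following is a minimal sketch under that assumption. The helper `follow_run` and the class `FakeRunInfo` are hypothetical names introduced here; the sketch relies only on the `state`, `message`, and `ended_successfully()` members used in the snippet above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


def follow_run(run_infos: Iterable, log: Callable[[str], None] = print) -> List[str]:
    """Log every reported state and return the list of states seen.

    Raises RuntimeError if nothing was reported or if the last reported
    state did not end successfully.
    """
    states: List[str] = []
    last = None
    for last in run_infos:
        states.append(str(last.state))
        log(f"state='{last.state}' message='{last.message}'")
    if last is None:
        raise RuntimeError("No run information was received")
    if not last.ended_successfully():
        raise RuntimeError(f"Run ended in state '{last.state}': {last.message}")
    return states


@dataclass
class FakeRunInfo:
    """Hypothetical stand-in for the objects the generator yields (local testing only)."""

    state: str
    message: str

    def ended_successfully(self) -> bool:
        return self.state == "COMPLETED"


# Try the helper locally without contacting the Lightly Platform.
demo = [
    FakeRunInfo(state, f"State set to {state}")
    for state in ("INITIALIZING", "EMBEDDING", "COMPLETED")
]
seen_states = follow_run(demo)
```

With a real client, you would pass the generator from the snippet above instead of the demo list: follow_run(client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id)).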
Putting It All Together
We recommend putting all the pieces together in a single file. When processing new data, you would run the Python script to schedule a run and then start the Lightly Worker to process it.
'''
This script merges the code blocks from https://docs.lightly.ai/docs/set-up-your-first-dataset
and https://docs.lightly.ai/docs/run-your-first-selection in a way that allows you to do your first
run with Lightly from scratch.
By running the script, you will create a dataset, link it to the datasources, and schedule and
monitor a selection based on the diversity of images in the embedding space.
'''
# Imports
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose
# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")
# Create a new dataset on the Lightly Platform.
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.IMAGES,  # can be DatasetType.VIDEOS when working with videos
)
# Configure the input datasource. You need to adapt this step according to your datasource
# and credentials. See https://docs2.lightly.ai/docs/set-up-your-first-dataset for more information.
# In this tutorial S3 is used.
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.INPUT,
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.LIGHTLY,
)
# Configure and schedule a run following the selection strategy in
# https://docs.lightly.ai/docs/run-your-first-selection
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(scheduled_run_id)
# Monitor your run from Python
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )
if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")
You can also download the full example here. By running the script, you will create a dataset, link it to the datasources, and schedule and monitor a selection based on the diversity of images in the embedding space.
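Because the combined script contains a token and storage credentials, you may prefer not to hard-code them. Below is a minimal sketch that reads the token from an environment variable instead. The helper names `require_env` and `make_client`, and the variable name `LIGHTLY_TOKEN`, are assumptions chosen for this example, not something this page prescribes.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Please set the {name} environment variable")
    return value


def make_client():
    """Create the Lightly client using a token taken from the environment."""
    # Imported lazily so this module can be loaded without the lightly package.
    from lightly.api import ApiWorkflowClient

    # LIGHTLY_TOKEN is a variable name assumed for this sketch.
    return ApiWorkflowClient(token=require_env("LIGHTLY_TOKEN"))
```

The same pattern can be applied to the S3 access key and secret access key passed to set_s3_config, so that the script can be committed to version control without secrets in it.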
If your scheduled run is not picked up by a Lightly Worker, head to the FAQ to debug.