Run Your First Selection

Requirements

Scheduling a Run

Now that everything is in place, let’s configure a run:

from lightly.api import ApiWorkflowClient

# Create a client with your token and configure it to use your dataset ID.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# Configure and schedule a run.
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(scheduled_run_id)

The worker_config lets you configure the Lightly Worker in many ways; see Configuration Options for details.

The selection_config makes the Lightly Worker choose the 50 samples from the initial dataset that are as diverse as possible. Diversity is measured on the embeddings, which are created automatically during the run. More options for the selection config are described on the Customize a Selection page.
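To build some intuition for what "as diverse as possible" means on embeddings, here is a minimal sketch of greedy farthest-point sampling, a common way to pick diverse points from a set of vectors. It is an illustration under assumptions only: the Lightly Worker's DIVERSITY strategy may be implemented differently, and the helper names euclidean and select_diverse are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(embeddings: List[Sequence[float]], n_samples: int) -> List[int]:
    """Hypothetical helper: greedy farthest-point sampling.

    Starts from index 0, then repeatedly adds the not-yet-selected embedding
    whose distance to its nearest selected embedding is largest.
    Returns the selected indices in selection order.
    """
    if not embeddings or n_samples <= 0:
        return []
    selected = [0]
    chosen = {0}
    # min_dist[i] = distance from embedding i to its closest selected embedding.
    min_dist = [euclidean(e, embeddings[0]) for e in embeddings]
    target = min(n_samples, len(embeddings))
    while len(selected) < target:
        # Only consider unselected embeddings so exact duplicates are never re-picked.
        next_idx = max(
            (i for i in range(len(embeddings)) if i not in chosen),
            key=lambda i: min_dist[i],
        )
        selected.append(next_idx)
        chosen.add(next_idx)
        # Update each embedding's distance to its nearest selected neighbor.
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[next_idx]))
    return selected
```

For example, `select_diverse([(0, 0), (0.1, 0), (10, 0), (0, 10), (10.1, 0)], 3)` returns `[0, 4, 3]`: after the seed point it picks the two points farthest from everything already selected and skips the near-duplicates at indices 1 and 2. The loop costs on the order of n·k distance computations, which is fine for a sketch but not for very large datasets.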

Monitoring a Run

If a Lightly Worker is running, it picks up the run and starts working on it within a few seconds. You can see the status of the current run and of other scheduled runs in the runs view of the Lightly Platform. Alternatively, you can monitor the run from Python:

from lightly.api import ApiWorkflowClient

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# `scheduled_run_id` is the ID returned by `schedule_compute_worker_run` above.
# You can use this code to track and print the state of the Lightly Worker.
# The loop ends once the run has finished, was canceled, or failed.
print(scheduled_run_id)
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")

You should see output similar to the following:

63722b94c5def52068655308
Lightly Worker run is now in state='INITIALIZING' with message='State set to INITIALIZING'
Lightly Worker run is now in state='CHECKING_CORRUPTNESS' with message='State set to CHECKING_CORRUPTNESS'
Lightly Worker run is now in state='EMBEDDING' with message='State set to EMBEDDING'
Lightly Worker run is now in state='SAMPLING' with message='State set to SAMPLING'
Lightly Worker run is now in state='GENERATING_REPORT' with message='State set to GENERATING_REPORT'
Lightly Worker run is now in state='COMPLETED' with message='State set to COMPLETED'
SUCCESS
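The monitoring loop above relies on the generator stopping once the run reaches a final state, after which the last run_info decides between SUCCESS and FAILURE. The self-contained simulation below sketches that control flow without contacting the API. FakeRunInfo, fake_run_info_generator, monitor, and the exact set of terminal state names are hypothetical stand-ins, not part of the Lightly library.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

# Hypothetical terminal states, based on the comment that the loop ends once
# the run has finished, was canceled, or failed.
TERMINAL_STATES = {"COMPLETED", "CANCELED", "FAILED"}


@dataclass
class FakeRunInfo:
    """Hypothetical stand-in for the run info objects yielded by the real client."""

    state: str
    message: str

    def ended_successfully(self) -> bool:
        return self.state == "COMPLETED"


def fake_run_info_generator(states: List[str]) -> Iterator[FakeRunInfo]:
    """Yield one FakeRunInfo per state, stopping right after a terminal state."""
    for state in states:
        yield FakeRunInfo(state=state, message=f"State set to {state}")
        if state in TERMINAL_STATES:
            return


def monitor(run_infos: Iterator[FakeRunInfo]) -> str:
    """Print every state change and report the outcome of the last one."""
    run_info: Optional[FakeRunInfo] = None
    for run_info in run_infos:
        print(
            f"Lightly Worker run is now in state='{run_info.state}' "
            f"with message='{run_info.message}'"
        )
    # Guard against a generator that yielded nothing.
    if run_info is not None and run_info.ended_successfully():
        return "SUCCESS"
    return "FAILURE"
```

For example, `monitor(fake_run_info_generator(["INITIALIZING", "EMBEDDING", "SAMPLING", "COMPLETED"]))` prints four state lines and returns "SUCCESS". Initializing run_info to None is a small design choice: without it, a generator that yields nothing would make the final check raise a NameError.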

Putting It All Together

We recommend putting all the pieces together in a single file. When processing new data, run the Python script to schedule a run, then spin up the Lightly Worker to process it.

'''
This script merges the code blocks from https://docs.lightly.ai/docs/set-up-your-first-dataset
and https://docs.lightly.ai/docs/run-your-first-selection in a way that allows you to do your first
run with Lightly from scratch.

By running the script you will create a dataset, link it to the datasources, and schedule and
monitor a selection based on the diversity of images in the embedding space.
'''

# Imports
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.IMAGES  # can be DatasetType.VIDEOS when working with videos
)


# Configure the input datasource. You need to adapt this step to your datasource
# and credentials. See https://docs.lightly.ai/docs/set-up-your-first-dataset for more information.
# This tutorial uses S3.
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.LIGHTLY
)

# Configure and schedule a run following the selection strategy in 
# https://docs.lightly.ai/docs/run-your-first-selection
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(scheduled_run_id)

# Monitor your run from Python
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")
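The script above hardcodes the S3 credentials for readability. Below is a minimal sketch for reading them from environment variables instead, so they stay out of source control. The variable names S3_ACCESS_KEY and S3_SECRET_ACCESS_KEY and the helper s3_credentials_from_env are hypothetical choices, not something Lightly requires.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names; pick whatever fits your setup.
# Keys are the keyword argument names used by the set_s3_config calls above.
ENV_VARS = {
    "access_key": "S3_ACCESS_KEY",
    "secret_access_key": "S3_SECRET_ACCESS_KEY",
}


def s3_credentials_from_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return S3 credential keyword arguments read from the environment.

    Raises a RuntimeError naming every missing or empty variable, so a
    misconfigured shell fails before any API call is made.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in ENV_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {kwarg: env[name] for kwarg, name in ENV_VARS.items()}
```

Because the returned dictionary uses the same keyword names as the set_s3_config calls in the script, it can be unpacked into them, for example `client.set_s3_config(resource_path="s3://bucket/input/", region="eu-central-1", purpose=DatasourcePurpose.INPUT, **s3_credentials_from_env())`.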

You can also download the full example here. Running the script creates a dataset, links it to the datasources, and schedules and monitors a selection based on the diversity of images in the embedding space.
If your scheduled run is not picked up by a Lightly Worker, head to the FAQ for debugging tips.
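While debugging a run that is not picked up, it can help to stop waiting after a fixed time instead of blocking indefinitely. The following is a minimal, generic polling sketch with a timeout. The helper wait_until and its parameters are hypothetical, and get_state stands for whatever function you use to fetch the current run state; the clock and sleep parameters exist only so the helper can be tested without real waiting.

```python
import time
from typing import Callable


def wait_until(
    get_state: Callable[[], str],
    is_done: Callable[[str], bool],
    timeout_s: float,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until is_done(state) is true or timeout_s elapses.

    Returns the final state, or raises TimeoutError with the last state seen.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if is_done(state):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"Run still in state {state!r} after {timeout_s} s")
        # Wait before asking again to avoid hammering the API.
        sleep(poll_interval_s)
```

Using time.monotonic rather than wall-clock time keeps the deadline correct even if the system clock is adjusted while you wait.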