Run Your First Selection

Scheduling a Run

Now that everything is in place, let’s configure a run:

from lightly.api import ApiWorkflowClient

# Create a client with your token and configure it to use your dataset ID.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# Configure and schedule a run.
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(f"scheduled_run_id: {scheduled_run_id}")

The selection_config makes the Lightly Worker choose 50 samples from the initial dataset that are as diverse as possible. Diversity is measured on the embeddings, which are created automatically during the run. Both the worker config and the selection config offer many more options, but we'll keep it simple for the first run.
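If you plan to schedule several runs that differ only in the number of samples, it can help to build the selection config in one place. The following is a minimal sketch, not part of the Lightly SDK: the helper name build_diversity_selection_config is hypothetical, and it only reproduces the dictionary shown above.

```python
def build_diversity_selection_config(n_samples: int) -> dict:
    """Return a selection config that picks `n_samples` diverse samples.

    Hypothetical helper: it only rebuilds the dictionary used in this guide.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be a positive integer")
    return {
        "n_samples": n_samples,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    }


selection_config = build_diversity_selection_config(50)
print(selection_config["n_samples"])
```

The resulting dictionary can be passed as the selection_config argument of client.schedule_compute_worker_run.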

🚧

Retention Policy

Data processed by the Lightly Worker must be available during the whole run duration. However, if you use a cloud bucket datasource with a retention or lifecycle policy, data might get deleted during a run. If your bucket has a retention policy, please have a look at our retention policy docs on how to configure the Lightly Worker for this scenario.

Monitoring a Run

The Lightly Worker will pick up the run and start working on it within a few seconds. The status of the current run can be monitored from Python:

from lightly.api import ApiWorkflowClient


# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# You can use this code to track and print the state of the Lightly Worker.
# The loop will end once the run has finished, was canceled, or failed.
print(scheduled_run_id)
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")

You should see an output similar to this one:

63722b94c5def52068655308
Lightly Worker run is now in state='INITIALIZING' with message='State set to INITIALIZING'
Lightly Worker run is now in state='CHECKING_CORRUPTNESS' with message='State set to CHECKING_CORRUPTNESS'
Lightly Worker run is now in state='EMBEDDING' with message='State set to EMBEDDING'
Lightly Worker run is now in state='SAMPLING' with message='State set to SAMPLING'
Lightly Worker run is now in state='GENERATING_REPORT' with message='State set to GENERATING_REPORT'
Lightly Worker run is now in state='COMPLETED' with message='State set to COMPLETED'
SUCCESS
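The monitoring loop above assumes that the generator yields at least one state update; otherwise run_info is never assigned. As a sketch of a slightly more defensive variant, the hypothetical helper follow_run below consumes any iterable of objects that have a state attribute and returns the states seen plus the last update, or None if there was none. The stand-in objects are for demonstration only and are not real API responses.

```python
from types import SimpleNamespace


def follow_run(run_infos):
    """Consume an iterable of run infos; return (states seen, last run info or None)."""
    states = []
    last_run_info = None
    for run_info in run_infos:
        states.append(run_info.state)
        last_run_info = run_info
    return states, last_run_info


# Stand-in objects for demonstration only; they mimic the `state` and
# `message` attributes printed in the snippet above.
fake_run_infos = [
    SimpleNamespace(state=state, message=f"State set to {state}")
    for state in ("INITIALIZING", "EMBEDDING", "SAMPLING", "COMPLETED")
]

states, last_run_info = follow_run(fake_run_infos)
print(" -> ".join(states))
if last_run_info is None:
    print("No state updates received")
```

With a real client, you would pass client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id) instead of the stand-in list.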

Putting It All Together

We recommend putting all the pieces together in a single file. When processing new data, you would execute the Python script to create a job and then spin up the Lightly Worker to process it.

'''
All-in-one script for cloud storage (S3)

This script merges the code blocks from https://docs.lightly.ai/docs/set-up-your-first-dataset
and https://docs.lightly.ai/docs/run-your-first-selection in a way that allows you to do your first
run with Lightly from scratch. 

By running the script you will create a dataset, link it to the datasources, and schedule and monitor
a selection based on the diversity of images in the embedding space.
'''

# Imports
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.IMAGES  # can be DatasetType.VIDEOS when working with videos
)


# Configure the input datasource. You need to adapt this step according to your datasource
# and credentials. See https://docs.lightly.ai/docs/set-up-your-first-dataset for more information.
# In this tutorial S3 is used.
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.LIGHTLY
)

# Configure and schedule a run following the selection strategy in 
# https://docs.lightly.ai/docs/run-your-first-selection
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)

# Monitor your run from Python
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")

'''
All-in-one script for local storage

This script merges the code blocks from https://docs.lightly.ai/docs/set-up-your-first-dataset
and https://docs.lightly.ai/docs/run-your-first-selection in a way that allows you to do your first
run with Lightly from scratch. 

By running the script you will create a dataset, link it to the datasources, and schedule and monitor
a selection based on the diversity of images in the embedding space.
'''

# Imports
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.IMAGES  # can be DatasetType.VIDEOS when working with videos
)

# This config depends on the input_mount and lightly_mount folder.
# Make sure you mounted them when starting the Lightly Worker. 
# See here: https://docs.lightly.ai/docs/install-lightly#local-storage
client.set_local_config(
    purpose=DatasourcePurpose.INPUT,
    # relative_path="",  # Optional: relative path in the input_mount folder.
)
client.set_local_config(
    purpose=DatasourcePurpose.LIGHTLY,
    # relative_path="",  # Optional: relative path in the lightly_mount folder.
)

# Configure and schedule a run following the selection strategy in 
# https://docs.lightly.ai/docs/run-your-first-selection
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)

# Monitor your run from Python
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"Lightly Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")

You can also download the full example here. By running the script, you will create a dataset, link it to the datasources, and schedule and monitor a selection based on the diversity of images in the embedding space.
If your scheduled run is not picked up by a Lightly Worker, head to the FAQ to debug.
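The scripts above use placeholders such as MY_LIGHTLY_TOKEN directly in the code. If you keep the script in version control, one option is to read such values from environment variables instead. This is a minimal sketch under assumptions: the helper read_required_env and the variable name LIGHTLY_TOKEN used here are illustrative choices, not something this guide prescribes.

```python
import os


def read_required_env(name: str) -> str:
    """Return the value of environment variable `name` or fail with a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# For demonstration only: provide a placeholder so the example runs anywhere.
# In practice you would export the variable in your shell instead.
os.environ.setdefault("LIGHTLY_TOKEN", "MY_LIGHTLY_TOKEN")

token = read_required_env("LIGHTLY_TOKEN")
print(f"Loaded a token of length {len(token)}")
```

The value returned by the helper can then be passed as the token argument of ApiWorkflowClient.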

Checking if a run was successful / Debugging

You can see the state of a run in two places:

In both places, you can see the current state of the run and whether it succeeded or failed. If it failed, check the error message; it often tells you directly what to fix and how. For more details about what happened during the run, check the log.txt file created after every run. You can get it in the Lightly Platform by clicking on the run. You can also get the log.txt from the Python API, see here.

For more details, have a look at our debugging docs.
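Once you have a local copy of log.txt, a quick scan for suspicious lines can narrow down where a run went wrong. The sketch below is a generic text search, not a Lightly tool: the helper find_problem_lines is hypothetical, and the keywords it looks for are an assumption about what a log typically contains, since the exact format of log.txt is not described here.

```python
import tempfile
from pathlib import Path


def find_problem_lines(log_path, keywords=("ERROR", "CRITICAL", "Traceback")):
    """Return (line_number, line) pairs for lines containing any keyword."""
    problems = []
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    for line_number, line in enumerate(text.splitlines(), start=1):
        if any(keyword in line for keyword in keywords):
            problems.append((line_number, line.strip()))
    return problems


# Demo on a made-up log; the content of a real log.txt will look different.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_log = Path(tmp_dir) / "log.txt"
    demo_log.write_text(
        "INFO starting run\nERROR could not read image\nINFO done\n",
        encoding="utf-8",
    )
    for line_number, line in find_problem_lines(demo_log):
        print(f"log.txt:{line_number}: {line}")
```

Adjust the keywords tuple after looking at an actual log from one of your runs.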

All-in-One Jupyter Notebook for Running Lightly

There is also a self-contained, all-in-one Jupyter notebook that covers all installation steps and running Lightly, in case you want to run the Lightly solution again directly without going through the explanations on this and the previous pages. It covers these steps:

  1. Installing Docker, the Lightly Worker, and the Lightly Python SDK.
  2. Downloading a dataset.
  3. Scheduling a run on the dataset and processing it with the Lightly Worker.

View Processed Dataset and Analyze Selection Results

After the scheduled run is fully processed, you can view your dataset in the Lightly Platform and analyze the selection. Just follow the guide on the next page.