Run Your First Selection

Scheduling a Run

Now that everything is in place, let’s configure a run:

from lightly.api import ApiWorkflowClient

# Create a client with your token and configure it to use your dataset ID.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# Configure and schedule a run.
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)
print(f"scheduled_run_id: {scheduled_run_id}")

The selection_config makes the LightlyOne Worker choose the 50 samples from the initial dataset that are as diverse as possible. Diversity is measured on the embeddings, which are created automatically during the run. Both the worker config and the selection config offer many more options, but we'll keep it simple for the first run.
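
To build some intuition for what "as diverse as possible" can mean on embeddings, here is a minimal sketch of greedy farthest-point sampling in plain Python. This is an illustrative stand-in, not the LightlyOne Worker's actual selection algorithm; the function name `select_diverse` and the toy 2D embeddings are assumptions made up for this example.

```python
import math


def select_diverse(embeddings, n_samples):
    """Greedily pick indices whose embeddings are far apart.

    Illustrative farthest-point sampling; NOT the LightlyOne implementation.
    """
    if not embeddings or n_samples <= 0:
        return []
    n_samples = min(n_samples, len(embeddings))
    selected = [0]  # Start from the first sample; any starting point works.
    chosen = {0}
    # Distance from every sample to its nearest already-selected sample.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < n_samples:
        # Pick the unselected sample farthest from everything selected so far.
        next_idx = max(
            (i for i in range(len(embeddings)) if i not in chosen),
            key=min_dist.__getitem__,
        )
        selected.append(next_idx)
        chosen.add(next_idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[next_idx]))
    return selected


# Toy 2D "embeddings" (hypothetical): two tight clusters and one outlier.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
picked = select_diverse(toy_embeddings, n_samples=3)
print(picked)
```

With three samples requested, the sketch picks one point from each region instead of two near-duplicates from the same cluster, which is the behavior a diversity strategy aims for.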

🚧
Retention Policy
Data processed by the LightlyOne Worker must remain available for the whole duration of the run. If you use a cloud bucket datasource with a retention or lifecycle policy, data might get deleted during a run. If your bucket has such a policy, see our retention policy docs to learn how to configure the LightlyOne Worker for this scenario.

Monitoring a Run

The LightlyOne Worker will pick up the run and start working on it within a few seconds. The status of the current run can be monitored from Python:

from lightly.api import ApiWorkflowClient


# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# You can use this code to track and print the state of the LightlyOne Worker.
# The loop will end once the run has finished, was canceled, or failed.
# scheduled_run_id is the ID returned by schedule_compute_worker_run above.
print(scheduled_run_id)
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"LightlyOne Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")

You should see an output similar to this one:

63722b94c5def52068655308
LightlyOne Worker run is now in state='INITIALIZING' with message='State set to INITIALIZING'
LightlyOne Worker run is now in state='CHECKING_CORRUPTNESS' with message='State set to CHECKING_CORRUPTNESS'
LightlyOne Worker run is now in state='EMBEDDING' with message='State set to EMBEDDING'
LightlyOne Worker run is now in state='SAMPLING' with message='State set to SAMPLING'
LightlyOne Worker run is now in state='GENERATING_REPORT' with message='State set to GENERATING_REPORT'
LightlyOne Worker run is now in state='COMPLETED' with message='State set to COMPLETED'
SUCCESS
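
For automation it can be handy to wrap the monitoring loop in a function that returns a boolean. The sketch below does that and also covers the case where no update is ever yielded, in which case `run_info` would otherwise be undefined after the loop. The helper name `monitor_run` and the `FakeRunInfo` class are hypothetical; the fake class only mimics the `state`, `message`, and `ended_successfully()` members used in the example above so the helper can be tried without a running worker.

```python
from dataclasses import dataclass
from typing import Iterable


def monitor_run(run_infos: Iterable) -> bool:
    """Print every state update and report whether the run ended successfully."""
    last_info = None
    for last_info in run_infos:
        print(
            f"LightlyOne Worker run is now in state='{last_info.state}' "
            f"with message='{last_info.message}'"
        )
    if last_info is None:
        # The iterable yielded nothing, so there is no final state to check.
        return False
    return last_info.ended_successfully()


@dataclass
class FakeRunInfo:
    """Hypothetical stand-in for the run info objects, for dry runs only."""

    state: str
    message: str

    def ended_successfully(self) -> bool:
        return self.state == "COMPLETED"


fake_updates = [
    FakeRunInfo("EMBEDDING", "State set to EMBEDDING"),
    FakeRunInfo("COMPLETED", "State set to COMPLETED"),
]
dry_run_ok = monitor_run(fake_updates)
print("SUCCESS" if dry_run_ok else "FAILURE")
```

With a real client, the same helper would be called as `monitor_run(client.compute_worker_run_info_generator(scheduled_run_id=scheduled_run_id))`.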

Putting It All Together

We recommend putting all the pieces together in a single file. When processing new data you would execute the Python script to create a job and then spin up the LightlyOne Worker to process it.

'''
All-in-one script for cloud storage (S3)

This script merges the code blocks from https://docs.lightly.ai/docs/set-up-your-first-dataset
and https://docs.lightly.ai/docs/run-your-first-selection in a way that allows you to do your first
run with LightlyOne from scratch.

By running the script you will create a dataset, link it to the datasources, and schedule and
monitor a selection based on the diversity of images in the embedding space.
'''

# Imports
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.IMAGES  # can be DatasetType.VIDEOS when working with videos
)


# Configure the input datasource. You need to adapt this step according to your datasource
# and credentials. See https://docs.lightly.ai/docs/set-up-your-first-dataset for more information.
# In this tutorial S3 is used.
client.set_s3_config(
    resource_path="s3://bucket/input/project_A/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/lightly/project_A/",
    region="eu-central-1",
    access_key="S3-ACCESS-KEY",
    secret_access_key="S3-SECRET-ACCESS-KEY",
    purpose=DatasourcePurpose.LIGHTLY
)

# Configure and schedule a run following the selection strategy in 
# https://docs.lightly.ai/docs/run-your-first-selection
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)

# Monitor your run from Python
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"LightlyOne Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")

'''
All-in-one script for local storage

This script merges the code blocks from https://docs.lightly.ai/docs/set-up-your-first-dataset
and https://docs.lightly.ai/docs/run-your-first-selection in a way that allows you to do your first
run with LightlyOne from scratch.

By running the script you will create a dataset, link it to the datasources, and schedule and
monitor a selection based on the diversity of images in the embedding space.
'''

# Imports
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset(
    dataset_name="dataset-name",
    dataset_type=DatasetType.IMAGES  # can be DatasetType.VIDEOS when working with videos
)

# This config depends on the input_mount and lightly_mount folder.
# Make sure you mounted them when starting the LightlyOne Worker. 
# See here: https://docs.lightly.ai/docs/install-lightly#local-storage
client.set_local_config(
    purpose=DatasourcePurpose.INPUT,
    # relative_path="",  # Optional: relative path in the input_mount folder.
)
client.set_local_config(
    purpose=DatasourcePurpose.LIGHTLY,
    # relative_path="",  # Optional: relative path in the lightly_mount folder.
)

# Configure and schedule a run following the selection strategy in 
# https://docs.lightly.ai/docs/run-your-first-selection
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)

# Monitor your run from Python
for run_info in client.compute_worker_run_info_generator(
    scheduled_run_id=scheduled_run_id
):
    print(
        f"LightlyOne Worker run is now in state='{run_info.state}' with message='{run_info.message}'"
    )

if run_info.ended_successfully():
    print("SUCCESS")
else:
    print("FAILURE")

You can also download the full example here. By running the script, you will create a dataset, link it to the datasources, and schedule and monitor a selection based on the diversity of images in the embedding space.
If your scheduled run is not picked up by a LightlyOne Worker, head to the FAQ to debug.
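
If you rerun the all-in-one script for every new batch of data, you may prefer not to hard-code the token in the file. The following is a minimal sketch of reading it from an environment variable instead. The helper `read_required` and the variable name `LIGHTLY_TOKEN` are assumptions chosen for this example, not something the script above requires.

```python
import os
from typing import Mapping


def read_required(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Please set the {name} environment variable.")
    return value


# Dry run with an explicit mapping instead of the real environment.
# "LIGHTLY_TOKEN" is a hypothetical variable name used for illustration.
example_env = {"LIGHTLY_TOKEN": "MY_LIGHTLY_TOKEN"}
token = read_required("LIGHTLY_TOKEN", env=example_env)
print(f"token has {len(token)} characters")
```

In the script, the result would then replace the placeholder, as in `ApiWorkflowClient(token=read_required("LIGHTLY_TOKEN"))`.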

Checking If a Run Was Successful / Debugging

You can see the state of a run in two places:

Recommended: In the LightlyOne Platform by clicking on the run.
For automation: From Python by following the example earlier on this page, see Monitoring a Run.

In both places, you can see the current state of the run and whether it succeeded or failed. If it failed, check the error message; it often tells you directly what to fix and how. For more details about what happened during the run, check the log.txt file created after every run. You can get it in the LightlyOne Platform by clicking on the run. You can also get the log.txt from the Lightly Python Client, see here.

For more details, have a look at our debugging docs.

All-in-One Jupyter Notebook for Running LightlyOne

There is also a self-contained, all-in-one Jupyter notebook that covers all installations and running LightlyOne, in case you want to run LightlyOne again directly without going through the explanations on this and the previous pages. It covers these steps:

Installing Docker, the LightlyOne Worker, and the Lightly Python Client.
Downloading a dataset.
Scheduling a run on the dataset and processing it with the LightlyOne Worker.

View Processed Dataset and Analyze Selection Results

After the scheduled run is fully processed, you can view your dataset in the LightlyOne Platform and analyze the selection. Just follow the guide on the next page.