Active Learning

Lightly uses active learning scores to select the samples that will yield the biggest improvements to your machine learning model. The scores are computed on the fly from model predictions and give the selection algorithm feedback about how uncertain the model is about each sample.

Learn more about the concept of active learning scores: Scorers.
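As an illustration, the commonly used uncertainty_margin score rewards samples where the model's top two class probabilities are close together. A minimal sketch of the idea in plain NumPy (the function name is ours for illustration, not part of the Lightly API):

```python
import numpy as np

def uncertainty_margin_score(probs):
    # Sort class probabilities in descending order and take the top two.
    top2 = np.sort(probs)[::-1][:2]
    # A small margin between the two most likely classes means high
    # uncertainty, so the score is 1 minus the margin.
    return 1.0 - (top2[0] - top2[1])

# A confident prediction gets a low score ...
confident = uncertainty_margin_score(np.array([0.95, 0.03, 0.02]))
# ... while an ambiguous prediction gets a high score.
uncertain = uncertainty_margin_score(np.array([0.40, 0.35, 0.25]))
```

Samples with high scores sit close to the model's decision boundary and are therefore the most informative candidates for annotation.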

Note

The active learning features require Lightly Worker version 2.2 or newer. You can check the installed version of the Lightly Worker by running the sanity check.
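The sanity check can be run directly with Docker. Assuming the default lightly/worker image tag, the command looks like this:

```shell
# Pull-and-run sanity check; prints the worker version among its output.
docker run --shm-size="1024m" --rm -it lightly/worker:latest sanity_check=True
```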

Prerequisites

To do active learning with Lightly, you need a dataset with a configured datasource and model predictions uploaded to the datasource in the Lightly prediction format.

Note

The dataset does not need to be new! For example, an initial selection without active learning can be used to train a model. The predictions from this model can then be used to improve your dataset by adding new images to it through active learning.
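This iterative workflow can be sketched with a toy, framework-free example. All data and the nearest-centroid "model" below are made up purely for illustration; in a real setup the scores come from your own model's predictions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: 1-D points drawn from two classes.
X = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
y = np.array([0] * 100 + [1] * 100)

# Initial selection: two labeled samples per class.
labeled = [0, 1, 100, 101]

for _ in range(3):  # three active-learning iterations
    # "Train" a nearest-centroid model on the labeled samples only.
    c0 = X[[i for i in labeled if y[i] == 0]].mean()
    c1 = X[[i for i in labeled if y[i] == 1]].mean()

    # Score the unlabeled pool: a small gap between the distances to the
    # two centroids means the model is uncertain about that sample.
    unlabeled = [i for i in range(len(X)) if i not in labeled]
    margins = [abs(abs(X[i] - c0) - abs(X[i] - c1)) for i in unlabeled]

    # Select the most uncertain sample for annotation and add it to the
    # labeled set, growing the dataset round by round.
    labeled.append(unlabeled[int(np.argmin(margins))])
```

Each round the dataset grows with the sample the current model is least sure about, which is exactly the loop Lightly automates at scale.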

Selection

Once you have everything set up as described above, you can do an active learning iteration by specifying the following three things in your Lightly Worker config:

  • method

  • active_learning.task_name

  • active_learning.score_name

Here’s an example of how to configure an active learning run:

import json
import lightly
from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
from lightly.openapi_generated.swagger_client.models.datasource_purpose import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="YOUR_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset('pedestrian-videos-datapool',
                      dataset_type=DatasetType.VIDEOS)

# Pick one of the following three blocks depending on where your data is stored.
# AWS S3
# Input bucket
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)


# or Google Cloud Storage
# Input bucket
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)


# or Azure Blob Storage
# Input bucket
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the compute worker run with
#  - "active_learning.task_name" set to your prediction task name
#  - "method" set to "coral"
# All other settings are default values; we show them so you can easily
# adjust them to your needs.
client.schedule_compute_worker_run(
    worker_config={
        "enable_corruptness_check": True,
        "remove_exact_duplicates": True,
        "enable_training": False,
        "pretagging": False,
        "pretagging_debug": False,
        "method": "coral",  # we use the coral method here
        "stopping_condition": {
          "n_samples": 0.1,
          "min_distance": -1
        },
        "active_learning": { # here we specify our active learning parameters
          "task_name": "my-classification-task",    # set the task
          "score_name": "uncertainty_margin"        # set the score
        }
    },
    lightly_config={
        'loader': {
            'batch_size': 16,
            'shuffle': True,
            'num_workers': -1,
            'drop_last': True
        },
        'model': {
            'name': 'resnet-18',
            'out_dim': 128,
            'num_ftrs': 32,
            'width': 1
        },
        'trainer': {
            'gpus': 1,
            'max_epochs': 100,
            'precision': 32
        },
        'criterion': {
            'temperature': 0.5
        },
        'optimizer': {
            'lr': 1,
            'weight_decay': 0.00001
        },
        'collate': {
            'input_size': 64,
            'cj_prob': 0.8,
            'cj_bright': 0.7,
            'cj_contrast': 0.7,
            'cj_sat': 0.7,
            'cj_hue': 0.2,
            'min_scale': 0.15,
            'random_gray_scale': 0.2,
            'gaussian_blur': 0.5,
            'kernel_size': 0.1,
            'vf_prob': 0,
            'hf_prob': 0.5,
            'rr_prob': 0
        }
    }
)

After running the code, make sure a Lightly Worker is available to process the job. You can start a Lightly Worker with the following command:

docker run --rm --gpus all -it \
  -v /docker-output:/home/output_dir lightly/worker:latest \
  token=YOUR_TOKEN worker.worker_id=YOUR_WORKER_ID

Once the worker has finished the job, you can inspect the selected images together with their active learning scores in the Lightly Platform web app.