Crop Selection

Lightly works not only on full images but also on object crops. This workflow is especially useful for datasets that contain small objects or multiple objects per image, and it provides the following benefits:

  • Analyze a dataset based on individual object crops
  • Find a diverse set of crops in the dataset
  • Find images that contain objects of interest
  • Full control over the type of objects to process
  • Ignore uninteresting background regions in images
  • Automatic cropping of objects from the original image

📘

Requires v2.2

Note that crop selection requires Lightly Worker version 2.2 or newer.

Prerequisites

To use crop selection with Lightly, you need the following:

Predictions

Lightly needs to know which objects to process. This information is provided by uploading a set of object detection predictions to the datasource (see Work with Predictions). Alternatively, you can use the Lightly pretagging model to generate predictions on the fly.
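For reference, a single per-image object detection prediction could look roughly like the sketch below. The image path, category, box coordinates, and score are placeholders, and the exact directory layout and schema (including the required schema.json) are described on the Work with Predictions page.

import json

# Hypothetical per-image prediction file for the task "vehicles_object_detections".
# All values below are placeholders; check the Work with Predictions page for the
# authoritative format.
prediction = {
    "file_name": "images/frame_0001.png",  # image path relative to the datasource root
    "predictions": [
        {
            "category_id": 0,            # index into the categories listed in schema.json
            "bbox": [140, 250, 80, 60],  # [x, y, width, height] in pixels
            "score": 0.87,               # confidence of the detection
        },
    ],
}

# Such a file would typically live in the Lightly datasource, e.g. under
# .lightly/predictions/vehicles_object_detections/images/frame_0001.json
with open("frame_0001.json", "w") as f:
    json.dump(prediction, f)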

Crop Selection with Custom Predictions

Once everything is set up as described above, you can run crop selection by setting the object_level.task_name argument in the configuration. The argument should be set to the task name you used for your predictions. For example, if you uploaded predictions to .lightly/predictions/vehicles_object_detections, then you should set object_level.task_name="vehicles_object_detections".

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3 Delegated Access
# Configure the Input datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with the "object_level.task_name" argument set. 
client.schedule_compute_worker_run(
    worker_config={
        "object_level": { # used for crop selection
            "task_name": "vehicles_object_detections" 
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # To enable this, uncomment the following strategy:
            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections",  # change to your task
            #         "score": "uncertainty_entropy"  # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # },
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3
# Configure the Input datasource.
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with the "object_level.task_name" argument set. 
client.schedule_compute_worker_run(
    worker_config={
        "object_level": { # used for crop selection
            "task_name": "vehicles_object_detections" 
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # To enable this, uncomment the following strategy:
            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections",  # change to your task
            #         "score": "uncertainty_entropy"  # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # },
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Google Cloud Storage
# Configure the Input datasource.
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with the "object_level.task_name" argument set. 
client.schedule_compute_worker_run(
    worker_config={
        "object_level": { # used for object level workflow
            "task_name": "vehicles_object_detections" 
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # To enable this, uncomment the following strategy:
            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections",  # change to your task
            #         "score": "uncertainty_entropy"  # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # },
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Azure Blob Storage
# Configure the Input datasource.
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with the "object_level.task_name" argument set. 
client.schedule_compute_worker_run(
    worker_config={
        "object_level": { # used for object level workflow
            "task_name": "vehicles_object_detections" 
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # To enable this, uncomment the following strategy:
            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections",  # change to your task
            #         "score": "uncertainty_entropy"  # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # },
        ]
    },
)

Crop Selection using Lightly Pretagging

Instead of providing your own predictions, you can also use Lightly's built-in pretagging model. To do so, set pretagging=True in your config and set object_level.task_name="lightly_pretagging". For more information about the prediction model and its classes, see Lightly Pretagging.

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3 Delegated Access
# Configure the Input datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with the "pretagging" argument set to True. 
client.schedule_compute_worker_run(
    worker_config={
        "object_level": {
             "task_name": "lightly_pretagging",
        },
        "pretagging": True,
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # To enable this, uncomment the following strategy:
            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy"  # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # },
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3
# Configure the Input datasource.
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with the "pretagging" argument set to True. 
client.schedule_compute_worker_run(
    worker_config={
        "object_level": {
             "task_name": "lightly_pretagging",
        },
        "pretagging": True,
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # To enable this, uncomment the following strategy:
            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy"  # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # },
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Google Cloud Storage
# Configure the Input datasource.
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with the "pretagging" argument set to True. 
client.schedule_compute_worker_run(
    worker_config={
        "object_level": {
             "task_name": "lightly_pretagging",
        },
        "pretagging": True,
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # To enable this, uncomment the following strategy:
            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy"  # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # },
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Azure Blob Storage
# Configure the Input datasource.
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with the "pretagging" argument set to True. 
client.schedule_compute_worker_run(
    worker_config={
        "object_level": {
             "task_name": "lightly_pretagging",
        },
        "pretagging": True,
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS"
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # To enable this, uncomment the following strategy:
            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy"  # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # },
        ]
    },
)

Padding

Lightly makes it possible to add padding around your bounding boxes. This allows for better visualization of the cropped images in the Lightly Platform and can improve the embeddings of the objects, because the embedding model sees the objects in context. To add padding, specify object_level.padding=X, where X is the padding relative to the bounding box size. For example, a padding of X=0.1 extends the width and height of each bounding box by ten percent.

# when using custom predictions
worker_config={
    "object_level": {
        "task_name": "vehicles_object_detections",
        "padding": 0.1
    },
}

# when using lightly pretagging
worker_config={
    "pretagging": True,
    "object_level": {
        "task_name": "lightly_pretagging",
        "padding": 0.1
    },
}
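To make the effect of the relative padding value concrete, here is a minimal, illustrative sketch of how a padding of 0.1 could enlarge an [x, y, width, height] bounding box. The Lightly Worker applies the padding internally; details such as centering the growth and clamping at image borders are assumptions for illustration and may differ from the actual implementation.

def pad_bbox(x, y, w, h, padding, image_width, image_height):
    """Illustrative only: grow a box by a relative padding while keeping it centered."""
    pad_w = w * padding  # extra width, e.g. 10% of the box width for padding=0.1
    pad_h = h * padding  # extra height
    new_x = max(0.0, x - pad_w / 2)
    new_y = max(0.0, y - pad_h / 2)
    new_w = min(w + pad_w, image_width - new_x)   # clamp so the crop stays inside the image
    new_h = min(h + pad_h, image_height - new_y)
    return new_x, new_y, new_w, new_h

# A 100x50 box padded with 0.1 becomes roughly 110x55, centered on the same point.
print(pad_bbox(200, 100, 100, 50, padding=0.1, image_width=1920, image_height=1080))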

Object Crops Dataset

Once the Lightly Worker job is started, it fetches all images and predictions from the remote datasource and processes them. For each prediction, the Lightly Worker crops the object from the full image and creates an embedding for it. Then it selects a subset of the objects and uploads two datasets to the Lightly Platform:

  • The crops and embeddings of the selected objects are uploaded to an object crops dataset on the platform. By default, this dataset has the same name as the original image dataset with a -crops suffix appended. Alternatively, you can choose a custom dataset name by setting the object_level.crop_dataset_name config option (see the snippet after this list).
  • If an object is selected, then the full image containing that object is also uploaded. You can find these images in the original dataset from which you started the selection job.
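For example, to give the crops dataset an explicit name instead of relying on the automatic -crops suffix, the worker config can be extended as follows (the dataset name is a placeholder):

# when using custom predictions and a custom name for the crops dataset
worker_config={
    "object_level": {
        "task_name": "vehicles_object_detections",
        "crop_dataset_name": "vehicles-object-crops",  # placeholder name
    },
}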

You can see example images of the two datasets below.

Dataset of object crops.

Dataset of the original images.

Insights

The crop dataset allows you to analyze your data on an object level. In the vehicles dataset, you could, for example, be interested in the diversity of the vehicles. The embedding view in the object crop dataset shows that the crops have been roughly grouped by vehicle type.

Cars

Trucks

Motorbikes

This can be a very efficient way to get insights into your data without the need for human annotations. The embedding view allows you to dig deeper into the properties of your dataset and reveal things like:

  • Q: What sort of special trucks are there? A: There are a lot of ambulances and school buses.
  • Q: Are there also vans in the dataset? A: There are only a few of them; we should try to get more images containing vans.
  • Q: Are there images of cars in different weather conditions? A: Most images appear to be taken in sunny weather with good lighting conditions.

These hidden biases are hard to find in a dataset if you rely only on full images or on the coarse vehicle type predicted by the object detection model. Lightly helps you identify them quickly and assists you in monitoring and improving the quality of your dataset.