Crop Selection

Lightly works not only on full images but also on object crops. This workflow is especially useful for datasets containing small objects or multiple objects per image and provides the following benefits:

  • Analyze a dataset based on individual object crops
  • Find a diverse set of crops in the dataset
  • Find images that contain objects of interest
  • Full control over the type of objects to process
  • Ignore uninteresting background regions in images
  • Automatic cropping of objects from the original image

📘

Requires Lightly Worker v2.2

Crop selection requires Lightly Worker version 2.2 or later.

Prerequisites

To use crop selection with Lightly, you will need the following things:

Predictions

Lightly needs to know which objects to process. You provide this information by uploading a set of predictions to the datasource (see Work with Predictions). Alternatively, you can use the Lightly pretagging model to generate predictions on the fly. The following prediction task types are supported:

  • object-detection
  • keypoint-detection
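
As a rough illustration of what "uploading a set of predictions" involves, the sketch below writes the metadata files for one object-detection task into a temporary folder. It assumes the layout described on the Work with Predictions page: a tasks.json listing task names and a schema.json per task with a task type and categories. The file and field names used here are assumptions to verify against that page, and write_prediction_task is a hypothetical helper, not part of the lightly package.

```python
import json
import tempfile
from pathlib import Path


def write_prediction_task(lightly_root, task_name, task_type, categories):
    """Create <root>/.lightly/predictions/<task_name>/schema.json and register the task."""
    predictions_dir = Path(lightly_root) / ".lightly" / "predictions"
    task_dir = predictions_dir / task_name
    task_dir.mkdir(parents=True, exist_ok=True)

    # Assumed schema layout: the task type plus a list of category ids and names.
    schema = {
        "task_type": task_type,
        "categories": [{"id": i, "name": name} for i, name in enumerate(categories)],
    }
    (task_dir / "schema.json").write_text(json.dumps(schema, indent=2))

    # Assumed tasks.json layout: a flat list of task names.
    tasks_file = predictions_dir / "tasks.json"
    tasks = json.loads(tasks_file.read_text()) if tasks_file.exists() else []
    if task_name not in tasks:
        tasks.append(task_name)
    tasks_file.write_text(json.dumps(tasks))
    return task_dir


# Demo in a temporary folder standing in for the Lightly datasource.
with tempfile.TemporaryDirectory() as root:
    task_dir = write_prediction_task(
        root, "vehicles_object_detections", "object-detection", ["car", "truck"]
    )
    schema = json.loads((task_dir / "schema.json").read_text())
    tasks = json.loads((task_dir.parent / "tasks.json").read_text())
    print(schema["task_type"], tasks)
```

The per-image prediction files themselves are not generated here; their format is covered on the Work with Predictions page.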

Crop Selection with Custom Predictions

Once everything is set up as described above, you can run crop selection by specifying an object diversity strategy. The strategy must use the task name you set for your predictions. For example, if you uploaded predictions to .lightly/predictions/vehicles_object_detections, then you should set task="vehicles_object_detections" in the selection strategy.
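
The naming rule can be sketched as a small helper that reads the task name from the predictions folder, so the value in the strategy cannot drift from the folder name. This is only an illustration; task_name_from_predictions_path is a hypothetical function, not part of the lightly package.

```python
from pathlib import PurePosixPath


def task_name_from_predictions_path(path):
    """Return the folder name that directly follows 'predictions' in `path`."""
    parts = PurePosixPath(path).parts
    if "predictions" not in parts[:-1]:
        raise ValueError(f"no task folder found after 'predictions' in {path!r}")
    return parts[parts.index("predictions") + 1]


task = task_name_from_predictions_path(
    ".lightly/predictions/vehicles_object_detections"
)

# The "input" part of an object diversity strategy then reuses the same name.
strategy_input = {"type": "EMBEDDINGS", "task": task}
print(strategy_input)
```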

You can additionally train the embedding model on the object crops to improve the embedding and selection quality. To enable training, set enable_training=True and training.task_name="vehicles_object_detections" in the worker config. See configuration options for details on all available settings.

📘

Requires Lightly Worker v2.6

Training on object crops with the training.task_name option was introduced in Lightly Worker v2.6. In earlier versions this option was called object_level.task_name. The old name is no longer accepted as of v2.6.
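
For worker configs written before v2.6, the rename can be applied mechanically. The sketch below assumes the old option was stored under a nested "object_level" key, in the same way training.task_name is nested in the worker config; migrate_worker_config is a hypothetical helper, not part of the lightly package.

```python
import copy


def migrate_worker_config(worker_config):
    """Return a copy of `worker_config` with object_level.task_name moved to training.task_name."""
    migrated = copy.deepcopy(worker_config)
    object_level = migrated.get("object_level", {})
    if "task_name" in object_level:
        task_name = object_level.pop("task_name")
        # Keep an explicitly set training.task_name if one already exists.
        migrated.setdefault("training", {}).setdefault("task_name", task_name)
        if not object_level:
            # Drop the old section once it is empty.
            del migrated["object_level"]
    return migrated


old_config = {
    "enable_training": True,
    "object_level": {"task_name": "vehicles_object_detections"},
}
new_config = migrate_worker_config(old_config)
print(new_config)
```

The input dictionary is left unchanged, so the old config can be kept for comparison.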

The example below shows a full configuration for object diversity selection including training:

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3 Delegated Access
# Configure the Input datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3
# Configure the Input datasource.
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Google Cloud Storage
# Configure the Input datasource.
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Azure Blob Storage
# Configure the Input datasource.
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Set up your datasources
client.set_local_config(
    relative_path="input",
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="lightly",
    purpose=DatasourcePurpose.LIGHTLY,
)

# Schedule the docker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)
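
The selection configs in the examples above all share the same shape, so they can be generated instead of copied. The following sketch builds the object diversity strategy and, if a score name is given, appends the optional active learning strategy that appears commented out in the examples. build_crop_selection_config is a hypothetical helper, not part of the lightly package; its return value is intended to be passed as selection_config.

```python
def build_crop_selection_config(task, n_samples, score=None):
    """Return a selection config that picks diverse object crops for `task`."""
    strategies = [
        {
            "input": {"type": "EMBEDDINGS", "task": task},
            "strategy": {"type": "DIVERSITY"},
        }
    ]
    if score is not None:
        # Optional active learning term: weight crops by a prediction score.
        strategies.append(
            {
                "input": {"type": "SCORES", "task": task, "score": score},
                "strategy": {"type": "WEIGHTS"},
            }
        )
    return {"n_samples": n_samples, "strategies": strategies}


config = build_crop_selection_config(
    "vehicles_object_detections", n_samples=100, score="uncertainty_entropy"
)
print(config)
```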

📘

Set Up an Object Typicality Selection Strategy: Requires Lightly Worker v2.10

To select the crops with the highest typicality instead, change the strategy type in the selection configuration to TYPICALITY:

selection_config={
    "n_samples": 100,
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "task": "vehicles_object_detections", # change to your task
            },
            "strategy": {
                "type": "TYPICALITY",
            }
        },
    ]
}
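
This page mentions three version thresholds: v2.2 for crop selection, v2.6 for training.task_name, and v2.10 for typicality. The sketch below derives the minimum Lightly Worker version a pair of configs needs from those thresholds. The helper names are hypothetical and the parser assumes plain dotted numeric versions. Comparing parsed tuples matters here because "2.10" sorts before "2.6" as a string.

```python
def parse_version(version):
    """Turn 'v2.10' or '2.10.1' into a tuple of ints such as (2, 10) or (2, 10, 1)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def required_worker_version(worker_config, selection_config):
    """Return the lowest Worker version that supports the given configs."""
    required = (2, 2)  # crop selection itself
    if "task_name" in worker_config.get("training", {}):
        required = max(required, (2, 6))  # training.task_name
    for entry in selection_config.get("strategies", []):
        if entry.get("strategy", {}).get("type") == "TYPICALITY":
            required = max(required, (2, 10))  # object typicality
    return required


def worker_supports(installed, worker_config, selection_config):
    """Check an installed version string against the derived requirement."""
    return parse_version(installed) >= required_worker_version(
        worker_config, selection_config
    )


typicality_selection = {
    "n_samples": 100,
    "strategies": [
        {
            "input": {"type": "EMBEDDINGS", "task": "vehicles_object_detections"},
            "strategy": {"type": "TYPICALITY"},
        }
    ],
}
print(required_worker_version({}, typicality_selection))
print(worker_supports("2.6", {}, typicality_selection))
```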

Crop Selection using Lightly Pretagging

Instead of providing your own predictions, you can use Lightly's built-in pretagging model. To do so, set pretagging=True in your worker config and use "lightly_pretagging" as the task name. For more information about the prediction model and classes, visit Lightly Pretagging.

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3 Delegated Access
# Configure the Input datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/input/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/lightly/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the Lightly Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3
# Configure the Input datasource.
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the Lightly Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Google Cloud Storage
# Configure the Input datasource.
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the Lightly Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)
import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Azure Blob Storage
# Configure the Input datasource.
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the Lightly Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Set up your datasources
client.set_local_config(
    relative_path="input",
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="lightly",
    purpose=DatasourcePurpose.LIGHTLY,
)

# Schedule the Lightly Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)
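
The pretagging setup has two parts that must agree: the "pretagging" flag in the worker config and the "lightly_pretagging" task name. A small check run before scheduling can catch a mismatch early. This is a sketch only; pretagging_problems and referenced_tasks are hypothetical helpers, not part of the lightly package, and they inspect just the keys used in the examples above.

```python
PRETAGGING_TASK = "lightly_pretagging"


def referenced_tasks(worker_config, selection_config):
    """Collect every task name used for training or as a selection input."""
    tasks = set()
    training_task = worker_config.get("training", {}).get("task_name")
    if training_task is not None:
        tasks.add(training_task)
    for entry in selection_config.get("strategies", []):
        task = entry.get("input", {}).get("task")
        if task is not None:
            tasks.add(task)
    return tasks


def pretagging_problems(worker_config, selection_config):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    uses_pretagging = PRETAGGING_TASK in referenced_tasks(worker_config, selection_config)
    if uses_pretagging and worker_config.get("pretagging") is not True:
        problems.append(
            f'task "{PRETAGGING_TASK}" is referenced but "pretagging" is not set to True'
        )
    return problems


worker_config = {
    "enable_training": True,
    "training": {"task_name": "lightly_pretagging"},
}
selection_config = {
    "n_samples": 100,
    "strategies": [
        {
            "input": {"type": "EMBEDDINGS", "task": "lightly_pretagging"},
            "strategy": {"type": "DIVERSITY"},
        }
    ],
}
# "pretagging": True is missing from worker_config, so one problem is reported.
print(pretagging_problems(worker_config, selection_config))
```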

Crop Datasets

After selection, the Lightly Worker uploads the selected images and object crops to the Lightly Platform. See our page on Crop Datasets for more details on the uploaded data.