Crop Selection

LightlyOne works not only on full images but also on object crops. This workflow is especially useful for datasets that contain small objects or multiple objects per image, and it provides the following benefits:

Analyze a dataset based on individual object crops
Find a diverse set of crops in the dataset
Find images that contain objects of interest
Full control over the type of objects to process
Ignore uninteresting background regions in images
Automatic cropping of objects from the original image

📘
Requires LightlyOne Worker v2.2
The crop selection features require LightlyOne Worker version 2.2 or newer.

Prerequisites

To use crop selection with LightlyOne, you need the following:

The installed LightlyOne Worker (see Install Lightly)
A dataset with a configured datasource (see Set Up Your First Dataset)

Predictions

LightlyOne needs to know which objects to process. You provide this information by uploading a set of predictions to the datasource (see Work with Predictions). Alternatively, you can use the LightlyOne pretagging model to generate predictions on the fly. The following prediction task types are supported:

object-detection
keypoint-detection

Crop Selection with Custom Predictions

Once everything is set up as described above, you can run crop selection by specifying an object diversity strategy. The strategy must use the task name you set for your predictions. For example, if you uploaded predictions to .lightly/predictions/vehicles_object_detections, then you should set task="vehicles_object_detections" in the selection strategy.

You can additionally train the embedding model on the object crops to improve the embedding and selection quality. To enable training, set enable_training=True and training.task_name="vehicles_object_detections" in the worker config. See configuration options for more details and all available options.

📘
Requires LightlyOne Worker v2.6
Training on object crops using the training.task_name option is available since LightlyOne Worker v2.6. In previous versions this option was called object_level.task_name. The old name is no longer available as of v2.6.
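Because the same task name has to appear in both the worker config and every selection strategy, a singular/plural slip is easy to make. The following is a minimal sketch of a local sanity check, assuming a run that uses a single prediction task; `find_task_mismatches` is a hypothetical helper and not part of the lightly package.

```python
def find_task_mismatches(worker_config: dict, selection_config: dict, expected_task: str) -> list:
    """Return all task names in the configs that differ from expected_task."""
    found = set()

    # Task used for training on object crops (optional in the worker config).
    training_task = worker_config.get("training", {}).get("task_name")
    if training_task is not None:
        found.add(training_task)

    # Task used by each selection strategy input (optional per strategy).
    for strategy in selection_config.get("strategies", []):
        task = strategy.get("input", {}).get("task")
        if task is not None:
            found.add(task)

    return sorted(found - {expected_task})


worker_config = {
    "enable_training": True,
    "training": {"task_name": "vehicles_object_detection"},  # deliberate slip
}
selection_config = {
    "n_samples": 100,
    "strategies": [
        {
            "input": {"type": "EMBEDDINGS", "task": "vehicles_object_detections"},
            "strategy": {"type": "DIVERSITY"},
        }
    ],
}
mismatches = find_task_mismatches(
    worker_config, selection_config, "vehicles_object_detections"
)
print(mismatches)
```

Running such a check before scheduling a run catches the mismatch locally instead of during the worker run.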

The example below shows a full configuration for object diversity selection including training:

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3 Delegated Access
# Configure the Input datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/input/project_A/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/lightly/project_A/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the LightlyOne Worker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3
# Configure the Input datasource.
client.set_s3_config(
    resource_path="s3://bucket/input/project_A/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/lightly/project_A",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the LightlyOne Worker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Google Cloud Storage
# Configure the Input datasource.
client.set_gcs_config(
    resource_path="gs://bucket/input/project_A/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_gcs_config(
    resource_path="gs://bucket/lightly/project_A",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the LightlyOne Worker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Azure Blob Storage
# Configure the Input datasource.
client.set_azure_config(
    container_name='my-container/input/project_A/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_azure_config(
    container_name='my-container/lightly/project_A/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the LightlyOne Worker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)


from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Set up your datasources
client.set_local_config(
    relative_path="input",
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="lightly",
    purpose=DatasourcePurpose.LIGHTLY,
)

# Schedule the LightlyOne Worker run with an object diversity selection strategy.
client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "vehicles_object_detections" # change to your task
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "vehicles_object_detections", # change to your task
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "vehicles_object_detections", # change to your task
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

📘
Set Up an Object Typicality Selection Strategy: Requires LightlyOne Worker v2.10
To additionally select the crops with the highest typicality, edit the selection configuration as follows:
selection_config={
    "n_samples": 100,
    "strategies": [
        {
            "input": {
                "type": "EMBEDDINGS",
                "task": "vehicles_object_detections", # change to your task
            },
            "strategy": {
                "type": "TYPICALITY",
            }
        },
    ]
}

🚧
You should always combine typicality with diversity, because typicality alone can result in selecting images only from a single high-density cluster. Furthermore, we strongly discourage using typicality for datasets with more than 100,000 input samples. For large datasets, typicality not only fails to improve the selection but also leads to long worker runtimes.
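One way to follow this advice is to list a diversity strategy and a typicality strategy side by side. The sketch below assumes that both can share the "strategies" list in the same way the diversity and active-learning strategies do in the examples above; `typicality_with_diversity` is a hypothetical helper used only for illustration.

```python
def typicality_with_diversity(task_name: str, n_samples: int) -> dict:
    """Build a selection config that pairs DIVERSITY with TYPICALITY on object crops."""
    # Both strategies read the object-level embeddings of the same prediction task.
    embeddings_input = {"type": "EMBEDDINGS", "task": task_name}
    return {
        "n_samples": n_samples,
        "strategies": [
            {"input": dict(embeddings_input), "strategy": {"type": "DIVERSITY"}},
            {"input": dict(embeddings_input), "strategy": {"type": "TYPICALITY"}},
        ],
    }


# Change the task name to the one used for your predictions.
selection_config = typicality_with_diversity("vehicles_object_detections", 100)
```

The resulting dictionary can be passed as `selection_config` to `client.schedule_compute_worker_run`, as in the full examples.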

Crop Selection using LightlyOne Pretagging

Instead of providing your own predictions, you can use the built-in LightlyOne pretagging model. To do so, set pretagging=True in your worker config and use "lightly_pretagging" as the task name. For more information about the prediction model and classes, visit LightlyOne Pretagging.

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3 Delegated Access
# Configure the Input datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/input/project_A/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://bucket/lightly/project_A/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the LightlyOne Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# AWS S3
# Configure the Input datasource.
client.set_s3_config(
    resource_path="s3://bucket/input/project_A/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_s3_config(
    resource_path="s3://bucket/lightly/project_A",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the LightlyOne Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Google Cloud Storage
# Configure the Input datasource.
client.set_gcs_config(
    resource_path="gs://bucket/input/project_A/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_gcs_config(
    resource_path="gs://bucket/lightly/project_A",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the LightlyOne Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

import json
import lightly
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Azure Blob Storage
# Configure the Input datasource.
client.set_azure_config(
    container_name='my-container/input/project_A/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Configure the Lightly datasource.
client.set_azure_config(
    container_name='my-container/lightly/project_A/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the LightlyOne Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)


from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose

# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")

# Create a new dataset on the LightlyOne Platform.
client.create_dataset("dataset-name", dataset_type=DatasetType.IMAGES)

# Set up your datasources
client.set_local_config(
    relative_path="input",
    purpose=DatasourcePurpose.INPUT,
)
client.set_local_config(
    relative_path="lightly",
    purpose=DatasourcePurpose.LIGHTLY,
)

# Schedule the LightlyOne Worker run with "pretagging" set to True.
client.schedule_compute_worker_run(
    worker_config={
        "pretagging": True,
        "enable_training": True,  # optional, remove to disable training
        "training": {  # optional, remove to train on the full images
            "task_name": "lightly_pretagging"
        },
    },
    selection_config={
        "n_samples": 100,
        "strategies": [
            {
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "lightly_pretagging",
                },
                "strategy": {
                    "type": "DIVERSITY",
                }
            },
            # Optionally, you can combine diversity selection with active learning
            # to prefer selecting objects the model struggles with.
            # If you want that, just include the following code:

            # {
            #     "input": {
            #         "type": "SCORES",
            #         "task": "lightly_pretagging",
            #         "score": "uncertainty_entropy" # change to your preferred score
            #     },
            #     "strategy": {
            #         "type": "WEIGHTS"
            #     }
            # }
        ]
    },
)

Crop Datasets

After selection, the LightlyOne Worker uploads the selected images and object crops to the LightlyOne Platform. See our page on Crop Datasets for more details on the uploaded data.