Object Level
Warning
The Docker Archive documentation is deprecated
The old workflow described in these docs will not be supported with new Lightly Worker versions above 2.6. Please switch to our new documentation page instead.
Lightly works not only on full images but also on an object level. This workflow is especially useful for datasets containing small objects or multiple objects per image, and it provides the following benefits over the full image workflow:
Analyze a dataset based on individual objects
Find a diverse set of objects in the dataset
Find images that contain objects of interest
Full control over type of objects to process
Ignore uninteresting background regions in images
Automatic cropping of objects from the original image
Note
Note that the object level features require Lightly Worker version 2.2 or newer. You can check your installed version of the Lightly Worker by running the Sanity Check.
Prerequisites
To use the object level workflow with Lightly, you need the following:
The installed Lightly docker (see Setup)
A dataset with a configured datasource (see Process new data in your bucket using a datapool)
Object detection predictions uploaded to the datasource (see next section)
Note
If you don’t have any predictions available, you can use the Lightly pretagging model. See Pretagging for more information.
Predictions
Lightly needs to know which objects to process. This information is provided by uploading a set of object predictions to the datasource (see Add Predictions to a Datasource). Let’s say we are working with a dataset containing different types of vehicles and used an object detection model to find possible vehicle objects in the dataset. Then the file structure of the datasource should look like this:
datasource/vehicles_dataset/
+ .lightly/predictions/
+ tasks.json
+ vehicles_object_detections/
+ schema.json
+ image_1.json
...
+ image_N.json
+ image_1.png
+ image_2.png
...
+ image_N.png
The following files should be added to the .lightly/predictions directory in the datasource:
A tasks.json file that contains the name of the subdirectory in which the prediction files are stored.
[ "vehicles_object_detections" ]
A schema.json file that specifies that the predictions are from an object-detection task and a list of all possible object categories.
{
    "task_type": "object-detection",
    "categories": [
        { "id": 0, "name": "car" },
        { "id": 1, "name": "truck" },
        { "id": 2, "name": "motorbike" }
    ]
}
And for each image, or video frame, in the dataset an IMAGE_NAME.json file which holds the predictions the object detection model made for the given image:
{
    "file_name": "image_1.png",
    "predictions": [
        { "category_id": 1, "bbox": [...], "score": 0.8 },
        { "category_id": 0, "bbox": [...], "score": 0.9 },
        { "category_id": 2, "bbox": [...], "score": 0.5 }
    ]
}
For more information regarding the predictions format please see Add Predictions to a Datasource.
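The file layout above can be produced with a few lines of Python. The sketch below is a minimal example, assuming a local staging directory that mirrors the datasource layout; the bounding box values are placeholders, not real detections:

```python
import json
from pathlib import Path

# Hypothetical local staging directory mirroring the datasource layout.
task_name = "vehicles_object_detections"
predictions_dir = Path("datasource/vehicles_dataset/.lightly/predictions")
task_dir = predictions_dir / task_name
task_dir.mkdir(parents=True, exist_ok=True)

# tasks.json lists the prediction subdirectories to process.
(predictions_dir / "tasks.json").write_text(json.dumps([task_name]))

# schema.json declares the task type and all possible categories.
schema = {
    "task_type": "object-detection",
    "categories": [
        {"id": 0, "name": "car"},
        {"id": 1, "name": "truck"},
        {"id": 2, "name": "motorbike"},
    ],
}
(task_dir / "schema.json").write_text(json.dumps(schema, indent=4))

# One IMAGE_NAME.json file per image; bbox values here are placeholders.
prediction = {
    "file_name": "image_1.png",
    "predictions": [
        {"category_id": 1, "bbox": [80, 40, 120, 60], "score": 0.8},
    ],
}
(task_dir / "image_1.json").write_text(json.dumps(prediction, indent=4))
```

After writing the files, upload the whole .lightly/predictions directory to your datasource.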
Selection on Object Level
Once you have everything set up as described above, you can run selection on object level by setting the object_level.task_name argument in the docker configuration. The argument should be set to the task name you used for your predictions. If you uploaded the predictions to e.g. .lightly/predictions/vehicles_object_detections then you should set object_level.task_name to vehicles_object_detections.
The object level job can either be scheduled from the Lightly Web App or from Python code. Examples of how to schedule the job are provided below.
Trigger the Job
To trigger a new job you can click on the schedule run button on the dataset overview as shown in the screenshot below:
After clicking on the button you will see a wizard to configure the parameters for the job.
In this example we have to set the object_level.task_name parameter in the docker config, all other settings are default values. The resulting docker config should look like this:
{
object_level: {
task_name: 'vehicles_object_detections'
},
enable_corruptness_check: true,
remove_exact_duplicates: true,
enable_training: false,
pretagging: false,
pretagging_debug: false,
method: 'coreset',
stopping_condition: {
n_samples: 0.1,
min_distance: -1
},
scorer: 'object-frequency',
scorer_config: {
frequency_penalty: 0.25,
min_score: 0.9
},
active_learning: {
task_name: '',
score_name: 'uncertainty_margin'
}
}
The Lightly config remains unchanged.
import lightly
# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="LIGHTLY_TOKEN", dataset_id="DATASET_ID")
# Schedule the docker run with the "object_level.task_name" argument set.
# All other settings are default values and we show them so you can easily edit
# the values according to your need.
client.schedule_compute_worker_run(
worker_config={
"object_level": {"task_name": "vehicles_object_detections"},
"enable_corruptness_check": True,
"remove_exact_duplicates": True,
"enable_training": False,
"pretagging": False,
"pretagging_debug": False,
"method": "coreset",
"stopping_condition": {"n_samples": 0.1, "min_distance": -1},
"scorer": "object-frequency",
"scorer_config": {"frequency_penalty": 0.25, "min_score": 0.9},
"active_learning": {"task_name": "", "score_name": "uncertainty_margin"},
},
lightly_config={
"loader": {
"batch_size": 16,
"shuffle": True,
"num_workers": -1,
"drop_last": True,
},
"model": {"name": "resnet-18", "out_dim": 128, "num_ftrs": 32, "width": 1},
"trainer": {"gpus": 1, "max_epochs": 100, "precision": 32},
"criterion": {"temperature": 0.5},
"optimizer": {"lr": 1, "weight_decay": 0.00001},
"collate": {
"input_size": 64,
"cj_prob": 0.8,
"cj_bright": 0.7,
"cj_contrast": 0.7,
"cj_sat": 0.7,
"cj_hue": 0.2,
"min_scale": 0.15,
"random_gray_scale": 0.2,
"gaussian_blur": 0.5,
"kernel_size": 0.1,
"vf_prob": 0,
"hf_prob": 0.5,
"rr_prob": 0,
},
},
)
Lightly Pretagging
Instead of providing your own predictions, it's also possible to use the built-in pretagging model from Lightly. To do so, set pretagging=True in your config and set object_level.task_name="lightly_pretagging". For more information about the prediction model and its classes, see Lightly Pretagging Model.
{
object_level: {
task_name: 'lightly_pretagging'
},
enable_corruptness_check: true,
remove_exact_duplicates: true,
enable_training: false,
pretagging: true,
pretagging_debug: false,
method: 'coreset',
stopping_condition: {
n_samples: 0.1,
min_distance: -1
},
scorer: 'object-frequency',
scorer_config: {
frequency_penalty: 0.25,
min_score: 0.9
},
active_learning: {
task_name: '',
score_name: 'uncertainty_margin'
}
}
The Lightly config remains unchanged.
import lightly
# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="LIGHTLY_TOKEN", dataset_id="DATASET_ID")
# Schedule the docker run with the "object_level.task_name" argument set to
# "lightly_pretagging" and with "pretagging" set to True.
# All other settings are default values and we show them so you can easily edit
# the values according to your need.
client.schedule_compute_worker_run(
worker_config={
"object_level": {"task_name": "lightly_pretagging"},
"enable_corruptness_check": True,
"remove_exact_duplicates": True,
"enable_training": False,
"pretagging": True,
"pretagging_debug": False,
"method": "coreset",
"stopping_condition": {"n_samples": 0.1, "min_distance": -1},
"scorer": "object-frequency",
"scorer_config": {"frequency_penalty": 0.25, "min_score": 0.9},
"active_learning": {"task_name": "", "score_name": "uncertainty_margin"},
},
lightly_config={
"loader": {
"batch_size": 16,
"shuffle": True,
"num_workers": -1,
"drop_last": True,
},
"model": {"name": "resnet-18", "out_dim": 128, "num_ftrs": 32, "width": 1},
"trainer": {"gpus": 1, "max_epochs": 100, "precision": 32},
"criterion": {"temperature": 0.5},
"optimizer": {"lr": 1, "weight_decay": 0.00001},
"collate": {
"input_size": 64,
"cj_prob": 0.8,
"cj_bright": 0.7,
"cj_contrast": 0.7,
"cj_sat": 0.7,
"cj_hue": 0.2,
"min_scale": 0.15,
"random_gray_scale": 0.2,
"gaussian_blur": 0.5,
"kernel_size": 0.1,
"vf_prob": 0,
"hf_prob": 0.5,
"rr_prob": 0,
},
},
)
Padding
Lightly makes it possible to add padding around your bounding boxes. This allows for better visualization of the cropped images in the Web App and can improve the embeddings of the objects, as the embedding model sees the objects in context. To add padding, simply specify object_level.padding=X, where X is the padding relative to the bounding box size. For example, a padding of X=0.1 extends both the width and the height of every bounding box by 10 percent.
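The effect of the padding parameter can be sketched with a small helper. This is an illustration only: the centering and the clipping at image borders are assumptions, not taken from the worker implementation.

```python
def pad_bbox(bbox, padding, image_width, image_height):
    """Grow a [x, y, w, h] box by `padding` relative to its size,
    keeping it centered and clipping it to the image borders.

    Illustrative sketch; the real worker may handle borders differently.
    """
    x, y, w, h = bbox
    new_w = w * (1 + padding)
    new_h = h * (1 + padding)
    # Shift the origin so the enlarged box stays centered on the object.
    new_x = max(0.0, x - (new_w - w) / 2)
    new_y = max(0.0, y - (new_h - h) / 2)
    # Clip the box so it never extends past the image.
    new_w = min(new_w, image_width - new_x)
    new_h = min(new_h, image_height - new_y)
    return [new_x, new_y, new_w, new_h]
```

With padding=0.1, a 100x50 box grows to roughly 110x55 while keeping the same center.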
Object Crops Dataset
Once the docker job is started it fetches all images and predictions from the remote datasource and processes them. For each prediction, the docker crops the object from the full image and creates an embedding for it. Then it selects a subset of the objects and uploads two datasets to the Lightly Platform:
The crops and embeddings of the selected objects are uploaded to an object crops dataset on the platform. By default, the dataset has the same name as the original image dataset but with a “-crops” suffix appended to it. Alternatively, you can also choose a custom dataset name by setting the object_level.crop_dataset_name config option.
If an object is selected, then the full image containing that object is also uploaded. You can find these images in the original dataset from which you started the selection job.
You can see example images of the two datasets below.
Object Crop Dataset:
Original Full Image Dataset:
Analyzing the Crop Dataset
The crop dataset allows you to analyze your data on an object level. In our vehicles dataset we could, for example, be interested in the diversity of the vehicles. If we go to our crops dataset and select the Embedding view in the menu, we can see that crops are roughly grouped by vehicle type:
Cars:
Trucks:
Motorbikes:
This can be a very efficient way to get insights into your data without the need for human annotations. The embedding view allows you to dig deeper into the properties of your dataset and reveal things like:
Q: What sort of special trucks do we have? A: There are a lot of ambulances and school buses.
Q: Are there also vans in the dataset? A: There are only a few of them; we should try to get more images containing vans.
Q: Are there images of cars in different weather conditions? A: Most images seem to be taken in sunny weather with good lighting conditions.
These hidden biases are hard to find in a dataset if you only rely on full images or the coarse vehicle type predicted by the object detection model. Lightly helps you to identify them quickly and assists you in monitoring and improving the quality of your dataset. After an initial exploration you can now take further steps to enhance the dataset using one of the workflows Lightly provides:
Select a subset of your data using our Sampling Algorithms
Select new samples to add to your dataset using Active Learning
Prepare images for labelling by exporting them to LabelStudio
Multiple Object Level Runs
You can run multiple object level workflows on the same dataset. To start a new run, select your original full image dataset in the Lightly Web App and schedule a new run from there. If you are running the docker from Python or over the API, set the dataset_id configuration option to the id of the original full image dataset. In both cases, make sure that the run is not started from the crops dataset, as this is not supported!
You can control to which crops dataset the newly selected object crops are uploaded by setting the object_level.crop_dataset_name configuration option. By default this option is not set and if you did not specify it in the first run, you can also omit it in future runs. In this case Lightly will automatically find the existing crops dataset and add the new crops to it. If you want to upload the crops to a new dataset or have set a custom crop dataset name in a previous run, then set the object_level.crop_dataset_name option to a new or existing dataset name, respectively.
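In practice, a follow-up run only differs from the first one in its object_level section. The config fragment below sketches this; the crop dataset name is a hypothetical example, and all omitted keys keep the default values shown in the earlier examples:

```python
# Worker config for a second object level run. It is scheduled against the
# ORIGINAL full image dataset, exactly like the first run.
follow_up_worker_config = {
    "object_level": {
        "task_name": "vehicles_object_detections",
        # Optional: route the new crops into a specific crops dataset.
        # Omit this key to let Lightly find the existing crops dataset
        # automatically (if the first run also omitted it).
        "crop_dataset_name": "vehicles-crops-v2",
    },
}
```

Pass this dict as worker_config to client.schedule_compute_worker_run, as in the scheduling examples above.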