Crop Datasets

When Lightly is used with the Object Diversity selection strategy, two datasets are uploaded to the Lightly Platform:

  • The original dataset, which contains the selected images as usual.

  • Additionally, crops and embeddings from the selected images are uploaded to an object crop dataset on the platform. By default, the dataset has the same name as the original image dataset but with a -crops-{task name} suffix appended to it.
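The default naming rule can be sketched in a few lines of Python. This is only an illustration of the suffix convention described above; the helper name `default_crop_dataset_name` and the example dataset and task names are hypothetical and not part of the Lightly API.

```python
def default_crop_dataset_name(dataset_name: str, task_name: str) -> str:
    """Return the default crop dataset name for an image dataset.

    Hypothetical helper: it only mirrors the documented convention of
    appending a "-crops-{task name}" suffix to the original dataset name.
    """
    return f"{dataset_name}-crops-{task_name}"


# Example with made-up names for the dataset and the prediction task.
crop_name = default_crop_dataset_name("vehicles", "my_object_detection_task")
print(crop_name)
```

With these example names, the crop dataset would appear on the platform next to the original `vehicles` dataset.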

You can see example images of the two datasets below.


Dataset of the original images.


Dataset of object crops.


Crop datasets only support a single prediction task. When multiple selection strategies with different prediction tasks are used, the crops shown on the Lightly Platform might be incorrect.
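To avoid this situation, it can help to check a selection configuration before scheduling a run. The sketch below assumes the configuration is a plain dictionary in which each strategy references its prediction task under `input.task`. The field names, the `EMBEDDINGS` and `DIVERSITY` values, and the task name are assumptions to verify against the current Lightly documentation; `prediction_tasks` is a hypothetical helper.

```python
# Hypothetical task name; use the prediction task from your own datasource.
TASK_NAME = "my_object_detection_task"

# Assumed layout of a selection configuration with one object-level strategy.
selection_config = {
    "n_samples": 100,
    "strategies": [
        {
            "input": {"type": "EMBEDDINGS", "task": TASK_NAME},
            "strategy": {"type": "DIVERSITY"},
        }
    ],
}


def prediction_tasks(config: dict) -> set:
    """Collect the distinct prediction tasks referenced by all strategies."""
    return {
        strategy["input"]["task"]
        for strategy in config["strategies"]
        if "task" in strategy["input"]
    }


tasks = prediction_tasks(selection_config)
if len(tasks) > 1:
    # More than one task means the crops shown on the platform may be wrong.
    print(f"Warning: multiple prediction tasks in use: {sorted(tasks)}")
else:
    print(f"Prediction tasks in use: {sorted(tasks)}")
```

The check is purely local and does not contact the Lightly Platform; it only flags configurations that mix several prediction tasks.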


The crop dataset allows you to analyze your data on an object level. In the vehicles dataset shown above, you might, for example, be interested in how diverse the vehicles are. The embedding view of the object crop dataset shows that the crops are roughly grouped by vehicle type.

This can be a very efficient way to get insights into your data without the need for human annotations. The embedding view allows you to dig deeper into the properties of your dataset and reveal things like:

  • Q: What sort of special trucks are there?
    A: There are a lot of ambulances and school buses.
  • Q: Are there also vans in the dataset?
    A: There are only a few of them. We should try to get more images containing vans.
  • Q: Are there images of cars in different weather conditions?
    A: Most images appear to be taken in sunny weather with good lighting conditions.

These hidden biases are hard to find if you rely only on full images or on the coarse vehicle type predicted by the object detection model. Lightly helps you identify them quickly and assists you in monitoring and improving the quality of your dataset.