What is Lightly?

The Lightly solution is a system designed to process raw, unlabeled image data, select the most informative samples for labeling, and mitigate dataset bias. Lightly scales to large datasets with millions of images or thousands of videos. Processing datasets of this size requires an architecture made up of the following parts.


Datasource

A datasource provides access to data stored in your cloud storage. Currently, we support AWS S3 buckets, Google Cloud Storage buckets, and Azure Blob Storage accounts as datasources. The solution accesses data directly in your cloud storage and streams it from there, so there is no need to download the data manually.

Lightly Worker

The Lightly Worker is a Docker container designed to process large datasets. You host it yourself on a machine of your choice. The Lightly Worker picks up runs from a run queue, processes them, and stores the outputs back to your cloud storage.
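To make the run-queue idea concrete, here is a toy sketch of a worker that takes runs off a queue, reads from a datasource, and writes outputs to a store. It is only a conceptual illustration and not how the Lightly Worker is implemented: the `Run` class, the `process_runs` function, the in-memory list standing in for a bucket, and the "take the first n filenames" selection are all hypothetical placeholders.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical description of one scheduled run."""
    run_id: str
    n_samples: int


def process_runs(run_queue, datasource, output_store):
    """Drain the queue: for each run, list files, pick a subset, store the result."""
    while run_queue:
        run = run_queue.popleft()
        # Stand-in for streaming data from cloud storage.
        filenames = sorted(datasource)
        # Placeholder selection: keep the first n_samples filenames.
        selected = filenames[: run.n_samples]
        # Stand-in for writing outputs back to cloud storage.
        output_store[run.run_id] = selected
    return output_store


queue = deque([Run("run-1", 2), Run("run-2", 1)])
bucket = ["img_003.jpg", "img_001.jpg", "img_002.jpg"]
outputs = process_runs(queue, bucket, {})
print(outputs)
```

The point of the sketch is the separation of roles: scheduling only adds an entry to the queue, while the worker decides when to pick it up and where the results go.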

Lightly Platform

The Lightly Platform is used for the orchestration of workflows and analytics. It keeps track of the state of your dataset, allows sharing datasets with co-workers or labeling partners, and much more. You need to create an account for the Lightly Platform to use Lightly.

Lightly Python Client

Use the Lightly Python client to send commands to the Lightly Platform and the workers. You can schedule runs directly from your Python code. This gives you complete control over the process, easy reproducibility, and automation of your data selection pipeline.
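As a rough sketch of what scheduling a run from Python could look like, the snippet below builds a selection configuration and passes it to the client. It assumes the `lightly` package is installed and that `ApiWorkflowClient`, `create_dataset`, and `schedule_compute_worker_run` behave as in the public client at the time of writing; the `LIGHTLY_TOKEN` environment variable, the dataset name, and the helper functions `build_selection_config` and `schedule_run` are illustrative assumptions. Check the current API reference before relying on any of these names.

```python
import os


def build_selection_config(n_samples):
    """Plain dict describing a selection; the keys follow the assumed config schema."""
    return {
        "n_samples": n_samples,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    }


def schedule_run(dataset_name, n_samples):
    """Create a dataset on the platform and schedule one run for it."""
    # Imported lazily so the config helper works without the package installed.
    from lightly.api import ApiWorkflowClient

    # Assumption: the API token is provided through this environment variable.
    client = ApiWorkflowClient(token=os.environ["LIGHTLY_TOKEN"])
    client.create_dataset(dataset_name)
    # A datasource must also be configured for the dataset before a worker
    # can process the run; that step is omitted in this sketch.
    return client.schedule_compute_worker_run(
        selection_config=build_selection_config(n_samples)
    )


config = build_selection_config(50)
print(config)
```

Calling `schedule_run("my-dataset", 50)` would then hand the run to the platform, where a registered worker can pick it up. Keeping the configuration in a plain function like this is one way to get the reproducibility mentioned above, since the same code always produces the same run definition.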

See how to set up Lightly on your machine in our Getting Started section!
