What is Lightly?
A machine learning model can only be as good as the data it is trained on, and figuring out which data is best can be time-consuming and expensive. With Lightly, you can automate data curation and process millions of images or thousands of videos every day.
Automatically Select Data that Matters
Using Lightly, you can do active learning at scale. You can use inputs such as embeddings, metadata, and model predictions to select the most valuable subset of your data for labeling and model training.
By combining these three inputs, you can build active learning strategies that, for example:
- find images that are potential outliers or out of distribution, based on embeddings
- balance the selected data across locations and weather conditions provided as metadata
- find crowded scenes where the model predictions have low confidence
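The three example strategies above can be illustrated with a toy, standard-library-only Python sketch. This is not Lightly's implementation or API: the sample records, their field names (`embedding`, `weather`, `num_objects`, `confidence`), and the thresholds are hypothetical, and diversity is approximated here with greedy farthest-point sampling on the embeddings.

```python
import math
from collections import defaultdict

# Hypothetical sample records: an embedding, metadata, and a summary of model predictions.
SAMPLES = [
    {"id": "a", "embedding": (0.0, 0.0), "weather": "sunny", "num_objects": 2, "confidence": 0.95},
    {"id": "b", "embedding": (0.1, 0.0), "weather": "sunny", "num_objects": 14, "confidence": 0.41},
    {"id": "c", "embedding": (5.0, 5.0), "weather": "rain", "num_objects": 3, "confidence": 0.88},
    {"id": "d", "embedding": (0.0, 0.2), "weather": "sunny", "num_objects": 11, "confidence": 0.35},
    {"id": "e", "embedding": (9.0, 0.0), "weather": "snow", "num_objects": 1, "confidence": 0.90},
    {"id": "f", "embedding": (5.1, 4.9), "weather": "rain", "num_objects": 12, "confidence": 0.52},
]


def select_diverse(samples, k):
    """Embeddings: greedy farthest-point sampling, each pick is as far as possible from earlier picks."""
    chosen = [0]  # start deterministically from the first sample
    while len(chosen) < min(k, len(samples)):
        rest = [i for i in range(len(samples)) if i not in chosen]
        best = max(
            rest,
            key=lambda i: min(
                math.dist(samples[i]["embedding"], samples[j]["embedding"]) for j in chosen
            ),
        )
        chosen.append(best)
    return [samples[i]["id"] for i in chosen]


def select_balanced(samples, key, k):
    """Metadata: round-robin over the values of `key` so each value is represented evenly."""
    groups = defaultdict(list)
    for s in samples:
        groups[s[key]].append(s["id"])
    queues = [groups[value] for value in sorted(groups)]
    picked, rank = [], 0
    while len(picked) < k and any(rank < len(q) for q in queues):
        for q in queues:
            if rank < len(q) and len(picked) < k:
                picked.append(q[rank])
        rank += 1
    return picked


def select_uncertain_crowded(samples, min_objects, max_confidence):
    """Predictions: crowded scenes where the model is unsure, least confident first."""
    hits = [
        s for s in samples
        if s["num_objects"] >= min_objects and s["confidence"] <= max_confidence
    ]
    return [s["id"] for s in sorted(hits, key=lambda s: s["confidence"])]


print(select_diverse(SAMPLES, 3))                   # ['a', 'e', 'c']
print(select_balanced(SAMPLES, "weather", 4))       # ['c', 'e', 'a', 'f']
print(select_uncertain_crowded(SAMPLES, 10, 0.6))   # ['d', 'b', 'f']
```

Each function uses only one kind of input, which keeps the example readable; a real strategy would typically combine them, for instance by balancing only among the uncertain, crowded scenes.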
How Does Lightly Work?
The Lightly solution is an intelligent system designed to process raw, unlabeled image data, select the most informative samples for labeling, and mitigate dataset bias. Lightly scales to large datasets with millions of images or thousands of videos. Processing datasets of this size requires a dedicated architecture.
You can use Lightly in just four simple steps:
- Create an account on the Lightly Platform
- Create a new data curation job using our Python SDK
- Spin up the Lightly Worker Docker container to process the new job
- Access the curated dataset through our easy-to-use API or in the Platform
The following is a brief overview of the architecture.
Cloud Storage - Datasource
A datasource provides access to data stored in your cloud storage. Currently, we support AWS S3 buckets, Google Cloud Storage buckets, and Azure Blob Storage accounts as datasources. Lightly accesses the data directly in your cloud storage and streams it from there, so there is no need to download it manually.
Lightly Worker
The Lightly Worker is a Docker container designed to process large datasets. You host it yourself on a machine of your choice. The Lightly Worker processes runs from a run queue and writes its outputs back to your cloud storage.
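To make the run-queue idea concrete, here is a toy, in-process model of the pattern: Python's `queue.Queue` plays the run queue and a plain dict stands in for a cloud storage bucket. The `Run` class, the `worker_loop` function, and the `output/<run_id>/selected.txt` layout are hypothetical illustrations, not how the Lightly Worker is actually implemented or what it writes.

```python
import queue
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical run description: which files to read and how many to select."""
    run_id: str
    input_prefix: str
    n_samples: int


def worker_loop(run_queue, storage):
    """Drain the run queue, writing each run's result back into `storage`.

    `storage` is a dict standing in for a bucket (object key -> contents).
    A long-running worker would keep polling instead of returning when the queue is empty.
    """
    processed = []
    while True:
        try:
            run = run_queue.get_nowait()
        except queue.Empty:
            break
        # Placeholder for the real processing: take the first n input keys in sorted order.
        inputs = sorted(key for key in storage if key.startswith(run.input_prefix))
        selected = inputs[: run.n_samples]
        # Store the output next to the data instead of returning it to the caller.
        storage[f"output/{run.run_id}/selected.txt"] = "\n".join(selected)
        processed.append(run.run_id)
        run_queue.task_done()
    return processed


bucket = {f"images/img_{i:03d}.jpg": "<image bytes>" for i in range(5)}
runs = queue.Queue()
runs.put(Run("run-1", "images/", 2))
runs.put(Run("run-2", "images/", 3))
print(worker_loop(runs, bucket))  # ['run-1', 'run-2']
print(bucket["output/run-1/selected.txt"].splitlines())  # ['images/img_000.jpg', 'images/img_001.jpg']
```

Separating the queue (where work is requested) from the worker (where work is done) is what lets you run the worker on your own hardware while the Platform only tracks state.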
Lightly Platform
The Lightly Platform orchestrates workflows and provides analytics. It keeps track of the state of your dataset, lets you share datasets with co-workers or labeling partners, and much more. You need a Lightly Platform account to use Lightly.
Lightly Python Client
Use the Lightly Python client to send commands to the Lightly Platform and workers. You can schedule runs directly from your Python code. This gives you complete control over the process, makes runs easy to reproduce, and lets you automate your data selection pipeline.
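Because runs are scheduled from Python code, the run configuration itself can live in version control. A minimal sketch of that idea, assuming a hypothetical `SelectionRun` dataclass that is not part of the Lightly Python client: describing a run as plain data and deriving a stable fingerprint from it makes it easy to check whether two runs used the same settings.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SelectionRun:
    """Hypothetical, version-controllable description of one selection run."""
    dataset: str
    n_samples: int
    strategies: tuple

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so equal configs always hash the same.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


baseline = SelectionRun("street-scenes", 500, ("diversity:embeddings", "balance:weather"))
rerun = SelectionRun("street-scenes", 500, ("diversity:embeddings", "balance:weather"))
bigger = SelectionRun("street-scenes", 1000, ("diversity:embeddings", "balance:weather"))
print(baseline.fingerprint() == rerun.fingerprint())   # True
print(baseline.fingerprint() == bigger.fingerprint())  # False
```

Freezing the dataclass keeps a configuration from being mutated after it has been recorded, so the fingerprint stays valid for the run it describes.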
See how to set up Lightly on your machine in our Getting Started section!