A machine learning model can only be as good as the data it is trained on. Figuring out which data is best can be very time-consuming and expensive. With Lightly, you can automate data curation and process millions of images or thousands of videos every day.
With Lightly, you can do Active Learning at scale. You can use inputs such as embeddings, metadata and model predictions to select the most valuable subset you want to use for labeling and model training.
By combining the three inputs, you can build your active learning strategies to find, for example:
- images that are potential outliers or out of distribution based on embeddings
- a selection that is balanced across the locations and weather conditions provided as metadata
- crowded scenes where the model predictions have low confidence
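As a rough illustration of how the three inputs can work together, here is a minimal, self-contained sketch. It is not Lightly's selection algorithm or API: the `Sample` class, the `outlier_score` and `select` functions, and the scoring rule (nearest-neighbor distance plus one minus confidence) are all assumptions made for this example.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    embedding: Tuple[float, ...]  # input 1: embedding vector
    weather: str                  # input 2: metadata
    confidence: float             # input 3: model prediction confidence in [0, 1]


def outlier_score(sample: Sample, samples: List[Sample]) -> float:
    """Distance to the nearest other sample in embedding space."""
    return min(
        math.dist(sample.embedding, other.embedding)
        for other in samples
        if other is not sample
    )


def select(samples: List[Sample], n: int, max_per_weather: int) -> List[Sample]:
    """Pick up to n samples, preferring outliers and low-confidence predictions,
    while capping how many samples share the same weather condition."""
    ranked = sorted(
        samples,
        key=lambda s: outlier_score(s, samples) + (1.0 - s.confidence),
        reverse=True,
    )
    chosen: List[Sample] = []
    per_weather: Counter = Counter()
    for s in ranked:
        if len(chosen) == n:
            break
        if per_weather[s.weather] >= max_per_weather:
            continue  # this weather condition is already well represented
        chosen.append(s)
        per_weather[s.weather] += 1
    return chosen


samples = [
    Sample("a.jpg", (0.0, 0.0), "sunny", 0.95),
    Sample("b.jpg", (0.1, 0.0), "sunny", 0.90),
    Sample("c.jpg", (5.0, 5.0), "rainy", 0.40),
    Sample("d.jpg", (0.0, 0.1), "sunny", 0.30),
    Sample("e.jpg", (4.0, 4.0), "rainy", 0.85),
]
selected = [s.name for s in select(samples, n=3, max_per_weather=2)]
print(selected)
```

In this toy data, the two "rainy" images sit far from the rest in embedding space and are ranked first, and the low-confidence "sunny" image fills the remaining slot.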
The Lightly solution is an intelligent system designed to process raw, unlabeled image data, select the most informative samples for labeling, and mitigate dataset bias. Lightly scales to big datasets with millions of images or thousands of videos. Processing these large datasets requires a special architecture.
You can use Lightly in just four simple steps:
- Create an account on our Lightly Platform
- Create a new data curation job using our Python SDK
- Spin up our Lightly Worker docker container to process the new job
- Enjoy the curated dataset either through our easy-to-use API or in the Platform
Below is a brief overview of the architecture.
A datasource provides Lightly with access to the data. Currently, Lightly supports the following types of datasources:
- Amazon Simple Storage Service (S3)
- Google Cloud Storage (GCS)
- Azure Blob Storage
- Local storage (local drives, NFS, CIFS/SMB)
The solution will access data directly in your datasource and stream it from there.
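To make the idea of streaming concrete, the sketch below reads image files lazily from a local directory, one at a time, instead of loading the whole dataset up front. It only illustrates the pattern for the local storage case; the `stream_images` function and the file extensions are assumptions for this example and are not part of Lightly.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def stream_images(
    root: Path, extensions: Tuple[str, ...] = (".jpg", ".png")
) -> Iterator[Tuple[str, bytes]]:
    """Yield (relative path, raw bytes) for each image, one file at a time."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            yield path.relative_to(root).as_posix(), path.read_bytes()


# Build a tiny stand-in datasource in a temporary directory.
root = Path(tempfile.mkdtemp())
(root / "cam0").mkdir()
(root / "cam0" / "frame_001.jpg").write_bytes(b"fake-jpeg-bytes")
(root / "cam0" / "frame_002.png").write_bytes(b"fake-png-bytes")
(root / "notes.txt").write_text("not an image")

streamed = [(name, len(data)) for name, data in stream_images(root)]
print(streamed)
```

Because `stream_images` is a generator, only one file's contents are held in memory at any moment, which is what makes this pattern suitable for large datasets.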
The Lightly Worker is a Docker container designed to process large datasets. You host it yourself on a machine of your choice. The Lightly Worker takes runs from a run queue, processes them, and stores the outputs back to your cloud storage.
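The queue-based flow can be pictured with the following minimal sketch. It is a simplified stand-in, not the Lightly Worker's implementation: the run fields (`run_id`, `n_inputs`, `n_samples`), the `process_run` placeholder, and the use of a local temporary directory in place of cloud storage are all assumptions for this example.

```python
import json
import queue
import tempfile
from pathlib import Path


def process_run(run: dict) -> dict:
    """Placeholder for the real work (for example embedding and selection)."""
    return {
        "run_id": run["run_id"],
        # Cannot select more samples than the dataset contains.
        "n_selected": min(run["n_samples"], run["n_inputs"]),
    }


def work(run_queue: queue.Queue, output_dir: Path) -> list:
    """Process every queued run and write one result file per run."""
    written = []
    while True:
        try:
            run = run_queue.get_nowait()
        except queue.Empty:
            break  # no more runs to process
        result = process_run(run)
        out_path = output_dir / f"{run['run_id']}.json"
        out_path.write_text(json.dumps(result))
        written.append(out_path)
    return written


run_queue = queue.Queue()
run_queue.put({"run_id": "run-1", "n_inputs": 1000, "n_samples": 50})
run_queue.put({"run_id": "run-2", "n_inputs": 20, "n_samples": 50})

output_dir = Path(tempfile.mkdtemp())  # stands in for your cloud storage
outputs = work(run_queue, output_dir)
print([p.name for p in outputs])
```

Decoupling scheduling from processing in this way means runs can be queued at any time and are handled in order whenever a worker is available.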
The Lightly Platform is used for the orchestration of workflows and analytics. It keeps track of the state of your dataset, allows sharing datasets with co-workers or labeling partners, and much more. You need to create an account for the Lightly Platform to use Lightly.
Use the Lightly Python client to send commands to the Lightly Platform and workers. You can schedule runs directly from your Python code. This gives you complete control over the process, makes runs easy to reproduce, and lets you automate your data selection pipeline.
See how to set up Lightly on your machine in our Getting Started section!