Security and data privacy are very important to us at Lightly. This section collects all security-related information. Legal documents such as Privacy Notice, T&C, and DPA are available under

Architecture Overview

The following gives you an overview of the Lightly cloud architecture and how Lightly integrates with your data.

A few important things to note:

  • Data storage and processing occur only within your own cloud infrastructure.
  • Lightly only needs permission to list files within your cloud storage (AWS S3, Google Cloud Storage, Azure) and to create signed URLs.
  • All assets Lightly creates from your data, such as images, videos, sequences, frames, objects, or thumbnails, are always stored within your cloud storage. Any additional data used to manage the datasets, such as metadata, predictions, embeddings, checkpoints, or other non-sensitive data, is stored in secured databases within Lightly's cloud infrastructure.
  • Authentication is provided through our partner Auth0. Additional services such as 2FA/MFA and SAML can be added upon request.

Lightly Cloud Architecture.
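
To make the role of signed URLs more concrete, here is a minimal sketch of the general idea: a URL carries an expiry time and a signature, and the storage service only serves the object if both check out. This is a simplified HMAC scheme for illustration, not the actual signing algorithm of AWS S3, Google Cloud Storage, or Azure. The secret, the host `storage.example.com`, and all function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical secret held by the party that is allowed to sign URLs.
SECRET_KEY = b"example-secret"


def _signature(path: str, expires_at: int) -> str:
    """Sign the object path together with its expiry timestamp."""
    message = f"{path}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def create_signed_url(path: str, now: int, ttl_seconds: int = 3600) -> str:
    """Return a URL that grants time-limited read access to one object."""
    expires_at = now + ttl_seconds
    query = urlencode(
        {"expires": expires_at, "signature": _signature(path, expires_at)}
    )
    return f"https://storage.example.com{path}?{query}"


def is_valid(url: str, now: int) -> bool:
    """Check expiry and signature, as a storage service would before serving."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parts.path, expires_at)
    return now <= expires_at and hmac.compare_digest(signature, expected)
```

A URL created this way is only valid for the one object path it was signed for and only until it expires, which is why handing out signed URLs does not require sharing long-lived credentials.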

How Does Your Data Flow?

We differentiate between usage data and the actual raw, unlabeled data (the samples) stored in your cloud storage. Samples can be images or videos and their subtypes, such as sequences, frames, thumbnails, or object crops. Samples typically contain sensitive information (PII). The Lightly cloud architecture is set up so that you can fully prevent sensitive data from leaving your cloud environment (see AWS S3).

Whenever you process new data using Lightly, the following steps happen:

  1. You create a new run using the Lightly Python API. The run contains information about the location of the data (bucket path to AWS S3, Google Cloud Storage, or Azure) as well as the parameters of how the data should be processed.
  2. After the run has been created, the Lightly Worker can process it. The Lightly Worker typically runs on a GPU instance within your own cloud environment. It uses the run information to load the data directly from your cloud bucket securely by using signed URLs created by the Lightly Platform.
  3. At the end of the run, the Lightly Worker pushes part of the selection results to the Lightly Platform. The Lightly Platform only receives non-sensitive information, such as which filenames have been selected, the embeddings, the metadata, and the predictions. Other assets that the Lightly Worker creates, such as thumbnails and frames that could contain sensitive information, are stored in your cloud storage.
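
The split described in step 3 can be sketched as a simple routing rule. This is an illustration of the principle only, not Lightly Worker code: the result kinds, the `RoutedResults` class, and `route_results` are hypothetical names. The sketch assumes that only an explicit allow-list of non-sensitive result kinds goes to the Lightly Platform and that everything else stays in your cloud storage.

```python
from dataclasses import dataclass, field

# Result kinds that, per the steps above, are pushed to the Lightly Platform.
# The exact key names are assumptions made for this sketch.
PLATFORM_KINDS = {"selected_filenames", "embeddings", "metadata", "predictions"}


@dataclass
class RoutedResults:
    to_platform: dict = field(default_factory=dict)
    to_bucket: dict = field(default_factory=dict)


def route_results(results: dict) -> RoutedResults:
    """Split run results into platform-bound and bucket-bound parts."""
    routed = RoutedResults()
    for kind, payload in results.items():
        if kind in PLATFORM_KINDS:
            routed.to_platform[kind] = payload
        else:
            # Anything not explicitly listed as non-sensitive (thumbnails,
            # frames, crops, ...) stays in your own cloud storage.
            routed.to_bucket[kind] = payload
    return routed
```

Defaulting unknown kinds to the bucket is a deliberate choice in this sketch: a new result type is treated as potentially sensitive until it is explicitly allow-listed.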

This setup has several advantages:

  • Large amounts of data, potentially thousands of videos or millions of frames, are only moved within your cloud infrastructure, and only when needed. If the data in your cloud storage is in the same region as your instance, you incur no egress traffic costs, and latency is very low, which allows fast processing.
  • Lightly never stores the samples themselves, so sensitive information stays within your own cloud storage.
  • This setup allows for additional hardening of the access rules/permission policy as the Lightly Cloud does not need to read the actual data in your bucket (see more restrictive policies with AWS S3).

Where Is Your Data Stored?

Data stored within the Lightly Platform:

  • Embeddings - contain the filename of each sample and a vector representation describing it.
  • Metadata - any metadata provided to the data selection workflow is cached for faster access and visualization in the user interface.
  • Predictions - similar to metadata, predictions are cached as well.

Data stored within your own cloud infrastructure:

  • Samples - the actual images and videos.
  • Derived assets - extracted crops, frames, and thumbnails.


Cached Data

Lightly caches predictions and metadata for faster retrieval. However, input images as well as thumbnails and frames extracted from videos are always fetched directly from your connected cloud storage. They are never cached in the Lightly Platform.
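
The difference between the two access paths can be sketched as follows. This is a conceptual illustration, not the Lightly Platform implementation: `SampleView` and the two callables passed to it are hypothetical. Metadata lookups are served from a cache after the first request, while every image access asks for a fresh signed URL pointing at your storage.

```python
from typing import Callable


class SampleView:
    """Hypothetical view of a sample: cached metadata, uncached image access."""

    def __init__(
        self,
        load_metadata: Callable[[str], dict],
        make_signed_url: Callable[[str], str],
    ) -> None:
        self._load_metadata = load_metadata
        self._make_signed_url = make_signed_url
        self._metadata_cache: dict = {}

    def metadata(self, filename: str) -> dict:
        # Metadata is loaded once and then served from the cache.
        if filename not in self._metadata_cache:
            self._metadata_cache[filename] = self._load_metadata(filename)
        return self._metadata_cache[filename]

    def image_url(self, filename: str) -> str:
        # Images are never cached here: each call yields a new signed URL,
        # and the bytes are fetched from your cloud storage.
        return self._make_signed_url(filename)
```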