Lightly allows you to configure a remote datasource such as Amazon S3 (Amazon Simple Storage Service). This guide shows you how to set up your S3 bucket and configure your dataset to use that bucket.

List, Read, and Write Permissions

Lightly needs to have read, list, and write permissions (s3:GetObject, s3:ListBucket, and s3:PutObject) on your bucket.

User Access and Delegated Access

There are two ways to grant these permissions:

User Access
This method creates a user with permissions to access your bucket. An access key ID and a secret access key allow you to authenticate as this user. We recommend this method because it is easy to set up and provides optimal performance. See User Access below for setup instructions.

Delegated Access
To access the data in your S3 bucket on AWS, Lightly can assume a role in your account that has the necessary permissions. Use this method if internal or external policies of your organization require it or disallow the user access method. It adds a small overhead to each file access by Lightly. The overhead is negligible for larger files (e.g., videos or large images) but may become significant for many small files. See Delegated Access below for setup instructions.

Set Up Access Policies

User Access

  1. Go to the Identity and Access Management (IAM) page and create a new user for Lightly.
  2. Choose a unique name and select “Programmatic access” as “Access type”.
Figure: Create AWS user.

  3. Create very restrictive permissions for this new user so that it can't access other resources of your company. Click "Attach existing policies directly" and then "Create policy". This opens a new page.
Figure: Setting user permissions in AWS.

  4. Because the policy is straightforward, use the JSON option and enter the following. Substitute datalake with the name of your bucket and projects/farm-animals/ with the folder you want to share.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowListing",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": [
                "arn:aws:s3:::datalake",
                "arn:aws:s3:::datalake/projects/farm-animals/*"
            ]
        },
        {
            "Sid": "AllowAccess",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::datalake/projects/farm-animals/*"
            ]
        }
    ]
}
Figure: Permission policy in AWS.

  5. On the next page, add tags as you see fit (e.g., external or lightly), give your new policy a name, and create it.
Figure: Review and name permission policy in AWS.

  6. Return to the previous page (shown in the screenshot below) and reload it. When you now filter the policies, your newly created policy shows up. Select it and continue setting up your new user.
Figure: Attach permission policy to user in AWS.

  7. Write down the access key ID and the secret access key in a secure location (such as a password manager), as you will not be able to view the secret access key again. You can generate new keys and revoke old ones under Security credentials on the user's detail page.
Figure: Get security credentials (access key ID, secret access key) from AWS.
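If you set up several buckets or folders, it can help to generate the JSON policy above instead of editing it by hand. The sketch below is a minimal, hypothetical helper (build_user_policy is not part of Lightly or AWS tooling) that reproduces the same two statements for any bucket and folder; paste its output into the JSON tab of the policy editor.

```python
import json


def build_user_policy(bucket: str, prefix: str) -> dict:
    """Hypothetical helper: rebuild the user-access policy for a bucket and folder."""
    # Normalize "projects/farm-animals/" and "/projects/farm-animals" alike.
    objects_arn = f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets Lightly list the bucket contents.
                "Sid": "AllowListing",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": [f"arn:aws:s3:::{bucket}", objects_arn],
            },
            {
                # Grants object access only inside the shared folder.
                "Sid": "AllowAccess",
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [objects_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Print JSON ready to paste into the AWS policy editor.
    print(json.dumps(build_user_policy("datalake", "projects/farm-animals/"), indent=4))
```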

Delegated Access

  1. Go to the AWS IAM Console.
  2. Click "Create role".
  3. Select "AWS Account" as the trusted entity type.
    3.1. Select "Another AWS account" and specify Lightly's AWS account ID: 916419735646.
    3.2. Check "Require external ID" and choose an external ID. Treat the external ID like a passphrase.
    3.3. Do not check "Require MFA".
    3.4. Click "Next".
  4. Select a policy that grants access to your S3 bucket. If you have not created one yet, the following example shows what the policy should look like. Substitute {YOUR_BUCKET} with the name of your bucket.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "lightlyS3Access",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::{YOUR_BUCKET}/*",
                "arn:aws:s3:::{YOUR_BUCKET}"
            ]
        }
    ]
}
  5. Name the role Lightly-S3-Integration and create the role.
  6. Note down the external ID and the ARN of the newly created role (for example, arn:aws:iam::123456789012:role/Lightly-S3-Integration).
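Step 3 makes the IAM console attach a trust policy to the role, and you can review it under the role's "Trust relationships" tab. The sketch below builds a trust policy of that shape, under the assumption that the console grants sts:AssumeRole to the account root principal with a StringEquals condition on sts:ExternalId; the exact output of the console may differ in details (for example, an added Sid). build_trust_policy and "my-external-id" are hypothetical.

```python
import json

# Lightly's AWS account ID, as given in step 3.1.
LIGHTLY_ACCOUNT_ID = "916419735646"


def build_trust_policy(external_id: str, account_id: str = LIGHTLY_ACCOUNT_ID) -> dict:
    """Hypothetical helper: allow `account_id` to assume the role only with `external_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID acts like a shared passphrase (step 3.2).
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_trust_policy("my-external-id"), indent=4))
```

Comparing this against the role's trust relationship is a quick way to confirm that the account ID and external ID were entered correctly.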

Configure a Datasource

That's it! Head over to Datasources to see how you can configure Lightly to access your data while following the notes below:

📘

Note

If you are using a delegated access role, toggle the switch "Use IAM role-based delegated access" and pass the external ID and the role ARN from the previous step instead of the secret access key.

📘

Note

If you want to use server-side encryption, toggle the switch "Use server-side encryption" and set the KMS key ARN (see Server-Side Encryption with KMS under Advanced Use Cases).

Advanced Use Cases

Server-Side Encryption with KMS

You can enable server-side encryption with a KMS key (SSE-KMS), as described in the official AWS documentation.

Create the KMS Key

  1. Go to the Key Management Service (KMS) page and create a new KMS key for the bucket.
  2. Select "Symmetric" and "Encrypt and decrypt", choose a unique name, and click "Next".
  3. In step 4, "Define key usage permissions", make sure the IAM user or role configured for the datasource in the Lightly Worker is selected. Click "Next" and create the key.
  4. After creation, click on the key and copy the KMS key ARN.

📘

Note

The IAM user or role configured for the datasource in the Lightly Worker additionally needs the following AWS KMS permissions: kms:Encrypt, kms:Decrypt, and kms:GenerateDataKey.
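One way to grant these KMS permissions is to append an extra statement to the policy already attached to the user or role. The sketch below assumes that approach; build_kms_statement, add_statement, the Sid "AllowLightlyKms", and the example key ARN are all hypothetical placeholders.

```python
import json


def build_kms_statement(kms_key_arn: str) -> dict:
    """Hypothetical helper: grant the three KMS actions on a single key."""
    return {
        "Sid": "AllowLightlyKms",  # hypothetical Sid, choose your own
        "Effect": "Allow",
        "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": kms_key_arn,
    }


def add_statement(policy: dict, statement: dict) -> dict:
    """Return a copy of `policy` with `statement` appended (input is not modified)."""
    return {**policy, "Statement": [*policy["Statement"], statement]}


if __name__ == "__main__":
    example_arn = "arn:aws:kms:eu-central-1:123456789012:key/example-key-id"  # placeholder
    base = {"Version": "2012-10-17", "Statement": []}
    print(json.dumps(add_statement(base, build_kms_statement(example_arn)), indent=4))
```

Scoping the statement to one key ARN instead of "*" keeps the grant as narrow as the S3 policies in this guide.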

Using the KMS Key

When setting up an S3 datasource in Lightly, you can set the KMS key ARN. In that case, the LIGHTLY_S3_SSE_KMS_KEY environment variable is set, and the headers x-amz-server-side-encryption and x-amz-server-side-encryption-aws-kms-key-id are added to all PutObject requests for the artifacts Lightly creates (such as crops, frames, and thumbnails), as described in the official AWS documentation.
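To illustrate what these headers contain, here is a minimal sketch, assuming LIGHTLY_S3_SSE_KMS_KEY holds the key ARN. It is not Lightly's implementation; sse_kms_headers is a hypothetical helper. For SSE-KMS, AWS expects the value aws:kms in x-amz-server-side-encryption and the key ID or ARN in the second header.

```python
import os
from typing import Dict, Optional


def sse_kms_headers(kms_key_arn: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: headers a PutObject request carries for SSE-KMS.

    Falls back to the LIGHTLY_S3_SSE_KMS_KEY environment variable and
    returns no headers when no key is configured.
    """
    key = kms_key_arn if kms_key_arn is not None else os.environ.get("LIGHTLY_S3_SSE_KMS_KEY")
    if not key:
        return {}
    return {
        "x-amz-server-side-encryption": "aws:kms",
        "x-amz-server-side-encryption-aws-kms-key-id": key,
    }


if __name__ == "__main__":
    print(sse_kms_headers("arn:aws:kms:eu-central-1:123456789012:key/example-key-id"))
```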

More Restrictive Policies

You can make your access policy very restrictive, to the point of preventing anyone who holds valid IAM user credentials or the role from reading your data when the request comes from outside a trusted zone, e.g., your VPC or a specific IP range.

The only permission Lightly strictly requires is s3:ListBucket. With this permission alone, Lightly can list the filenames within your bucket but can't access the contents of your data. Only you will be able to access your data's content.

🚧

Warning

The Lightly Worker needs to run within the permissioned zone that allows s3:GetObject (e.g., within your VPC or IP range), and the configuration flag datasource.bypass_verify must be set to True in the worker configuration.

Important: When s3:GetObject is restricted, the relevant filenames feature can no longer be used.

📘

Note

If you later want to use the Lightly Platform to visualize your data (e.g., to see the images in the embedding view), you also need to whitelist the IPs from which you plan to access it (e.g., the public IP of your office network or of your VPN).

Restrict IP-Range

The following example restricts access to your bucket datalake so that only clients from the IP range 21.21.21.0/24 can access your data (see "Sid": "RestrictIP"). So that Lightly still works correctly, s3:ListBucket is allowed without this restriction (see "Sid": "AllowLightly").

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RestrictIP",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::datalake",
                "arn:aws:s3:::datalake/projects/farm-animals/*"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": [
                        "21.21.21.0/24"
                    ]
                }
            }
        },
        {
            "Sid": "AllowLightly",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": [
                "arn:aws:s3:::datalake",
                "arn:aws:s3:::datalake/projects/farm-animals/*"
            ]
        }
    ]
}
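The IpAddress condition on aws:SourceIp matches the caller's IP against CIDR ranges, so 21.21.21.0/24 covers 21.21.21.0 through 21.21.21.255. The sketch below reproduces that range check locally with Python's ipaddress module so you can test candidate IPs before deploying the policy. It is an illustration only, not AWS's policy evaluator; source_ip_allowed and ALLOWED_RANGES are hypothetical names. Note that AWS documents that requests routed through a VPC endpoint do not carry a public source IP for this key, so use the VPC condition in that case.

```python
import ipaddress
from typing import Iterable

# The range from the "RestrictIP" statement above.
ALLOWED_RANGES = ["21.21.21.0/24"]


def source_ip_allowed(source_ip: str, ranges: Iterable[str] = ALLOWED_RANGES) -> bool:
    """Hypothetical helper: True if `source_ip` falls inside any CIDR range."""
    ip = ipaddress.ip_address(source_ip)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    return any(ip in ipaddress.ip_network(cidr) for cidr in ranges)


if __name__ == "__main__":
    for candidate in ["21.21.21.7", "21.21.22.1"]:
        print(candidate, source_ip_allowed(candidate))
```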

Restrict VPC

You can restrict access to a specific VPC by adding a StringEquals condition on the aws:SourceVpc key. This key is only present when the request is made through a VPC endpoint.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RestrictVPC",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::datalake",
                "arn:aws:s3:::datalake/projects/farm-animals/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:SourceVpc": "vpc-111bbb22"
                }
            }
        },
        {
            "Sid": "AllowLightly",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": [
                "arn:aws:s3:::datalake",
                "arn:aws:s3:::datalake/projects/farm-animals/*"
            ]
        }
    ]
}
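The IP and VPC examples share the same structure and differ only in the Sid and the Condition block. The sketch below, using a hypothetical helper build_restricted_policy, generates both policies from one template. The unconditional AllowLightly statement is kept so that Lightly can still list your files.

```python
import json


def build_restricted_policy(bucket: str, prefix: str, sid: str, condition: dict) -> dict:
    """Hypothetical helper: allow s3:* only under `condition`, plus unconditional listing."""
    resources = [
        f"arn:aws:s3:::{bucket}",
        f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Full access, but only when the condition (IP or VPC) holds.
            {"Sid": sid, "Action": "s3:*", "Effect": "Allow",
             "Resource": resources, "Condition": condition},
            # Listing stays allowed so that Lightly keeps working.
            {"Sid": "AllowLightly", "Effect": "Allow",
             "Action": "s3:ListBucket", "Resource": list(resources)},
        ],
    }


ip_policy = build_restricted_policy(
    "datalake", "projects/farm-animals", "RestrictIP",
    {"IpAddress": {"aws:SourceIp": ["21.21.21.0/24"]}},
)
vpc_policy = build_restricted_policy(
    "datalake", "projects/farm-animals", "RestrictVPC",
    {"StringEquals": {"aws:SourceVpc": "vpc-111bbb22"}},
)

if __name__ == "__main__":
    print(json.dumps(vpc_policy, indent=4))
```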

Further Restrictions

There are different ways of expressing the logic of restricting access to your resources. You can explicitly deny specific actions ("Effect": "Deny") or invert the list of actions with NotAction. Further condition operators and condition keys let you express the rules more explicitly.
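As an illustration of combining Deny, NotAction, and a negated condition, the sketch below builds a statement that denies every S3 action except s3:ListBucket whenever the request does not come from the allowed IP range. Because an explicit Deny overrides any Allow in IAM, Lightly keeps its listing permission while all other access from outside the range is blocked. This is one possible pattern, not a recommended configuration; build_deny_outside_range and the Sid are hypothetical, and you should test any Deny statement carefully before relying on it.

```python
import json
from typing import Iterable


def build_deny_outside_range(bucket: str, prefix: str, cidrs: Iterable[str]) -> dict:
    """Hypothetical helper: deny everything but listing unless the source IP is in `cidrs`."""
    return {
        "Sid": "DenyOutsideRange",  # hypothetical Sid
        "Effect": "Deny",
        # NotAction inverts the action list: all actions except s3:ListBucket.
        "NotAction": "s3:ListBucket",
        "Resource": [
            f"arn:aws:s3:::{bucket}",
            f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
        ],
        # NotIpAddress negates IpAddress: applies when the IP is outside the ranges.
        "Condition": {"NotIpAddress": {"aws:SourceIp": list(cidrs)}},
    }


if __name__ == "__main__":
    statement = build_deny_outside_range("datalake", "projects/farm-animals/", ["21.21.21.0/24"])
    print(json.dumps(statement, indent=4))
```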