Quick Start

This quick start shows how to run Lightly locally to select the 50 most diverse images from a set of images stored on your machine. We recommend using a small dataset with fewer than 1,000 images for the first run. If needed, download such a dataset with:

git clone https://github.com/lightly-ai/dataset_clothing_images.git clothing_dataset
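To check whether a folder is small enough for a first run, you can count its images before scheduling anything. The following is a minimal sketch: the helper name count_images and the list of file extensions are assumptions made for illustration, not part of the Lightly package, so adjust the extensions to match your data.

```python
from pathlib import Path

# Assumed set of image extensions; this is NOT an official list of formats
# supported by Lightly. Adjust it to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(dataset_path: Path) -> int:
    """Count image files below dataset_path, including subfolders."""
    return sum(
        1
        for path in dataset_path.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, count_images(Path("clothing_dataset")) returns the number of matching files in the cloned example dataset.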

For a more in-depth guide, see our step-by-step Getting Started guide.

Step 1: Install Lightly Pip Package

pip3 install lightly

Step 2: Download the Lightly Worker

docker pull lightly/worker:latest
docker run --shm-size="1024m" --rm -it lightly/worker:latest sanity_check=True

If these commands fail, follow our Docker installation guide.
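A common reason for these commands to fail is that the docker executable is not installed or not on the PATH. As a quick check, the sketch below uses the standard library function shutil.which; the helper name tool_available is hypothetical and only used for illustration.

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if not tool_available("docker"):
    print("docker was not found on the PATH; see the Docker installation guide.")
```

This only confirms that the executable exists. It does not check that the Docker daemon is running or that your user is allowed to use it.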

While the worker is downloading, you can continue with Step 3, because the two steps are independent of each other.

🚧

Mac with Apple Silicon

If you use a Mac with an Apple silicon chip, make sure to enable Rosetta emulation in Docker Desktop for fast processing. To enable it, go to Docker Desktop > Settings > General > Use Rosetta for x86_64/amd64 emulation on Apple Silicon. This requires Docker Desktop 4.25 or later.

Step 3: Schedule a Selection Run & Register a Worker for It

  1. Create a Python script named, e.g., schedule_selection_run.py and copy the following code into it.
  2. Change two variables in it: set DATASET_PATH to the folder containing your images and set LIGHTLY_TOKEN to your token.
  3. Then run the script, e.g., with python3 schedule_selection_run.py.

from pathlib import Path
from datetime import datetime
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose

###### CHANGE THESE 2 VARIABLES
DATASET_PATH = Path("CHANGE_ME")  # e.g., Path("/path/to/images") or Path("clothing_dataset")
LIGHTLY_TOKEN = "CHANGE_ME_TO_YOUR_TOKEN"  # Copy from https://app.lightly.ai/preferences
######

assert DATASET_PATH.exists(), f"Dataset path {DATASET_PATH} does not exist."

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token=LIGHTLY_TOKEN)

# Create the dataset on the Lightly Platform.
# See our guide for more details and options:
# https://docs.lightly.ai/docs/set-up-your-first-dataset
client.create_dataset(
    dataset_name=f"first_dataset__{datetime.now().strftime('%Y_%m_%d__%H_%M_%S')}",
    dataset_type=DatasetType.IMAGES,
)

# Configure the datasources.
# See our guide for more details and options:
# https://docs.lightly.ai/docs/set-up-your-first-dataset
client.set_local_config(purpose=DatasourcePurpose.INPUT)
client.set_local_config(purpose=DatasourcePurpose.LIGHTLY)

# Schedule a run on the dataset to select 50 diverse samples.
# See our guide for more details and options:
# https://docs.lightly.ai/docs/run-your-first-selection
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={"shutdown_when_job_finished": True},
    selection_config={
        "n_samples": 50,
        "strategies": [
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    },
)

# Print the next commands
print(
    f"\nDocker Run command: \n"
    f"\033[7m"
    f"docker run --shm-size='1024m' --rm -it \\\n"
    f"\t-v '{DATASET_PATH.absolute()}':/input_mount:ro \\\n"
    f"\t-v '{Path('lightly').absolute()}':/lightly_mount \\\n"
    f"\t-e LIGHTLY_TOKEN={LIGHTLY_TOKEN} \\\n"
    "\tlightly/worker:latest\n"
    f"\033[0m"
)
print(
    "\nLightly Serve command:\n"
    f"\033[7m"
    f"lightly-serve input_mount='{DATASET_PATH.absolute()}' "
    f"lightly_mount='{Path('lightly').absolute()}'\n"
    f"\033[0m"
)
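To select a different number of samples, only the n_samples value in the selection configuration needs to change. The sketch below wraps the same configuration used in the script above in a small helper; the function name diversity_selection_config is hypothetical and not part of the Lightly package.

```python
from typing import Any, Dict


def diversity_selection_config(n_samples: int) -> Dict[str, Any]:
    """Build the selection config from the script above for a given sample count."""
    if n_samples <= 0:
        raise ValueError("n_samples must be a positive integer")
    return {
        "n_samples": n_samples,
        "strategies": [
            # Same strategy as in the script above: diversity on embeddings.
            {"input": {"type": "EMBEDDINGS"}, "strategy": {"type": "DIVERSITY"}}
        ],
    }
```

The returned dictionary can be passed as the selection_config argument of client.schedule_compute_worker_run, in place of the literal dictionary in the script.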

Step 4: Process the Run With the Lightly Worker

Run the Docker Run command printed by the Python script from Step 3.

The worker will take a while to process your dataset.
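If you would rather start the worker from Python than paste the printed command into a shell, the same command can be assembled as an argument list. This is a minimal sketch that reuses the flags from the command printed in Step 3; the helper name build_worker_command is hypothetical.

```python
from pathlib import Path
from typing import List


def build_worker_command(dataset_path: Path, lightly_path: Path, token: str) -> List[str]:
    """Return the docker run command from Step 3 as an argument list."""
    return [
        "docker", "run", "--shm-size=1024m", "--rm", "-it",
        # Mount the images read-only, as in the printed command.
        "-v", f"{dataset_path.absolute()}:/input_mount:ro",
        "-v", f"{lightly_path.absolute()}:/lightly_mount",
        "-e", f"LIGHTLY_TOKEN={token}",
        "lightly/worker:latest",
    ]
```

The list can be passed to subprocess.run(command, check=True) from an interactive terminal. Because the arguments are passed as a list, no shell quoting of the paths is needed.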

👍

Congratulations! You successfully ran your first selection with the Lightly Worker!

Step 5: Explore the Selected Dataset

Next, serve the images from your local disk to your local browser by running the Lightly Serve command, which the Python script from Step 3 also printed.

If your images are on a different machine than your web browser (i.e., DATASET_PATH in the script above is not on the computer you are reading this on), you also need to forward a port. See the docs on port forwarding.

👍

Awesome! You are now able to view and explore the dataset interactively on the Lightly Platform.

Next Steps

Lightly is a powerful tool for automated data curation. To better understand what the Lightly Worker can do and how to set up pipelines at your enterprise, see the following guides:

  • To better understand the commands used in this quick start, see Getting Started.
  • To change the selection configuration, see Selection.
  • To use data stored in cloud storage (AWS S3, Google Cloud Storage, Azure), see Cloud Storage.
  • To understand how Lightly ensures total PII compliance and that no sensitive data leaves your premises, see Security.
  • To explore more advanced features, work through one of the tutorials in the Tutorials section or go directly to the Advanced section.