Reuse Datasets¶
LightlyStudio persists every dataset (metadata, tags, annotations, captions, and embeddings) into a DuckDB file named lightly_studio.db. By reopening the same database you can resume work without re-indexing images or recomputing embeddings.
Choose where the database is stored¶
To store the DuckDB file elsewhere (for example, on a larger external disk or to maintain isolated projects), configure the database manager before creating or loading any datasets:
import lightly_studio as ls
ls.db_manager.connect(db_file="/data/lightly_studio.db")
Note
Inside the .db file, all paths are stored as absolute paths. LightlyStudio can therefore still locate the data for visualization after you move the .db file, as long as the data files themselves stay where they were.
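For the "isolated projects" case, one possible layout is a separate folder per project, each holding its own database file. The sketch below is only an illustration: `project_db_path` is a hypothetical helper, not part of LightlyStudio, and the `<root>/<project>/lightly_studio.db` layout is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def project_db_path(root: str | Path, project: str) -> Path:
    """Return <root>/<project>/lightly_studio.db, creating the folder if needed.

    Hypothetical helper: the per-project folder layout is an assumption.
    """
    project_dir = Path(root).expanduser() / project
    project_dir.mkdir(parents=True, exist_ok=True)
    return project_dir / "lightly_studio.db"


# Usage (requires lightly_studio; call before creating or loading datasets):
# import lightly_studio as ls
# ls.db_manager.connect(db_file=str(project_db_path("/data", "sport_shooting")))
```

The helper only builds the path and creates the folder; the database file itself is created when you connect.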
Start with a fresh database¶
To wipe the existing database and start over (for example, when iterating on ingestion code), pass cleanup_existing=True:
import lightly_studio as ls
ls.db_manager.connect(db_file="/data/lightly_studio.db", cleanup_existing=True)
Warning
This permanently deletes the existing DuckDB file, including all content inside it.
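Because the deletion cannot be undone, you may want to copy the file aside before connecting with cleanup_existing=True. The following is a minimal sketch using only the standard library. `backup_db` is a hypothetical helper, and it assumes that no process currently has the database open and that copying the single .db file is enough.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def backup_db(db_file: str | Path) -> Path | None:
    """Copy db_file to a timestamped sibling file.

    Returns the path of the copy, or None if db_file does not exist.
    Hypothetical helper; assumes the database is not open elsewhere.
    """
    src = Path(db_file)
    if not src.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # e.g. lightly_studio.db -> lightly_studio.<timestamp>.bak.db
    dst = src.with_name(f"{src.stem}.{stamp}.bak{src.suffix}")
    shutil.copy2(src, dst)
    return dst


# Usage, before wiping the database:
# backup_db("/data/lightly_studio.db")
```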
Reuse a dataset across runs¶
Restarting the same Python script will reopen the GUI with the previous state as long as you call ImageDataset.load or ImageDataset.load_or_create with the same name.
from __future__ import annotations
import lightly_studio as ls
DATASET_NAME = "sport_shooting"
IMAGE_DIRS = ["data/primary_images", "data/new_images_later"]
# Everything persists inside lightly_studio.db automatically.
dataset = ls.ImageDataset.load_or_create(name=DATASET_NAME)
# Only new samples are added by `add_images_from_path`
for image_dir in IMAGE_DIRS:
    dataset.add_images_from_path(path=image_dir)
ls.start_gui()
- When you rerun the script later, only new files are indexed, and embeddings are generated only for those new samples. Existing embeddings and annotations remain untouched.
- Manual labels created in the GUI, metadata changed via Python, and tags assigned anywhere are all stored in lightly_studio.db, so you can stop/start the process at will.
- External files such as images/videos (.jpg, .png, .mp4 files etc.) remain in their original location; keep them accessible so the GUI can display them when you reopen the dataset.
For dataset-type-specific loading examples, see Image Dataset · From a Pre-Existing Dataset and Video Dataset · From a Pre-Existing Dataset.
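Since the images and videos must stay reachable, a quick check before reopening a dataset can catch an unmounted disk or a renamed folder. The sketch below is illustrative only: `missing_dirs` is a hypothetical helper, and it checks only that the source directories exist, not every individual file recorded in the database.

```python
from __future__ import annotations

from pathlib import Path


def missing_dirs(image_dirs: list[str]) -> list[str]:
    """Return the entries of image_dirs that are not existing directories."""
    return [d for d in image_dirs if not Path(d).is_dir()]


IMAGE_DIRS = ["data/primary_images", "data/new_images_later"]
for d in missing_dirs(IMAGE_DIRS):
    print(f"warning: image directory not found: {d}")
```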
Iterate using the CLI¶
Once your dataset is populated, you can reopen the GUI without rerunning the ingestion script. From the directory containing lightly_studio.db:
lightly-studio gui
A typical workflow is therefore:
- Run your Python script once to populate the database.
- Call lightly-studio gui as often as you want to browse, label, and curate.
If the DuckDB file lives elsewhere, point at it explicitly:
lightly-studio gui --db-file /data/lightly_studio.db
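If you prefer to launch the GUI from a Python script, for example as the last step of a pipeline, the two commands above can be assembled programmatically. This is a sketch: `gui_command` is a hypothetical helper that only builds the argument list from the commands shown above, and running it assumes the lightly-studio executable is on your PATH.

```python
from __future__ import annotations

from pathlib import Path


def gui_command(db_file: str | Path | None = None) -> list[str]:
    """Build the argument list for `lightly-studio gui`.

    Adds --db-file only when a database path is given.
    """
    cmd = ["lightly-studio", "gui"]
    if db_file is not None:
        cmd += ["--db-file", str(db_file)]
    return cmd


# Usage:
# import subprocess
# subprocess.run(gui_command("/data/lightly_studio.db"), check=True)
```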