LightlyTrain Documentation

Train Better Models, Faster - No Labels Needed

LightlyTrain brings self-supervised pretraining to real-world computer vision pipelines, using your unlabeled data to reduce labeling costs and speed up model deployment. It applies state-of-the-art self-supervised methods from research to pretrain your model on your unlabeled, domain-specific data, which significantly reduces the amount of labeled data needed to reach high model performance.

This allows you to focus on new features and domains instead of managing your labeling cycles. LightlyTrain is designed for simple integration into existing training pipelines and supports a wide range of model architectures and use cases out of the box.

News

[0.11.0] - 2025-08-15: 🚀 New DINOv3 Support: Pretrain your own model with distillation from DINOv3 weights, or fine-tune our SOTA EoMT semantic segmentation model with a DINOv3 backbone! 🚀
[0.10.0] - 2025-08-04: 🔥 Train state-of-the-art semantic segmentation models with our new DINOv2 semantic segmentation fine-tuning method! 🔥
[0.9.0] - 2025-07-21: DINOv2 pretraining is now officially available!
[0.8.0] - 2025-06-10: DINOv2 pretraining is now available (beta 🔬)!
[0.7.0] - 2025-05-26: Up to 3x faster distillation and higher accuracy with Distillation v2 (new default method)!

Why LightlyTrain?

💸 No Labels Required: Speed up development by pretraining models on your unlabeled image and video data.
🔄 Domain Adaptation: Improve models by pretraining on your domain-specific data (e.g. video analytics, agriculture, automotive, healthcare, manufacturing, retail, and more).
🏗️ Model & Task Agnostic: Compatible with any architecture and task, including detection, classification, and segmentation.
🚀 Industrial-Scale Support: LightlyTrain scales from thousands to millions of images and supports on-prem and cloud deployments on single- and multi-GPU setups.

Benchmark results: On COCO, YOLOv8-s models pretrained with LightlyTrain achieve high performance across all tested label fractions. These improvements hold for other architectures such as YOLOv11, RT-DETR, and Faster R-CNN. See our announcement post for more details.

How It Works

Install LightlyTrain:

pip install lightly-train

Then start pretraining with:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",       # Output directory
        data="my_data_dir",            # Directory with images
        model="torchvision/resnet50",  # Model to train
    )

This will pretrain a Torchvision ResNet-50 model using unlabeled images from my_data_dir. All training logs, model exports, and checkpoints are saved to the output directory at out/my_experiment. The final model is exported to out/my_experiment/exported_models/exported_last.pt.
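The directory layout described above can be checked with a few lines of standard-library Python. This is only a sketch: `count_images`, `exported_model_path`, and the `IMAGE_SUFFIXES` set are hypothetical helpers for illustration and are not part of LightlyTrain. The list of image extensions and the recursive search through subfolders are assumptions.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your own data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images(data_dir):
    """Count image files under data_dir, including nested folders (assumption)."""
    return sum(
        1
        for path in Path(data_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def exported_model_path(out_dir):
    """Return the export location named in the text above for a given output directory."""
    return Path(out_dir) / "exported_models" / "exported_last.pt"


# Path where the final exported model is expected after training.
print(exported_model_path("out/my_experiment"))
```

Calling `count_images("my_data_dir")` before training is a quick way to confirm that the data directory contains the number of images you expect.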

Finally, load the pretrained model and fine-tune it using your existing training pipeline:

import torch
from torchvision import models

# Load the pretrained model
model = models.resnet50()
model.load_state_dict(torch.load("out/my_experiment/exported_models/exported_last.pt", weights_only=True))

# Fine-tune the model with your existing training pipeline
...

Features

Train models on any image data without labels
Train models from popular libraries such as Torchvision, TIMM, Ultralytics, SuperGradients, RT-DETR, RF-DETR, and YOLOv12
Train custom models with ease
No self-supervised learning (SSL) expertise required
Automatic SSL method selection (coming soon!)
Python, Command Line, and Docker support
Built for high performance including multi-GPU and multi-node support
Export models for fine-tuning or inference
Generate and export image embeddings
Monitor training progress with TensorBoard, Weights & Biases, and more
Runs fully on-premises with no API authentication and no telemetry

Supported Models

Library	Supported Models	Docs
Torchvision	ResNet, ConvNext, ShuffleNetV2	🔗
TIMM	All models	🔗
Ultralytics	YOLOv5, YOLOv6, YOLOv8, YOLO11, YOLO12	🔗
RT-DETR	RT-DETR & RT-DETRv2	🔗
RF-DETR	RF-DETR	🔗
YOLOv12	YOLOv12	🔗
SuperGradients	PP-LiteSeg, SSD, YOLO-NAS	🔗
Custom Models	Any PyTorch model	🔗

For an overview of all supported models and usage instructions, see the full model docs.

Supported Training Methods

DINOv2 Distillation (recommended 🚀)
DINOv2
DINO
SimCLR

See the full methods docs for details.

FAQ

Check our complete FAQ for more information.

License

LightlyTrain offers flexible licensing options to suit your specific needs:

AGPL-3.0 License: Perfect for open-source projects, academic research, and community contributions. Share your innovations with the world while benefiting from community improvements.
Commercial License: Ideal for businesses and organizations that need proprietary development freedom. Enjoy all the benefits of LightlyTrain while keeping your code and models private.

We’re committed to supporting both open-source and commercial users. Please contact us to discuss the best licensing option for your project!