Train Better Models, Faster - No Labels Needed
LightlyTrain brings self-supervised pretraining to real-world computer vision pipelines, using your unlabeled data to reduce labeling costs and speed up model deployment. Building on state-of-the-art research, it pretrains your model on your unlabeled, domain-specific data, significantly reducing the amount of labeled data needed to reach high model performance.
This allows you to focus on new features and domains instead of managing labeling cycles. LightlyTrain is designed for simple integration into existing training pipelines and supports a wide range of model architectures and use cases out of the box.
Why LightlyTrain?
💸 No Labels Required: Speed up development by pretraining models on your unlabeled image and video data.
🔄 Domain Adaptation: Improve models by pretraining on your domain-specific data (e.g. video analytics, agriculture, automotive, healthcare, manufacturing, retail, and more).
🏗️ Model & Task Agnostic: Compatible with any architecture and task, including detection, classification, and segmentation.
🚀 Industrial-Scale Support: LightlyTrain scales from thousands to millions of images and supports on-premises, cloud, single-GPU, and multi-GPU setups.
How It Works
Install LightlyTrain:
```bash
pip install lightly-train
```
Then start pretraining with:
```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",       # Output directory
        data="my_data_dir",            # Directory with images
        model="torchvision/resnet50",  # Model to train
    )
```
This will pretrain a Torchvision ResNet-50 model using unlabeled images from `my_data_dir`.
All training logs, model exports, and checkpoints are saved to the output directory at `out/my_experiment`. The final model is exported to `out/my_experiment/exported_models/exported_last.pt`.
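The final model is exported to a fixed location under the output directory. A fine-tuning script can therefore derive the weights path from the same `out` value passed to `lightly_train.train`. The helper below is a minimal sketch. `find_exported_model` is a hypothetical name and not part of the LightlyTrain API. It relies only on the `exported_models/exported_last.pt` layout described above.

```python
from __future__ import annotations

from pathlib import Path


def find_exported_model(out_dir: str | Path) -> Path:
    """Return the final exported model inside a LightlyTrain output directory.

    Hypothetical helper: it assumes only the layout described in this guide,
    i.e. <out>/exported_models/exported_last.pt.
    """
    path = Path(out_dir) / "exported_models" / "exported_last.pt"
    if not path.is_file():
        # Fail early with a clear message instead of a generic torch.load error.
        raise FileNotFoundError(
            f"No exported model at {path}; did pretraining finish with out={str(out_dir)!r}?"
        )
    return path
```

In a fine-tuning script, the loading step could then read `torch.load(find_exported_model("out/my_experiment"))`. It stops with an explicit error if pretraining has not produced an export yet.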
Finally, load the pretrained model and fine-tune it using your existing training pipeline:
```python
import torch
from torchvision import models

# Load the pretrained model
model = models.resnet50()
model.load_state_dict(torch.load("out/my_experiment/exported_models/exported_last.pt"))

# Fine-tune the model with your existing training pipeline
...
```
See also: Looking for a full fine-tuning example? Head over to the Quick Start!
See also: Want to use your model to generate image embeddings instead? Check out the Embed guide!
Features
Train models on any image data without labels
Train models from popular libraries such as Torchvision, TIMM, Ultralytics, SuperGradients, RT-DETR, RF-DETR, and YOLOv12
Train custom models with ease
No self-supervised learning expertise required
Automatic SSL method selection (coming soon!)
Python, Command Line, and Docker support
Built for high performance including multi-GPU and multi-node support
Export models for fine-tuning or inference
Generate and export image embeddings
Monitor training progress with TensorBoard, Weights & Biases, and more
Runs fully on-premises with no API authentication and no telemetry
Supported Models
| Library | Supported Models |
|---|---|
| Torchvision | ResNet, ConvNext |
| TIMM | All models |
| Ultralytics | YOLOv5, YOLOv6, YOLOv8, YOLO11, YOLO12 |
| RT-DETR | RT-DETR |
| RF-DETR | RF-DETR |
| YOLOv12 | YOLOv12 |
| SuperGradients | PP-LiteSeg, SSD, YOLO-NAS |
| Custom Models | Any PyTorch model |
For an overview of all supported models and usage instructions, see the full model docs.
Contact us if you need support for additional models or libraries.
Supported Training Methods
DINOv2 Distillation (recommended 🚀)
See the full methods docs for details.
FAQ
Who is LightlyTrain for?
LightlyTrain is designed for engineers and teams who want to use their unlabeled data to its full potential. It is ideal if any of the following applies to you:
You want to speed up model development cycles
You have limited labeled data but abundant unlabeled data
You have slow and expensive labeling processes
You want to build your own foundation model
You work with domain-specific datasets (video analytics, robotics, medical, agriculture, etc.)
You cannot use public pretrained models
No pretrained models are available for your specific architecture
You want to leverage the latest research in self-supervised learning and distillation
How much data do I need?
We recommend a minimum of several thousand unlabeled images for training with LightlyTrain and 100+ labeled images for fine-tuning afterwards.
For best results:
Use at least 5x more unlabeled than labeled data
Even a 2x ratio of unlabeled to labeled data yields strong improvements
Larger datasets (>100,000 images) benefit from pretraining up to 3,000 epochs
Smaller datasets (<100,000 images) benefit from longer pretraining of up to 10,000 epochs
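The rules of thumb above can be encoded in a small planning helper. This is useful for sanity-checking a dataset before starting a long run. It is a sketch built on several assumptions:

- `plan_pretraining` and its fields are hypothetical names, not LightlyTrain API.
- The epoch budget is based on the size of the unlabeled (pretraining) dataset.
- A dataset of exactly 100,000 images is treated as "larger", because the guidance above only covers >100,000 and <100,000.

```python
from dataclasses import dataclass


@dataclass
class PretrainingPlan:
    ratio: float             # unlabeled images per labeled image
    enough_labeled: bool     # at least 100 labeled images for fine-tuning
    ratio_at_least_2x: bool  # "even a 2x ratio yields strong improvements"
    ratio_at_least_5x: bool  # recommended: at least 5x more unlabeled data
    max_epochs: int          # upper end of the suggested pretraining length


def plan_pretraining(num_unlabeled: int, num_labeled: int) -> PretrainingPlan:
    """Apply the FAQ's rules of thumb to a pair of dataset sizes (hypothetical helper)."""
    if num_labeled <= 0:
        raise ValueError("num_labeled must be a positive integer")
    ratio = num_unlabeled / num_labeled
    # Assumption: the epoch budget depends on the unlabeled (pretraining) dataset
    # size, and exactly 100,000 images counts as a "larger" dataset.
    max_epochs = 3_000 if num_unlabeled >= 100_000 else 10_000
    return PretrainingPlan(
        ratio=ratio,
        enough_labeled=num_labeled >= 100,
        ratio_at_least_2x=ratio >= 2,
        ratio_at_least_5x=ratio >= 5,
        max_epochs=max_epochs,
    )
```

For example, `plan_pretraining(20_000, 2_000)` reports a 10x ratio. Since 20,000 images is below 100,000, it suggests a budget of up to 10,000 epochs.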
The unlabeled dataset must always be treated like a training split: never include validation images in pretraining, to avoid data leakage.
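One way to enforce this is to check, before calling `lightly_train.train`, that no validation image also appears in the pretraining directory. The sketch below uses hypothetical helper names and compares SHA-256 hashes of file contents. It catches byte-identical copies, even renamed ones, but not resized or re-encoded versions of the same image. The set of image extensions is also an assumption.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}


def _image_files(root: Path) -> list[Path]:
    """List all image files below root, recursively, in a stable order."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def _sha256(path: Path) -> str:
    """Hash the raw file bytes, so renamed copies still match."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_leaked_images(pretrain_dir: str | Path, val_dir: str | Path) -> list[Path]:
    """Return validation images whose exact bytes also occur in the pretraining data."""
    pretrain_digests = {_sha256(p) for p in _image_files(Path(pretrain_dir))}
    return [p for p in _image_files(Path(val_dir)) if _sha256(p) in pretrain_digests]
```

If `find_leaked_images("my_data_dir", "my_val_dir")` returns a non-empty list, remove those files from the pretraining directory before training. Here `my_val_dir` is a placeholder for your validation split.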
What’s the difference between LightlyTrain and other self-supervised learning implementations?
LightlyTrain offers several advantages:
User-friendly: You don’t need to be an SSL expert; focus on training your model instead of implementation details.
Works with various model architectures: Integrates directly with different libraries such as Torchvision, Ultralytics, etc.
Handles complexity: Manages scaling from single GPU to multi-GPU training and optimizes hyperparameters.
Seamless workflow: Automatically pretrains the correct layers and exports models in the right format for fine-tuning.
Why should I use LightlyTrain instead of existing pretrained models?
LightlyTrain is most beneficial when:
Working with domain-specific data: When your data has a very different distribution from standard datasets (medical images, industrial data, etc.)
Facing policy or license restrictions: When you can’t use models pretrained on datasets with unclear licensing
Having limited labeled data: When you have access to a lot of unlabeled data but few labeled examples
Using custom architectures: When no pretrained checkpoints are available for your model
LightlyTrain is complementary to existing pretrained models and can start from either random weights or existing pretrained weights.
Check out our complete FAQ for more information.
License
LightlyTrain offers flexible licensing options to suit your specific needs:
AGPL-3.0 License: Perfect for open-source projects, academic research, and community contributions. Share your innovations with the world while benefiting from community improvements.
Commercial License: Ideal for businesses and organizations that need proprietary development freedom. Enjoy all the benefits of LightlyTrain while keeping your code and models private.
We’re committed to supporting both open-source and commercial users. Please contact us to discuss the best licensing option for your project!