Instance Segmentation
Note
🔥 LightlyTrain now supports training DINOv3-based instance segmentation models with the EoMT architecture by Kerssies et al.!
Benchmark Results
Below we provide the models and report the validation mAP and inference FPS of different DINOv3 models fine-tuned on COCO with LightlyTrain. See Train an Instance Segmentation Model below for how to use these models for further fine-tuning.
You can also explore running inference and training these models using our Colab notebook.
COCO
| Implementation | Model | #Params (M) | Input Size | Val mAP (mask) | Avg. FPS |
|---|---|---|---|---|---|
| LightlyTrain | dinov3/vits16-eomt-inst-coco | 21.6 | 640x640 | 32.6 | 51.5 |
| LightlyTrain | dinov3/vitb16-eomt-inst-coco | 85.7 | 640x640 | 40.3 | 25.2 |
| LightlyTrain | dinov3/vitl16-eomt-inst-coco | 303.2 | 640x640 | 46.2 | 12.5 |
| Original EoMT | dinov3/vitl16-eomt-inst-coco | 303.2 | 640x640 | 45.9 | - |
Training follows the protocol in the original EoMT paper.
Models are trained for 90K steps (~12 epochs) on the COCO dataset with batch size 16
and learning rate 2e-4. The average FPS values were measured with model compilation
using torch.compile on a single NVIDIA T4 GPU with FP16 precision.
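For reference, the sketch below outlines one way to measure inference FPS with torch.compile and FP16 autocast. It is an illustrative approximation of this benchmark setup, not the exact script used for the table: it assumes the loaded model behaves like a standard torch.nn.Module whose forward accepts a (B, 3, H, W) image batch on GPU, and the warmup and iteration counts are arbitrary example values.

import time

import torch

import lightly_train

# Illustrative FPS measurement sketch (not the exact benchmark script).
# Assumes the loaded model is a torch.nn.Module that accepts an image batch.
model = lightly_train.load_model("dinov3/vitl16-eomt-inst-coco")
model = model.eval().cuda()
model = torch.compile(model)

images = torch.randn(1, 3, 640, 640, device="cuda")
num_iters = 100
with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
    for _ in range(10):  # Warmup so compilation time is excluded from the timing
        model(images)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(num_iters):
        model(images)
    torch.cuda.synchronize()

print(f"Average FPS: {num_iters / (time.perf_counter() - start):.1f}")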
Train an Instance Segmentation Model
Training an instance segmentation model with LightlyTrain is straightforward and only requires a few lines of code. See the Data section below for more details on how to prepare your dataset.
import lightly_train

if __name__ == "__main__":
    lightly_train.train_instance_segmentation(
        out="out/my_experiment",
        model="dinov3/vitl16-eomt-inst-coco",
        data={
            "path": "my_data_dir",    # Path to dataset directory
            "train": "images/train",  # Path to training images
            "val": "images/val",      # Path to validation images
            "names": {                # Classes in the dataset
                0: "background",
                1: "car",
                2: "bicycle",
                # ...
            },
        },
    )
During training, the best and last model weights are exported to
out/my_experiment/exported_models/, unless disabled in save_checkpoint_args:
best (highest validation mask mAP): exported_best.pt
last: exported_last.pt
You can use these weights to continue fine-tuning on another dataset by loading the
weights with model="<checkpoint path>":
import lightly_train

if __name__ == "__main__":
    lightly_train.train_instance_segmentation(
        out="out/my_experiment",
        model="out/my_experiment/exported_models/exported_best.pt",  # Continue training from the best model
        data={...},
    )
Load the Trained Model from Checkpoint and Predict
After the training completes, you can load the best model checkpoint for inference like this:
import lightly_train
model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
results = model.predict("image.jpg")
labels = results["labels"] # (N,) tensor of predicted class IDs
masks = results["masks"] # (N, height, width) tensor of predicted masks
scores = results["scores"] # (N,) tensor of predicted confidence scores
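The returned tensors can be filtered directly, for example to keep only confident instances before further processing (the 0.5 threshold below is just an example value):

keep = scores > 0.5  # Example confidence threshold, adjust to your needs
labels = labels[keep]
masks = masks[keep]
scores = scores[keep]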
Or use one of the pretrained models directly from LightlyTrain:
import lightly_train
model = lightly_train.load_model("dinov3/vitl16-eomt-inst-coco")
results = model.predict("image.jpg")
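To run inference on a whole folder of images, you can loop over the files and call predict per image (the directory path below is a placeholder):

from pathlib import Path

import lightly_train

model = lightly_train.load_model("dinov3/vitl16-eomt-inst-coco")

# "my_images" is a placeholder directory containing .jpg images.
for image_path in sorted(Path("my_images").glob("*.jpg")):
    results = model.predict(str(image_path))
    print(f"{image_path.name}: {len(results['labels'])} instances")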
Visualize the Predictions
You can visualize the predicted masks like this:
import matplotlib.pyplot as plt
from torchvision.io import read_image
from torchvision.utils import draw_segmentation_masks
image = read_image("image.jpg")
image_with_masks = draw_segmentation_masks(image, masks, alpha=0.6)
plt.imshow(image_with_masks.permute(1, 2, 0))
plt.show()
Data
LightlyTrain supports instance segmentation datasets in YOLO format. Every image must have a corresponding annotation file that contains, for every object in the image, a line with the class ID and (x1, y1, x2, y2, …) polygon coordinates in normalized format.
0 0.782016 0.986521 0.937078 0.874167 0.957297 0.782021 0.950562 0.739333
1 0.557859 0.143813 0.487078 0.0314583 0.859547 0.00897917 0.985953 0.130333 0.984266 0.184271
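For illustration, such a label file can be parsed with a few lines of plain Python. The file path and image size below are placeholder values; class IDs and polygon coordinates are read exactly as described above:

from pathlib import Path

image_width, image_height = 640, 480  # Placeholder image size

for line in Path("labels/train/image1.txt").read_text().splitlines():
    values = line.split()
    class_id = int(values[0])
    coords = [float(v) for v in values[1:]]
    # Normalized (x, y) pairs converted to pixel coordinates.
    polygon = [
        (x * image_width, y * image_height)
        for x, y in zip(coords[0::2], coords[1::2])
    ]
    print(class_id, polygon)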
The following image formats are supported:
jpg
jpeg
png
ppm
bmp
pgm
tif
tiff
webp
Your dataset directory must be organized like this:
my_data_dir/
├── images
│   ├── train
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── val
│       ├── image1.jpg
│       ├── image2.jpg
│       └── ...
└── labels
    ├── train
    │   ├── image1.txt
    │   ├── image2.txt
    │   └── ...
    └── val
        ├── image1.txt
        ├── image2.txt
        └── ...
Alternatively, the train/val splits can also be at the top level:
my_data_dir/
├── train
│   ├── images
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── labels
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
└── val
    ├── images
    │   ├── image1.jpg
    │   ├── image2.jpg
    │   └── ...
    └── labels
        ├── image1.txt
        ├── image2.txt
        └── ...
The data argument in train_instance_segmentation must point to the dataset
directory and specify the paths to the training and validation images relative to
the dataset directory. For example:
import lightly_train

if __name__ == "__main__":
    lightly_train.train_instance_segmentation(
        out="out/my_experiment",
        model="dinov3/vitl16-eomt-inst-coco",
        data={
            "path": "my_data_dir",    # Path to dataset directory
            "train": "images/train",  # Path to training images
            "val": "images/val",      # Path to validation images
            "names": {                # Classes in the dataset
                0: "background",      # Classes must match those in the annotation files
                1: "car",
                2: "bicycle",
                # ...
            },
        },
    )
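Before starting a long training run, it can help to sanity-check that every image has a matching label file. The sketch below assumes the first directory layout from above; adjust the paths if you use the layout with top-level train/val splits:

from pathlib import Path

image_dir = Path("my_data_dir") / "images" / "train"
label_dir = Path("my_data_dir") / "labels" / "train"
image_suffixes = {".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp"}

# Collect training images that have no corresponding label file.
missing = [
    image_path.name
    for image_path in sorted(image_dir.iterdir())
    if image_path.suffix.lower() in image_suffixes
    and not (label_dir / f"{image_path.stem}.txt").exists()
]
print(f"{len(missing)} training images without a label file")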
Model
The model argument defines the model used for instance segmentation training. The
following models are available:
DINOv3 Models
dinov3/vits16-eomt
dinov3/vits16plus-eomt
dinov3/vitb16-eomt
dinov3/vitl16-eomt
dinov3/vitl16plus-eomt
dinov3/vith16plus-eomt
dinov3/vit7b16-eomt
dinov3/vits16-eomt-inst-coco (fine-tuned on COCO)
dinov3/vitb16-eomt-inst-coco (fine-tuned on COCO)
dinov3/vitl16-eomt-inst-coco (fine-tuned on COCO)
All models are pretrained by Meta and fine-tuned by Lightly.
Logging
Logging is configured with the logger_args argument. The following loggers are
supported:
mlflow: Logs training metrics to MLflow (disabled by default, requires MLflow to be installed)
tensorboard: Logs training metrics to TensorBoard (enabled by default, requires TensorBoard to be installed)
MLflow
Important
MLflow must be installed with pip install "lightly-train[mlflow]".
The mlflow logger can be configured with the following arguments:
import lightly_train

if __name__ == "__main__":
    lightly_train.train_instance_segmentation(
        out="out/my_experiment",
        model="dinov3/vitl16-eomt-inst-coco",
        data={
            # ...
        },
        logger_args={
            "mlflow": {
                "experiment_name": "my_experiment",
                "run_name": "my_run",
                "tracking_uri": "tracking_uri",
            },
        },
    )
TensorBoard
TensorBoard logs are automatically saved to the output directory. Run TensorBoard in a new terminal to visualize the training progress:
tensorboard --logdir out/my_experiment
Disable the TensorBoard logger with:
logger_args={"tensorboard": None}
Resume Training
There are two distinct ways to continue training, depending on your intention.
Resume Interrupted Training
Use resume_interrupted=True to resume a previously interrupted or crashed training run.
This will pick up exactly where the training left off.
You must use the same out directory as the original run.
You must not change any training parameters (e.g., learning rate, batch size, data, etc.).
This is intended for continuing the same run without modification.
This will utilize the .ckpt checkpoint file out/my_experiment/checkpoints/last.ckpt
to restore the entire training state, including model weights, optimizer state, and epoch count.
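A minimal sketch (assuming resume_interrupted is passed directly to train_instance_segmentation and that all other arguments match the interrupted run):

import lightly_train

if __name__ == "__main__":
    lightly_train.train_instance_segmentation(
        out="out/my_experiment",  # Must be the out directory of the interrupted run
        model="dinov3/vitl16-eomt-inst-coco",
        data={
            # ... same data configuration as the original run
        },
        resume_interrupted=True,
    )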
Load Weights for a New Run
As stated above, you can specify model="<checkpoint path>" to further fine-tune a
model from a previous run.
You are free to change training parameters.
This is useful for continuing training with a different setup.
We recommend using the exported best model weights from out/my_experiment/exported_models/exported_best.pt
for this purpose, though a .ckpt file can also be loaded.
Default Image Transform Arguments
The following are the default train transform arguments. The validation arguments are automatically inferred from the train arguments.
You can configure the image size and normalization like this:
import lightly_train

if __name__ == "__main__":
    lightly_train.train_instance_segmentation(
        out="out/my_experiment",
        model="dinov3/vitl16-eomt-inst-coco",
        data={
            # ...
        },
        transform_args={
            "image_size": (640, 640),  # (height, width)
            "normalize": {
                "mean": [0.485, 0.456, 0.406],
                "std": [0.229, 0.224, 0.225],
            },
        },
    )
EoMT Instance Segmentation DINOv3 Default Transform Arguments
Train
{
"bbox_params": "BboxParams",
"channel_drop": null,
"color_jitter": null,
"image_size": "auto",
"normalize": "auto",
"num_channels": "auto",
"random_crop": {
"fill": 0,
"height": "auto",
"pad_if_needed": true,
"pad_position": "center",
"prob": 1.0,
"width": "auto"
},
"random_flip": {
"horizontal_prob": 0.5,
"vertical_prob": 0.0
},
"scale_jitter": {
"divisible_by": null,
"max_scale": 2.0,
"min_scale": 0.1,
"num_scales": 20,
"prob": 1.0,
"seed_offset": 0,
"sizes": null,
"step_seeding": false
},
"smallest_max_size": null
}
Val
{
"bbox_params": "BboxParams",
"channel_drop": null,
"color_jitter": null,
"image_size": null,
"normalize": "auto",
"num_channels": "auto",
"random_crop": null,
"random_flip": null,
"scale_jitter": null,
"smallest_max_size": null
}
In case you need different parameters for training and validation, you can pass an
optional val dictionary to transform_args to override the validation parameters:
transform_args={
    "image_size": (640, 640),  # (height, width)
    "normalize": {
        "mean": [0.485, 0.456, 0.406],
        "std": [0.229, 0.224, 0.225],
    },
    "val": {  # Override validation parameters
        "image_size": (512, 512),  # (height, width)
    },
}