Object Detection

Note

🔥 LightlyTrain now supports training LT-DETR: DINOv3- and DINOv2-based object detection models with the fast RT-DETR detection architecture! Our largest model achieves an mAP50:95 of 60.0 on the COCO validation set!

Benchmark Results

Below we provide the model checkpoints and report the validation mAP50:95 and inference latency of different DINOv2- and DINOv3-based models fine-tuned on the COCO dataset. See the sections below for how to use these checkpoints for inference and further fine-tuning. Latency was measured with TensorRT version 10.13.3.9 on an NVIDIA T4 GPU with batch size 1.

COCO

| Implementation | Backbone Model | mAP50:95 | Latency (ms) | # Params (M) | Input Size | Checkpoint Name |
|----------------|----------------|----------|--------------|--------------|------------|-----------------|
| LightlyTrain | dinov2/vits14-ltdetr | 55.7 | 16.87 | 55.3 | 644×644 | dinov2/vits14-noreg-ltdetr-coco |
| LightlyTrain | dinov3/convnext-tiny-ltdetr | 54.4 | 13.29 | 61.1 | 640×640 | dinov3/convnext-tiny-ltdetr-coco |
| LightlyTrain | dinov3/convnext-small-ltdetr | 56.9 | 17.65 | 82.7 | 640×640 | dinov3/convnext-small-ltdetr-coco |
| LightlyTrain | dinov3/convnext-base-ltdetr | 58.6 | 24.68 | 121.0 | 640×640 | dinov3/convnext-base-ltdetr-coco |
| LightlyTrain | dinov3/convnext-large-ltdetr | 60.0 | 42.30 | 230.0 | 640×640 | dinov3/convnext-large-ltdetr-coco |

Object Detection with LT-DETR

Training an object detection model with LightlyTrain is straightforward and only requires a few lines of code. See the Data section below for details on how to prepare your dataset.

Train an Object Detection Model

import lightly_train

if __name__ == "__main__":
    lightly_train.train_object_detection(
        out="out/my_experiment",
        model="dinov3/convnext-small-ltdetr-coco",
        data={
            "path": "base_path_to_your_dataset",
            "train": "images/train2012",
            "val": "images/val2012",
            "names": {
                0: "person",
                1: "bicycle",
                # ...
            },
        }
    )

During training, both the

  • best (with highest validation mAP50:95) and

  • last (from the most recent validation round, as determined by save_checkpoint_args.save_every_num_steps)

model weights are exported to out/my_experiment/exported_models/, unless disabled in save_checkpoint_args. You can use these weights to continue fine-tuning on another task by loading them via model="<checkpoint path>":

import lightly_train

if __name__ == "__main__":
    lightly_train.train_object_detection(
        out="out/my_experiment",
        model="out/my_experiment/exported_models/exported_best.pt", # Use the best model to continue training
        data={...},
    )
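
How often the "last" weights are validated and exported is controlled through save_checkpoint_args. The exact schema is not spelled out in this section, so treat the following as a sketch: it assumes save_checkpoint_args accepts a dict and uses the save_every_num_steps key named above:

import lightly_train

if __name__ == "__main__":
    lightly_train.train_object_detection(
        out="out/my_experiment",
        model="dinov3/convnext-small-ltdetr-coco",
        data={...},
        # Assumption: dict form of save_checkpoint_args; save_every_num_steps
        # (mentioned above) sets how often the "last" weights are refreshed.
        save_checkpoint_args={"save_every_num_steps": 1000},
    )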

Load the Trained Model from Checkpoint and Predict

After training completes, you can load the best model checkpoint for inference like this:

import lightly_train

model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
results = model.predict("path/to/image.jpg")

Or use one of the pre-trained model weights directly from LightlyTrain:

import lightly_train

model = lightly_train.load_model("dinov3/convnext-tiny-ltdetr-coco")
results = model.predict("path/to/image.jpg")
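
The structure of results is not documented in this section; judging from the .values() unpacking in the visualization example below, it behaves like a dictionary with labels, boxes, and scores entries. A sketch under that assumption:

import lightly_train

model = lightly_train.load_model("dinov3/convnext-tiny-ltdetr-coco")
results = model.predict("path/to/image.jpg")

# Assumed keys, inferred from the visualization example below.
labels = results["labels"]  # predicted class indices
boxes = results["boxes"]    # absolute (x_min, y_min, x_max, y_max) boxes in pixels
scores = results["scores"]  # confidence score per detection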

Visualize the Result

After making the predictions with the model weights, you can visualize the predicted bounding boxes like this:

# ruff: noqa: F821
import matplotlib.pyplot as plt
from torchvision import io, utils

import lightly_train

model = lightly_train.load_model("dinov3/convnext-tiny-ltdetr-coco")
labels, boxes, scores = model.predict("<image>.jpg").values()

# Visualize predictions.
image_with_boxes = utils.draw_bounding_boxes(
    image=io.read_image("<image>.jpg"),
    boxes=boxes,
    labels=[model.classes[i.item()] for i in labels],
)

fig, ax = plt.subplots(figsize=(30, 30))
ax.imshow(image_with_boxes.permute(1, 2, 0))
fig.savefig("predictions.png")

The predicted boxes are in absolute (x_min, y_min, x_max, y_max) format, i.e. the coordinates are given in pixels of the input image.
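
If you need the normalized (x_center, y_center, width, height) format used by the YOLO annotation files in the Data section below, a small conversion helper could look like this (a hypothetical utility, not part of the LightlyTrain API):

import torch

def xyxy_to_yolo(boxes: torch.Tensor, img_w: int, img_h: int) -> torch.Tensor:
    # Convert absolute (x_min, y_min, x_max, y_max) boxes to normalized
    # (x_center, y_center, width, height), as used by YOLO label files.
    x_min, y_min, x_max, y_max = boxes.unbind(-1)
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return torch.stack([x_center, y_center, width, height], dim=-1)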

Out

The out argument specifies the output directory where all training logs, model exports, and checkpoints are saved. It looks like this after training:

out/my_experiment
├── checkpoints
│   └── last.ckpt                                       # Last checkpoint
├── exported_models
│   ├── exported_last.pt                                # Last model exported (unless disabled)
│   └── exported_best.pt                                # Best model exported (unless disabled)
├── events.out.tfevents.1721899772.host.1839736.0       # TensorBoard logs
└── train.log                                           # Training logs

The final model checkpoint is saved to out/my_experiment/checkpoints/last.ckpt. The last and best model weights are exported to out/my_experiment/exported_models/ unless disabled in save_checkpoint_args.

Tip

Create a new output directory for each experiment to keep training logs, model exports, and checkpoints organized.

Data

LightlyTrain supports training object detection models on images with bounding box annotations. Every image must have a corresponding annotation file in YOLO format, containing one line per object with the class ID and four normalized bounding box coordinates (x_center, y_center, width, height). The file must have the .txt extension. An example annotation file for an image with two objects could look like this:

0 0.716797 0.395833 0.216406 0.147222
1 0.687500 0.379167 0.255208 0.175000
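
For illustration, such a file can be parsed with a few lines of Python (a hypothetical helper; LightlyTrain reads the annotation files for you during training):

from pathlib import Path

def read_yolo_labels(path: str) -> list[tuple[int, tuple[float, ...]]]:
    # Each line: class ID followed by four normalized box coordinates
    # (x_center, y_center, width, height).
    labels = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        class_id, *coords = line.split()
        labels.append((int(class_id), tuple(float(c) for c in coords)))
    return labels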

The following image formats are supported:

  • jpg

  • jpeg

  • png

  • ppm

  • bmp

  • pgm

  • tif

  • tiff

  • webp

Your dataset directory should be organized like this:

base_path_to_your_dataset/
├── images
│   ├── train
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── val
│       ├── image1.jpg
│       ├── image2.jpg
│       └── ...
└── labels
    ├── train
    │   ├── image1.txt
    │   ├── image2.txt
    │   └── ...
    └── val
        ├── image1.txt
        ├── image2.txt
        └── ...

Alternatively, the splits can also be at the top level:

base_path_to_your_dataset/
├── train
│   ├── images
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── labels
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
└── val
    ├── images
    │   ├── image1.jpg
    │   ├── image2.jpg
    │   └── ...
    └── labels
        ├── image1.txt
        ├── image2.txt
        └── ...
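
With this layout, the train and val entries of the data argument point at the per-split image directories. The relative paths below are an assumption based on the directory tree above:

import lightly_train

if __name__ == "__main__":
    lightly_train.train_object_detection(
        out="out/my_experiment",
        model="dinov3/convnext-small-ltdetr-coco",
        data={
            "path": "base_path_to_your_dataset",
            # Assumed relative paths for the top-level split layout.
            "train": "train/images",
            "val": "val/images",
            "names": {
                0: "person",
                1: "bicycle",
                # ...
            },
        },
    )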