Quick Start - Object Detection

This guide demonstrates how to use LightlyTrain for object detection with our state-of-the-art LTDETR model built on DINOv3.

Installation

pip install lightly-train

Important

LightlyTrain is officially supported on:

  • Linux: CPU or CUDA

  • macOS: CPU only

  • Windows (experimental): CPU or CUDA

MPS support for macOS is planned.

Check the installation instructions for more details.
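
To verify the installation, a minimal sketch that imports the package and reports whether a CUDA device is visible (this assumes PyTorch, which lightly-train depends on, was installed alongside it):

import torch
import lightly_train  # a successful import confirms the installation

# Expected to print False on macOS, True on a CUDA-enabled Linux machine.
print(torch.cuda.is_available())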

Prediction using LightlyTrain’s model weights

Download an example image

Download an example image for inference:

wget -O image.jpg http://images.cocodataset.org/val2017/000000577932.jpg
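
If wget is not available, the same download works from Python using only the standard library:

import urllib.request

# Download the example image to the current working directory as image.jpg.
urllib.request.urlretrieve(
    "http://images.cocodataset.org/val2017/000000577932.jpg", "image.jpg"
)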

Load the model weights

Load the model with LightlyTrain’s load_model function, which automatically downloads the pretrained weights:

import lightly_train

model = lightly_train.load_model("dinov3/convnext-tiny-ltdetr-coco")
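
The loaded model exposes its class mapping via model.classes (used for visualization further below); printing it shows which classes the COCO-pretrained weights can detect:

# Inspect the mapping from class index to human-readable class name.
print(model.classes)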

Predict the objects

Run model.predict on the image. The method accepts file paths, URLs, PIL Images, or tensors as input:

results = model.predict("image.jpg")
results["labels"]   # Class labels, tensor of shape (num_boxes,)
results["bboxes"]   # Bounding boxes in (xmin, ymin, xmax, ymax) absolute pixel
                    # coordinates of the original image. Tensor of shape (num_boxes, 4).
results["scores"]   # Confidence scores, tensor of shape (num_boxes,)

Visualize the results

Visualize the image and results to check what objects were detected:

import matplotlib.pyplot as plt
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes

# Read the image as a (C, H, W) uint8 tensor and draw the predicted boxes
# with their class names.
image = read_image("image.jpg")
image_with_boxes = draw_bounding_boxes(
    image,
    boxes=results["bboxes"],
    labels=[model.classes[label.item()] for label in results["labels"]],
)
# Matplotlib expects (H, W, C), so permute the tensor before plotting.
plt.imshow(image_with_boxes.permute(1, 2, 0))
plt.show()
Figure: the example street scene with the detected objects drawn as labeled bounding boxes.
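
If you are working on a headless machine where plt.show() cannot open a window, save the visualization to disk instead:

# Save the annotated image to disk instead of displaying it interactively.
plt.imshow(image_with_boxes.permute(1, 2, 0))
plt.axis("off")
plt.savefig("image_with_boxes.jpg", bbox_inches="tight")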

Train object detection model

Training your own detection model is straightforward with LightlyTrain.

Download dataset

First, download a dataset. The dataset must be in YOLO format; see the documentation for more details. You can use labelformat to convert a dataset from another format to YOLO:

wget https://github.com/lightly-ai/coco128_yolo/releases/download/v0.0.1/coco128_yolo.zip && unzip -q coco128_yolo.zip

The dataset looks like this after the download completes:

coco128_yolo
├── images
│   ├── train2017
│   │   ├── 000000000009.jpg
│   │   ├── 000000000025.jpg
│   │   ├── ...
│   │   └── 000000000650.jpg
│   └── val2017
│       ├── 000000000139.jpg
│       ├── 000000000285.jpg
│       ├── ...
│       └── 000000013201.jpg
└── labels
    ├── train2017
    │   ├── 000000000009.txt
    │   ├── 000000000025.txt
    │   ├── ...
    │   └── 000000000650.txt
    └── val2017
        ├── 000000000139.txt
        ├── 000000000285.txt
        ├── ...
        └── 000000013201.txt
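
Each label file contains one line per object in the standard YOLO format: the class index followed by the box center x, center y, width, and height, all normalized to [0, 1] by the image size. An illustrative (made-up) label line, where class 45 maps to "bowl" in the class names below:

45 0.479 0.688 0.955 0.595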

Start training

Start the training with the train_object_detection function. You only need to specify the output directory, the model, and the input data; LightlyTrain automatically sets the remaining training parameters and applies image augmentations. You can customize these settings at any time if needed. A sketch for building the names mapping programmatically follows the example:

import lightly_train

lightly_train.train_object_detection(
    out="out/my_experiment",
    model="dinov3/convnext-tiny-ltdetr-coco",
    steps=100,  # Small number of steps for demonstration, default is 90_000.
    batch_size=4,  # Small batch size for demonstration, default is 16.
    data={
        "path": "coco128_yolo",
        "train": "images/train2017",
        "val": "images/val2017",
        "names": {
            0: "person",
            1: "bicycle",
            2: "car",
            3: "motorcycle",
            4: "airplane",
            5: "bus",
            6: "train",
            7: "truck",
            8: "boat",
            9: "traffic light",
            10: "fire hydrant",
            11: "stop sign",
            12: "parking meter",
            13: "bench",
            14: "bird",
            15: "cat",
            16: "dog",
            17: "horse",
            18: "sheep",
            19: "cow",
            20: "elephant",
            21: "bear",
            22: "zebra",
            23: "giraffe",
            24: "backpack",
            25: "umbrella",
            26: "handbag",
            27: "tie",
            28: "suitcase",
            29: "frisbee",
            30: "skis",
            31: "snowboard",
            32: "sports ball",
            33: "kite",
            34: "baseball bat",
            35: "baseball glove",
            36: "skateboard",
            37: "surfboard",
            38: "tennis racket",
            39: "bottle",
            40: "wine glass",
            41: "cup",
            42: "fork",
            43: "knife",
            44: "spoon",
            45: "bowl",
            46: "banana",
            47: "apple",
            48: "sandwich",
            49: "orange",
            50: "broccoli",
            51: "carrot",
            52: "hot dog",
            53: "pizza",
            54: "donut",
            55: "cake",
            56: "chair",
            57: "couch",
            58: "potted plant",
            59: "bed",
            60: "dining table",
            61: "toilet",
            62: "tv",
            63: "laptop",
            64: "mouse",
            65: "remote",
            66: "keyboard",
            67: "cell phone",
            68: "microwave",
            69: "oven",
            70: "toaster",
            71: "sink",
            72: "refrigerator",
            73: "book",
            74: "clock",
            75: "vase",
            76: "scissors",
            77: "teddy bear",
            78: "hair drier",
            79: "toothbrush",
        },
    },
)
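
Writing out all 80 class names by hand is verbose and error-prone. As mentioned above, a minimal sketch that builds the same names mapping with a dict comprehension (truncated here to the first five COCO classes for brevity):

# Build the `names` mapping from an ordered list of class names.
class_names = ["person", "bicycle", "car", "motorcycle", "airplane"]
names = {index: name for index, name in enumerate(class_names)}
print(names)  # {0: 'person', 1: 'bicycle', 2: 'car', ...}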

Once the training is complete, the output directory looks like this:

out/my_experiment
├── checkpoints
│   ├── best.ckpt
│   └── last.ckpt
├── events.out.tfevents.1764251158.ef9b159fe4b8.273.0
├── exported_models
│   ├── exported_best.pt
│   └── exported_last.pt
└── train.log

Load trained model

The best model checkpoint is saved to out/my_experiment/exported_models/exported_best.pt. You can load it for inference like this:

import lightly_train
import matplotlib.pyplot as plt
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes

# Load the model for inference
model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")

# Run inference
results = model.predict("image.jpg")

# Plot results
image = read_image("image.jpg")
image_with_boxes = draw_bounding_boxes(
    image,
    boxes=results["bboxes"],
    labels=[model.classes[label.item()] for label in results["labels"]],
)
plt.imshow(image_with_boxes.permute(1, 2, 0))
plt.show()

Next Steps