Object Detection with Ultralytics’ YOLO

This tutorial demonstrates how to pretrain a YOLO model with LightlyTrain and then fine-tune it for object detection using the Ultralytics framework. We will first pretrain on a 25k-image subset of the COCO dataset (only the images, no labels!) and subsequently fine-tune on the labeled PASCAL VOC dataset.

Warning

Using Ultralytics models might require a commercial Ultralytics license. See the Ultralytics website for more information.

Install Dependencies

Install the required packages:

  • lightly-train for pretraining, with support for ultralytics’ YOLO models

  • supervision to visualize some of the annotated pictures

pip install "lightly-train[ultralytics]" "supervision==0.25.1"

Pretraining on COCO-minitrain

Time for some magic! We’ll first grab the COCO-minitrain dataset (25k images) directly from HuggingFace…

wget https://huggingface.co/datasets/bryanbocao/coco_minitrain/resolve/main/coco_minitrain_25k.zip

… unzip it…

unzip coco_minitrain_25k.zip

… and, since LightlyTrain does not require any labels, we can confidently delete them all:

rm -rf coco_minitrain_25k/labels
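If you want to double-check what is left after deleting the labels, a small stdlib sketch can count the remaining images. The helper and its extension list are our own, not part of LightlyTrain:

```python
from pathlib import Path

# Common image extensions; adjust if your dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images(root: str) -> int:
    """Recursively count image files below `root`."""
    return sum(
        1 for p in Path(root).rglob("*") if p.suffix.lower() in IMAGE_EXTENSIONS
    )


# For COCO-minitrain this should report roughly 25,000 images:
# print(count_images("coco_minitrain_25k/images"))
```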

With the dataset ready, we can now start the pretraining. Pretraining with LightlyTrain could not be easier; you only pass the following parameters:

  • out: the directory where your logs and exported model should go

  • model: the model that you want to train, e.g. yolo11s from Ultralytics

  • data: the path to a folder with images

Your data can be an arbitrarily nested folder; LightlyTrain will find all images on its own, and since no labels are required, there is no danger of ever using wrong labels! 🕵️‍♂️

# pretrain_yolo.py
import lightly_train

if __name__ == "__main__":
    # Pretrain with LightlyTrain.
    lightly_train.train(
        out="out/coco_minitrain_pretrain",  # Output directory.
        model="ultralytics/yolo11s.yaml",   # Pass the YOLO model (use .yaml ending to start with random weights).
        data="coco_minitrain_25k/images",   # Path to a directory with training images.
        epochs=100,                         # Adjust epochs for shorter training.
        batch_size=128,                     # Adjust batch size based on hardware.
    )
Or, equivalently, via the command line:

lightly-train --out=out/coco_minitrain_pretrain --model=ultralytics/yolo11s.yaml --data=coco_minitrain_25k/images --epochs=100 --batch-size=128

And just like that you pretrained a YOLO11s backbone! 🥳 This backbone can’t solve any task yet, so in the next step we will finetune it on the PASCAL VOC dataset.
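Before moving on, you can verify that the export landed where the fine-tuning step expects it. The helper below is our own sketch; the `exported_models/exported_last.pt` layout follows from the `out` directory used above:

```python
from pathlib import Path
from typing import Optional


def find_checkpoint(out_dir: str) -> Optional[Path]:
    """Return the exported checkpoint path if it exists, else None."""
    ckpt = Path(out_dir) / "exported_models" / "exported_last.pt"
    return ckpt if ckpt.is_file() else None


# After pretraining this should print the checkpoint path:
# print(find_checkpoint("out/coco_minitrain_pretrain"))
```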

Finetuning on PASCAL VOC

Now that the pretrained model has been exported, we will further fine-tune the model on the task of object detection. The exported model already has exactly the format that Ultralytics’ YOLO expects, so after getting the dataset ready, we can get started with only a few lines! ⚡️

In addition to fine-tuning the pretrained model we will also train a model that we initialize with random weights. This will let us compare the performance between the two, and show the great benefits of pretraining.

Download the PASCAL VOC Dataset

We can download the dataset directly using Ultralytics’ API with the check_det_dataset function:

from ultralytics.data.utils import check_det_dataset

dataset = check_det_dataset("VOC.yaml")

Ultralytics always downloads your datasets to a fixed location, which you can fetch via their settings module:

from ultralytics import settings

print(settings["datasets_dir"])

Inside that directory, you will now have the following structure of images and labels:

tree -d <DATASET-DIR>/VOC -I VOCdevkit

>    datasets/VOC
>    ├── images
>    │   ├── test2007
>    │   ├── train2007
>    │   ├── train2012
>    │   ├── val2007
>    │   └── val2012
>    └── labels
>        ├── test2007
>        ├── train2007
>        ├── train2012
>        ├── val2007
>        └── val2012
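If the download was interrupted, some of these split directories may be missing. A quick stdlib check (our own helper, not part of Ultralytics) returns any that are absent:

```python
from pathlib import Path

SPLITS = ["test2007", "train2007", "train2012", "val2007", "val2012"]


def missing_voc_dirs(voc_root: str) -> list:
    """Return the expected images/labels split directories that are absent."""
    root = Path(voc_root)
    return [
        f"{kind}/{split}"
        for kind in ("images", "labels")
        for split in SPLITS
        if not (root / kind / split).is_dir()
    ]


# An empty list means the download is complete:
# print(missing_voc_dirs(f"{settings['datasets_dir']}/VOC"))
```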

Inspect a few Images

Let’s use supervision and look at a few of the annotated samples to get a feeling of what the data looks like:

import random

import matplotlib.pyplot as plt
import supervision as sv
import yaml
from ultralytics import settings
from ultralytics.data.utils import check_det_dataset

dataset = check_det_dataset("VOC.yaml")

detections = sv.DetectionDataset.from_yolo(
    data_yaml_path=dataset["yaml_file"],
    images_directory_path=f"{settings['datasets_dir']}/VOC/images/train2012",
    annotations_directory_path=f"{settings['datasets_dir']}/VOC/labels/train2012",
)

with open(dataset["yaml_file"], "r") as f:
    data = yaml.safe_load(f)

names = data["names"]

box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

fig, ax = plt.subplots(2, 2, figsize=(10, 10))
ax = ax.flatten()

# Draw four random samples without replacement (randint would risk an
# out-of-range index, since its upper bound is inclusive).
samples = [detections[i] for i in random.sample(range(len(detections)), 4)]

for i, (path, image, annotation) in enumerate(samples):
    annotated_image = box_annotator.annotate(scene=image, detections=annotation)
    annotated_image = label_annotator.annotate(
        scene=annotated_image,
        detections=annotation,
        labels=[names[elem] for elem in annotation.class_id],
    )
    ax[i].imshow(annotated_image[..., ::-1])
    ax[i].axis("off")

fig.tight_layout()
fig.show()

VOC2012 Training Samples

Finetuning the Pretrained Model

All we have to do is pass the path to the pretrained model to the YOLO class; the rest is the same as always with Ultralytics.

# finetune_yolo.py

from ultralytics import YOLO

if __name__ == "__main__":
    # Load the exported model.
    model = YOLO("out/coco_minitrain_pretrain/exported_models/exported_last.pt")

    # Fine-tune with ultralytics.
    model.train(data="VOC.yaml", epochs=30, project="logs/voc_yolo11s", name="from_pretrained")
Or via the CLI (note that the model path matches the out directory from pretraining):

yolo detect train model="out/coco_minitrain_pretrain/exported_models/exported_last.pt" data="VOC.yaml" epochs=30 project="logs/voc_yolo11s" name="from_pretrained"

Finetuning the Randomly Initialized Model

To quantify the influence of our pretraining, we also train a model from random weights; in Ultralytics, passing a .yaml file name (instead of a .pt checkpoint) initializes the model randomly.

# finetune_scratch_yolo.py

from ultralytics import YOLO

if __name__ == "__main__":
    # Load the exported model.
    model = YOLO("yolo11s.yaml") # randomly initialized model

    # Fine-tune with ultralytics.
    model.train(data="VOC.yaml", epochs=30, project="logs/voc_yolo11s", name="from_scratch")
Or via the CLI:

yolo detect train model="yolo11s.yaml" data="VOC.yaml" epochs=30 project="logs/voc_yolo11s" name="from_scratch"

Evaluating the Model Performance

Congratulations, you made it almost to the end! 🎉 The last thing we’ll do is compare the performance of the two runs. A very common metric for object detectors is the mAP50-95, which we plot below for both the pretrained model and the model trained from scratch.

import matplotlib.pyplot as plt
import pandas as pd

res_scratch = pd.read_csv("logs/voc_yolo11s/from_scratch/results.csv")
res_finetune = pd.read_csv("logs/voc_yolo11s/from_pretrained/results.csv")

fig, ax = plt.subplots()
ax.plot(res_scratch["epoch"], res_scratch["metrics/mAP50-95(B)"], label="scratch")
ax.plot(res_finetune["epoch"], res_finetune["metrics/mAP50-95(B)"], label="finetune")
ax.set_xlabel("Epoch")
ax.set_ylabel("mAP50-95")
max_pretrained = res_finetune["metrics/mAP50-95(B)"].max()
max_scratch = res_scratch["metrics/mAP50-95(B)"].max()
ax.set_title(
    f"Pretraining is {(max_pretrained - max_scratch) / max_scratch * 100:.2f}% better than scratch"
)
ax.legend()
plt.show()

Pretraining vs Scratch

As clearly visible in the plot, the pretrained model converges much faster and achieves a significantly higher mAP50-95 than the model trained from scratch!
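The headline number in the plot title is simply the relative gain of the best pretrained mAP over the best scratch mAP. As a standalone helper (our own, values in the comment are illustrative):

```python
def relative_gain_percent(pretrained_best: float, scratch_best: float) -> float:
    """Relative improvement of the pretrained run over scratch, in percent."""
    return (pretrained_best - scratch_best) / scratch_best * 100


# Example: a best mAP50-95 of 0.55 pretrained vs 0.50 from scratch
# corresponds to roughly a 10% relative improvement.
```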

Next Steps

Congratulations, you’ve mastered the basics! 🎉 Ready to take it further? Here are some exciting next steps:

  • Go beyond distillation and explore other pretraining methods in LightlyTrain. Check Methods for more exciting possibilities!

  • Try your hand at different YOLO flavors (YOLOv5, YOLOv6, YOLOv8).

  • Take your pretrained model for a spin with image embeddings and similarity search.
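The similarity-search idea from the last bullet boils down to comparing embedding vectors. A dependency-free sketch with cosine similarity (the file names and 3-dimensional embeddings below are made up for illustration; real embeddings come from your pretrained model):

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm


def most_similar(query, gallery: dict) -> str:
    """Return the gallery image whose embedding is closest to the query."""
    return max(gallery, key=lambda name: cosine_similarity(query, gallery[name]))


# Toy example with hypothetical embeddings:
gallery = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
}
print(most_similar([0.8, 0.2, 0.1], gallery))  # cat.jpg
```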

Happy experimenting! 🚀