Object Detection with Ultralytics’ YOLO

This tutorial demonstrates how to pre-train a YOLO model using lightly-train and then fine-tune it for object detection using the ultralytics framework. We will perform both steps on the PASCAL VOC dataset.

Warning

Using Ultralytics models might require a commercial Ultralytics license. See the Ultralytics website for more information.

Install Dependencies

Install the required packages:

lightly-train for pretraining, with support for ultralytics’ YOLO models
supervision to visualize some of the annotated pictures

pip install "lightly-train[ultralytics]" "supervision==0.25.1"

Download the Dataset

We can download the dataset directly using Ultralytics’ API with the check_det_dataset function:

from ultralytics.data.utils import check_det_dataset

dataset = check_det_dataset("VOC.yaml")

Ultralytics always downloads your datasets to a fixed location, which you can fetch via their settings module:

from ultralytics import settings

print(settings["datasets_dir"])

Inside that directory (referred to as <DATASET-DIR> below), you will now have the following structure of images and labels:

tree -d <DATASET-DIR>/VOC -I VOCdevkit

>    datasets/VOC
>    ├── images
>    │   ├── test2007
>    │   ├── train2007
>    │   ├── train2012
>    │   ├── val2007
>    │   └── val2012
>    └── labels
>        ├── test2007
>        ├── train2007
>        ├── train2012
>        ├── val2007
>        └── val2012
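To confirm that the download completed, it can help to count the image files in each split. The following is a minimal sketch using only the standard library; the helper name count_images_per_split, the set of image extensions, and the example path are assumptions for illustration and not part of lightly-train or Ultralytics.

```python
from pathlib import Path

# Assumption: the images are stored with one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_split(images_root: Path) -> dict[str, int]:
    """Return {split_name: number_of_image_files} for each subdirectory."""
    counts = {}
    for split_dir in sorted(p for p in images_root.iterdir() if p.is_dir()):
        counts[split_dir.name] = sum(
            1 for f in split_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    # Assumption: replace "datasets" with the directory printed by
    # settings["datasets_dir"] on your machine.
    images_root = Path("datasets") / "VOC" / "images"
    if images_root.is_dir():
        for split, n in count_images_per_split(images_root).items():
            print(f"{split}: {n} images")
```

An empty or missing split usually means the download was interrupted and should be repeated.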

Note

Labels are not required for self-supervised pre-training. We will use the labels only for fine-tuning.
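For reference, each file under labels/ uses the Ultralytics YOLO text format: one object per line, given as a class index followed by the box center x, center y, width, and height, all normalized to the range 0 to 1. The sketch below converts one such line to pixel coordinates. The helper name yolo_line_to_pixel_box and the sample line are hypothetical and were made up for illustration.

```python
def yolo_line_to_pixel_box(line: str, image_width: int, image_height: int):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, x_c, y_c, w, h = line.split()
    # Scale the normalized center and size back to pixel units.
    x_c, w = float(x_c) * image_width, float(w) * image_width
    y_c, h = float(y_c) * image_height, float(h) * image_height
    # Convert center/size to corner coordinates.
    return int(class_id), x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2


# Hypothetical label line for a 400x200 image (not taken from the dataset).
print(yolo_line_to_pixel_box("11 0.5 0.5 0.25 0.5", 400, 200))
# -> (11, 150.0, 50.0, 250.0, 150.0)
```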

Inspect a Few Images

Let’s use supervision to look at a few of the annotated samples and get a feel for what the data looks like:

import random

import matplotlib.pyplot as plt
import supervision as sv
import yaml
from ultralytics import settings
from ultralytics.data.utils import check_det_dataset

dataset = check_det_dataset("VOC.yaml")

detections = sv.DetectionDataset.from_yolo(
    data_yaml_path=dataset["yaml_file"],
    # Single quotes inside the f-string keep this valid on Python versions before 3.12.
    images_directory_path=f"{settings['datasets_dir']}/VOC/images/train2012",
    annotations_directory_path=f"{settings['datasets_dir']}/VOC/labels/train2012",
)

with open(dataset["yaml_file"], "r") as f:
    data = yaml.safe_load(f)

names = data["names"]

box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

fig, ax = plt.subplots(2, 2, figsize=(10, 10))
ax = ax.flatten()

# randrange excludes the upper bound, so every drawn index is valid.
detections = [detections[random.randrange(len(detections))] for _ in range(4)]

for i, (path, image, annotation) in enumerate(detections):
    annotated_image = box_annotator.annotate(scene=image, detections=annotation)
    annotated_image = label_annotator.annotate(
        scene=annotated_image,
        detections=annotation,
        labels=[names[elem] for elem in annotation.class_id],
    )
    ax[i].imshow(annotated_image[..., ::-1])
    ax[i].axis("off")

fig.tight_layout()
plt.show()  # Keeps the window open when run as a script.

VOC2012 Training Samples
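The grid above draws its four indices independently, so the same image can occasionally appear more than once. If you prefer four distinct, reproducible samples, one option is random.sample with a seeded generator. This is a minimal sketch; the helper name sample_distinct_indices and the dataset size of 1000 are assumptions for illustration only.

```python
import random


def sample_distinct_indices(dataset_size: int, k: int, seed=None) -> list[int]:
    """Pick up to k distinct indices in the range [0, dataset_size)."""
    rng = random.Random(seed)  # A seeded generator makes the selection repeatable.
    return rng.sample(range(dataset_size), min(k, dataset_size))


# Hypothetical dataset size; in the script above this would be len(detections).
indices = sample_distinct_indices(dataset_size=1000, k=4, seed=0)
print(indices)
```

The returned indices can then be used to index the dataset in place of the random draws.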

Pre-train and Fine-tune

We will use lightly-train to pre-train a YOLO11 model using self-supervised learning.

The following scripts or CLI commands will:

Initialize a YOLO11s model with random weights.
Pre-train the YOLO11s model on the training images of PASCAL VOC using distillation pretraining.
Export the pre-trained YOLO11s model.
Fine-tune the pre-trained model on the PASCAL VOC dataset using labels, and evaluate its performance.

Python

# pretrain_yolo.py
import lightly_train
from ultralytics import settings

# Single quotes inside the f-string keep this valid on Python versions before 3.12.
data_path = f"{settings['datasets_dir']}/VOC/images/train2012"

if __name__ == "__main__":
    # Pre-train with lightly-train.
    lightly_train.train(
        out="out/my_experiment",            # Output directory.
        model="ultralytics/yolo11s.yaml",   # Pass the YOLO model.
        data=data_path,                     # Path to a directory with training images.
        epochs=100,                         # Adjust epochs for faster training.
        batch_size=64,                      # Adjust batch size based on hardware.
    )

# finetune_yolo.py

from ultralytics import YOLO

if __name__ == "__main__":
    # Load the exported model.
    model = YOLO("out/my_experiment/exported_models/exported_last.pt")

    # Fine-tune with ultralytics.
    model.train(data="VOC.yaml", epochs=100)

Command Line

lightly-train train out="out/my_experiment" data="<DATASET-DIR>/VOC/images/train2012" model="ultralytics/yolo11s.yaml" epochs=100 batch_size=64

yolo detect train model="out/my_experiment/exported_models/exported_last.pt" data="VOC.yaml" epochs=100
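Fine-tuning depends on the checkpoint that the pre-training step exports, so it can be worth checking that the file exists before starting a long run. The sketch below uses only the standard library. The helper name exported_checkpoint is hypothetical, while the path layout is the one used in the commands above.

```python
from pathlib import Path


def exported_checkpoint(out_dir: str) -> Path:
    """Return the exported checkpoint path, or raise if it is missing."""
    checkpoint = Path(out_dir) / "exported_models" / "exported_last.pt"
    if not checkpoint.is_file():
        raise FileNotFoundError(
            f"No exported model at {checkpoint}; run the pre-training step first."
        )
    return checkpoint


if __name__ == "__main__":
    try:
        print(exported_checkpoint("out/my_experiment"))
    except FileNotFoundError as err:
        print(err)
```

The returned path can be converted with str() and passed to YOLO(...) in finetune_yolo.py.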

Congratulations! You have successfully pre-trained a model using lightly-train and fine-tuned it for object detection using ultralytics.

For more advanced options, explore the lightly-train Python API and the Ultralytics documentation.

Next Steps

Go beyond distillation pretraining and experiment with other self-supervised learning methods in lightly-train, such as DINO or SimCLR.
Try various YOLO models (YOLOv5, YOLOv6, YOLOv8).
Use the pre-trained model for other tasks, like image embeddings.