(tutorials-yolo)=

# Object Detection with Ultralytics' YOLO

This tutorial demonstrates how to pretrain a YOLO model using LightlyTrain and then fine-tune it for object detection using the `ultralytics` framework. To this end, we will first pretrain on a [25k image subset](https://github.com/giddyyupp/coco-minitrain) of the [COCO dataset](https://cocodataset.org/#home) (only the images, no labels!), and subsequently fine-tune on the labeled [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/).

```{warning}
Using Ultralytics models might require a commercial Ultralytics license. See the [Ultralytics website](https://www.ultralytics.com/license) for more information.
```

## Install Dependencies

Install the required packages:

- `lightly-train` for pretraining, with support for `ultralytics`' YOLO models
- [`supervision`](https://github.com/roboflow/supervision) to visualize some of the annotated pictures

```bash
pip install "lightly-train[ultralytics]" "supervision==0.25.1"
```

## Pretraining on COCO-minitrain

Time for some magic! We'll first grab the COCO-minitrain dataset (25k images) directly from HuggingFace...

```bash
wget https://huggingface.co/datasets/bryanbocao/coco_minitrain/resolve/main/coco_minitrain_25k.zip
```

... unzip it...

```bash
unzip coco_minitrain_25k.zip
```

... and since Lightly**Train** does not require any labels, we can confidently delete all the labels:

```bash
rm -rf coco_minitrain_25k/labels
```

With the dataset ready, we can now start the pretraining. Pretraining with Lightly**Train** could not be easier: you just pass the following parameters:

- `out`: where your logs and the exported model should be saved
- `model`: the model that you want to train, e.g. `yolo11s` from Ultralytics
- `data`: the path to a folder with images

Your data can be an arbitrarily nested folder; LightlyTrain will find all images on its own, and since no labels are required, there is no danger of ever using false labels! 🕵️‍♂️

````{tab} Python
```python
# pretrain_yolo.py
import lightly_train

if __name__ == "__main__":
    # Pretrain with LightlyTrain.
    lightly_train.train(
        out="out/coco_minitrain_pretrain",  # Output directory.
        model="ultralytics/yolo11s.yaml",  # Pass the YOLO model (use the .yaml ending to start with random weights).
        data="coco_minitrain_25k/images",  # Path to a directory with training images.
        epochs=100,  # Adjust the number of epochs for shorter training.
        batch_size=128,  # Adjust the batch size based on your hardware.
    )
```
````

````{tab} Command Line
```bash
lightly-train --out=out/coco_minitrain_pretrain --model=ultralytics/yolo11s.yaml --data=coco_minitrain_25k/images --epochs=100 --batch-size=128
```
````

And just like that you pretrained a YOLO11s backbone! 🥳 This backbone can't solve any task yet, so in the next step we will fine-tune it on the PASCAL VOC dataset.

## Finetuning on PASCAL VOC

Now that the pretrained model has been exported, we will fine-tune it on the task of object detection. The exported model already has exactly the format that Ultralytics' YOLO expects, so after getting the dataset ready, we can get started with only a few lines! ⚡️

In addition to fine-tuning the pretrained model, we will also train a model initialized with random weights. This will let us compare the performance of the two and show the great benefits of pretraining.
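If you want to sanity-check the export before setting up the dataset, you can load the checkpoint with Ultralytics right away. A minimal sketch, assuming the output directory from the pretraining step above (the same `exported_last.pt` path is used for fine-tuning below):

```python
# check_export.py
from ultralytics import YOLO

if __name__ == "__main__":
    # Load the exported checkpoint like any other YOLO weights file.
    model = YOLO("out/coco_minitrain_pretrain/exported_models/exported_last.pt")

    # Print a model summary (layers, parameter count) to confirm it loaded.
    model.info()
```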
### Download the PASCAL VOC Dataset

We can download the dataset directly using Ultralytics' API with the `check_det_dataset` function:

```python
from ultralytics.data.utils import check_det_dataset

dataset = check_det_dataset("VOC.yaml")
```

Ultralytics always downloads your datasets to a fixed location, which you can fetch via their `settings` module:

```python
from ultralytics import settings

print(settings["datasets_dir"])
```

Inside that directory, you will now have the following structure of images and labels:

```bash
tree -d <datasets_dir>/VOC -I VOCdevkit
> datasets/VOC
> ├── images
> │   ├── test2007
> │   ├── train2007
> │   ├── train2012
> │   ├── val2007
> │   └── val2012
> └── labels
>     ├── test2007
>     ├── train2007
>     ├── train2012
>     ├── val2007
>     └── val2012
```

### Inspect a Few Images

Let's use `supervision` to look at a few of the annotated samples and get a feeling for what the data looks like:

```python
import random

import matplotlib.pyplot as plt
import supervision as sv
import yaml
from ultralytics import settings
from ultralytics.data.utils import check_det_dataset

dataset = check_det_dataset("VOC.yaml")

# Load the images and YOLO-format annotations of the train2012 split.
detections = sv.DetectionDataset.from_yolo(
    data_yaml_path=dataset["yaml_file"],
    images_directory_path=f"{settings['datasets_dir']}/VOC/images/train2012",
    annotations_directory_path=f"{settings['datasets_dir']}/VOC/labels/train2012",
)

# Fetch the class names from the dataset yaml.
with open(dataset["yaml_file"], "r") as f:
    data = yaml.safe_load(f)
names = data["names"]

box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

fig, ax = plt.subplots(2, 2, figsize=(10, 10))
ax = ax.flatten()

# Draw four random annotated samples.
samples = [detections[random.randint(0, len(detections) - 1)] for _ in range(4)]
for i, (path, image, annotation) in enumerate(samples):
    annotated_image = box_annotator.annotate(scene=image, detections=annotation)
    annotated_image = label_annotator.annotate(
        scene=annotated_image,
        detections=annotation,
        labels=[names[elem] for elem in annotation.class_id],
    )
    ax[i].imshow(annotated_image[..., ::-1])  # Convert BGR to RGB for matplotlib.
    ax[i].axis("off")
fig.tight_layout()
fig.show()
```

![VOC2012 Training Samples](samples_VOC_train2012.png)

### Finetuning the Pretrained Model

All we have to do is pass the path of the pretrained model to the `YOLO` class; the rest is the same as always with Ultralytics.

````{tab} Python
```python
# finetune_yolo.py
from ultralytics import YOLO

if __name__ == "__main__":
    # Load the exported model.
    model = YOLO("out/coco_minitrain_pretrain/exported_models/exported_last.pt")

    # Fine-tune with ultralytics.
    model.train(data="VOC.yaml", epochs=30, project="logs/voc_yolo11s", name="from_pretrained")
```
````

````{tab} Command Line
```bash
yolo detect train model="out/coco_minitrain_pretrain/exported_models/exported_last.pt" data="VOC.yaml" epochs=30 project="logs/voc_yolo11s" name="from_pretrained"
```
````

### Finetuning the Randomly Initialized Model

To quantify the influence of our pretraining, we also train a model from random weights; in Ultralytics, passing a `.yaml` file instead of a `.pt` checkpoint initializes the model randomly.

````{tab} Python
```python
# finetune_scratch_yolo.py
from ultralytics import YOLO

if __name__ == "__main__":
    # Load a randomly initialized model.
    model = YOLO("yolo11s.yaml")

    # Fine-tune with ultralytics.
    model.train(data="VOC.yaml", epochs=30, project="logs/voc_yolo11s", name="from_scratch")
```
````

````{tab} Command Line
```bash
yolo detect train model="yolo11s.yaml" data="VOC.yaml" epochs=30 project="logs/voc_yolo11s" name="from_scratch"
```
````
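Both runs write their per-epoch metrics to a `results.csv` file in their respective run directories, which we will plot in the next section. If you prefer a single set of final validation numbers, you can also run Ultralytics' `val` mode on a fine-tuned checkpoint. A minimal sketch, assuming the standard `weights/best.pt` layout that Ultralytics creates under `<project>/<name>`:

```python
# validate_yolo.py
from ultralytics import YOLO

if __name__ == "__main__":
    # Load the best checkpoint saved during fine-tuning.
    model = YOLO("logs/voc_yolo11s/from_pretrained/weights/best.pt")

    # Evaluate on the VOC validation split; metrics.box.map is the mAP50-95.
    metrics = model.val(data="VOC.yaml")
    print(f"mAP50-95: {metrics.box.map:.3f}")
```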
## Evaluating the Model Performance

Congratulations, you made it almost to the end! 🎉 The last thing we'll do is compare the performance of the two models. A very common metric for object detectors is the mAP50-95, which we plot below for both the pretrained model and the model trained from scratch:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Load the per-epoch metrics of both training runs.
res_scratch = pd.read_csv("logs/voc_yolo11s/from_scratch/results.csv")
res_finetune = pd.read_csv("logs/voc_yolo11s/from_pretrained/results.csv")

fig, ax = plt.subplots()
ax.plot(res_scratch["epoch"], res_scratch["metrics/mAP50-95(B)"], label="scratch")
ax.plot(res_finetune["epoch"], res_finetune["metrics/mAP50-95(B)"], label="finetune")
ax.set_xlabel("Epoch")
ax.set_ylabel("mAP50-95")

max_pretrained = res_finetune["metrics/mAP50-95(B)"].max()
max_scratch = res_scratch["metrics/mAP50-95(B)"].max()
ax.set_title(
    f"Pretraining is {(max_pretrained - max_scratch) / max_scratch * 100:.2f}% better than scratch"
)
ax.legend()
plt.show()
```

![Pretraining vs Scratch](results_VOC.png)

As clearly visible in the plot, the pretrained model converges much faster and achieves a significantly higher mAP50-95 than the model trained from scratch!

## Next Steps

Congratulations, you've mastered the basics! 🎉 Ready to take it further? Here are some exciting next steps:

- Go beyond distillation and explore other pretraining methods in LightlyTrain. Check [Methods](#methods) for more exciting possibilities!
- Try your hand at different YOLO flavors (`YOLOv5`, `YOLOv6`, `YOLOv8`).
- Take your pretrained model for a spin with {ref}`image embeddings ` and similarity search (see the sketch below).
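As a taste of that last item, here is a minimal embedding sketch. It assumes the `lightly_train.embed` function and the default `checkpoints/last.ckpt` location inside the pretraining output directory; check the LightlyTrain embeddings documentation for the exact interface:

```python
# embed_yolo.py
import lightly_train

if __name__ == "__main__":
    # Compute embeddings with the pretrained backbone (paths and interface assumed as above).
    lightly_train.embed(
        out="out/coco_minitrain_embeddings.pth",  # Where the embeddings are saved.
        checkpoint="out/coco_minitrain_pretrain/checkpoints/last.ckpt",  # Pretraining checkpoint.
        data="coco_minitrain_25k/images",  # Images to embed.
    )
```

Happy experimenting! 🚀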