(quick-start-object-detection)=

# Quick Start - Object Detection

```{image} https://colab.research.google.com/assets/colab-badge.svg
:target: https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/object_detection.ipynb
```

This guide demonstrates how to use Lightly**Train** for object detection with our state-of-the-art LTDETR model built on [DINOv3](https://github.com/facebookresearch/dinov3).

## Installation

```bash
pip install lightly-train
```

```{important}
Lightly**Train** is officially supported on:

- Linux: CPU or CUDA
- MacOS: CPU only
- Windows (experimental): CPU or CUDA

We are planning to support MPS for MacOS. Check the
[installation instructions](installation.md#installation) for more details.
```

## Prediction using Lightly**Train**'s model weights

### Download an example image

Download an example image for inference:

```bash
wget -O image.jpg http://images.cocodataset.org/val2017/000000577932.jpg
```

### Load the model weights

Load the model with Lightly**Train**'s `load_model` function. This automatically downloads the model weights and loads the model:

```python
import lightly_train

model = lightly_train.load_model("dinov3/convnext-tiny-ltdetr-coco")
```

### Predict the objects

Run `model.predict` on the image. The method accepts file paths, URLs, PIL Images, or tensors as input:

```python skip_ruff
results = model.predict("image.jpg")
results["labels"]  # Class labels, tensor of shape (num_boxes,)
results["bboxes"]  # Bounding boxes in (xmin, ymin, xmax, ymax) absolute pixel
                   # coordinates of the original image. Tensor of shape (num_boxes, 4).
results["scores"]  # Confidence scores, tensor of shape (num_boxes,)
```
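The same call works for inputs that are already in memory. A minimal sketch, assuming you want to predict on a PIL Image instead of a file path:

```python skip_ruff
from PIL import Image

# model.predict also accepts PIL Images (as well as URLs and tensors).
pil_image = Image.open("image.jpg")
results = model.predict(pil_image)
```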
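Depending on the image, the results may include low-confidence detections. A minimal sketch that drops them with a boolean mask before visualization (the 0.5 threshold is an assumption; tune it for your data):

```python skip_ruff
# Keep only detections with a confidence score above 0.5.
keep = results["scores"] > 0.5
results = {
    "labels": results["labels"][keep],
    "bboxes": results["bboxes"][keep],
    "scores": results["scores"][keep],
}
```

The visualization below then only draws the remaining boxes.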
### Visualize the results

Visualize the image and results to check which objects were detected:

```python skip_ruff
import matplotlib.pyplot as plt
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes

image = read_image("image.jpg")
image_with_boxes = draw_bounding_boxes(
    image,
    boxes=results["bboxes"],
    labels=[model.classes[label.item()] for label in results["labels"]],
)
plt.imshow(image_with_boxes.permute(1, 2, 0))
plt.show()
```

```{figure} /_static/images/object_detection/street.jpg
```

## Train object detection model

Training your own detection model is straightforward with Lightly**Train**.

### Download dataset

First, download a dataset. The dataset must be in YOLO format; see the [documentation](object-detection-data) for more details. You can use [labelformat](https://github.com/lightly-ai/labelformat) to convert any dataset to the YOLO format:

```bash
wget https://github.com/lightly-ai/coco128_yolo/releases/download/v0.0.1/coco128_yolo.zip && unzip -q coco128_yolo.zip
```

The dataset looks like this after the download completes:

```text
coco128_yolo
├── images
│   ├── train2017
│   │   ├── 000000000009.jpg
│   │   ├── 000000000025.jpg
│   │   ├── ...
│   │   └── 000000000650.jpg
│   └── val2017
│       ├── 000000000139.jpg
│       ├── 000000000285.jpg
│       ├── ...
│       └── 000000013201.jpg
└── labels
    ├── train2017
    │   ├── 000000000009.txt
    │   ├── 000000000025.txt
    │   ├── ...
    │   └── 000000000650.txt
    └── val2017
        ├── 000000000139.txt
        ├── 000000000285.txt
        ├── ...
        └── 000000013201.txt
```

### Start training

Start the training with the `train_object_detection` function. You only have to specify the output directory, model, and input data. Lightly**Train** automatically sets the remaining training parameters and applies image augmentations. You can always customize these settings if needed:

```python
import lightly_train

lightly_train.train_object_detection(
    out="out/my_experiment",
    model="dinov3/convnext-tiny-ltdetr-coco",
    steps=100,  # Small number of steps for demonstration, default is 90_000.
    batch_size=4,  # Small batch size for demonstration, default is 16.
    data={
        "path": "coco128_yolo",
        "train": "images/train2017",
        "val": "images/val2017",
        "names": {
            0: "person",
            1: "bicycle",
            2: "car",
            3: "motorcycle",
            4: "airplane",
            5: "bus",
            6: "train",
            7: "truck",
            8: "boat",
            9: "traffic light",
            10: "fire hydrant",
            11: "stop sign",
            12: "parking meter",
            13: "bench",
            14: "bird",
            15: "cat",
            16: "dog",
            17: "horse",
            18: "sheep",
            19: "cow",
            20: "elephant",
            21: "bear",
            22: "zebra",
            23: "giraffe",
            24: "backpack",
            25: "umbrella",
            26: "handbag",
            27: "tie",
            28: "suitcase",
            29: "frisbee",
            30: "skis",
            31: "snowboard",
            32: "sports ball",
            33: "kite",
            34: "baseball bat",
            35: "baseball glove",
            36: "skateboard",
            37: "surfboard",
            38: "tennis racket",
            39: "bottle",
            40: "wine glass",
            41: "cup",
            42: "fork",
            43: "knife",
            44: "spoon",
            45: "bowl",
            46: "banana",
            47: "apple",
            48: "sandwich",
            49: "orange",
            50: "broccoli",
            51: "carrot",
            52: "hot dog",
            53: "pizza",
            54: "donut",
            55: "cake",
            56: "chair",
            57: "couch",
            58: "potted plant",
            59: "bed",
            60: "dining table",
            61: "toilet",
            62: "tv",
            63: "laptop",
            64: "mouse",
            65: "remote",
            66: "keyboard",
            67: "cell phone",
            68: "microwave",
            69: "oven",
            70: "toaster",
            71: "sink",
            72: "refrigerator",
            73: "book",
            74: "clock",
            75: "vase",
            76: "scissors",
            77: "teddy bear",
            78: "hair drier",
            79: "toothbrush",
        },
    },
)
```

Once the training is complete, the output directory looks like this:

```text
out/my_experiment
├── checkpoints
│   ├── best.ckpt
│   └── last.ckpt
├── events.out.tfevents.1764251158.ef9b159fe4b8.273.0
├── exported_models
│   ├── exported_best.pt
│   └── exported_last.pt
└── train.log
```

### Load trained model

The best model checkpoint is saved to `out/my_experiment/exported_models/exported_best.pt`. You can load it for inference like this:

```python skip_ruff
import lightly_train
import matplotlib.pyplot as plt
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes

# Load the model for inference
model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")

# Run inference
results = model.predict("image.jpg")

# Plot results
image = read_image("image.jpg")
image_with_boxes = draw_bounding_boxes(
    image,
    boxes=results["bboxes"],
    labels=[model.classes[label.item()] for label in results["labels"]],
)
plt.imshow(image_with_boxes.permute(1, 2, 0))
plt.show()
```

```{figure} /_static/images/object_detection/street.jpg
```

## Next Steps

- [Object Detection Documentation](object-detection): Learn more about object detection with Lightly**Train**.
- [Distillation Quick Start](quick-start-distillation): Learn how to pretrain/distill models with unlabeled data.
- [DINOv2 Pretraining](methods-dinov2): Learn how to pretrain foundation models with unlabeled data.