(semantic-segmentation)=

# Semantic Segmentation

```{note}
🔥 **New**: LightlyTrain now supports training DINOv2 models for semantic segmentation
with the `train_semantic_segmentation` function! The method is based on the
state-of-the-art segmentation model [EoMT](https://arxiv.org/abs/2503.19108) by
Kerssies et al. and reaches 58.4% mIoU on the ADE20K dataset.
```

Training a semantic segmentation model with LightlyTrain is straightforward and only
requires a few lines of code. The dataset must follow the
[ADE20K format](https://ade20k.csail.mit.edu/) with RGB images and integer masks in
PNG format. See [data](#semantic-segmentation-data) for more details.

````{tab} Python
```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov2/vitl14-eomt",
        data={
            "train": {
                "images": "my_data_dir/train/images",  # Path to training images
                "masks": "my_data_dir/train/masks",  # Path to training masks
            },
            "val": {
                "images": "my_data_dir/val/images",  # Path to validation images
                "masks": "my_data_dir/val/masks",  # Path to validation masks
            },
            "classes": {  # Classes in the dataset
                0: "background",
                1: "car",
                2: "bicycle",
                # ...
            },
            # Optional, classes that are in the dataset but should be ignored during
            # training.
            "ignore_classes": [0],
        },
    )
```
````

After training completes, you can load the model for inference like this:

```python
import lightly_train

model = lightly_train.load_model_from_checkpoint(
    "out/my_experiment/checkpoints/last.ckpt"
)
masks = model.predict("path/to/image.jpg")
```

And visualize the predicted masks like this:

```python
import torch
import matplotlib.pyplot as plt
from torchvision.io import read_image
from torchvision.utils import draw_segmentation_masks

image = read_image("path/to/image.jpg")
masks = torch.stack([masks == class_id for class_id in masks.unique()])
image_with_masks = draw_segmentation_masks(image, masks, alpha=0.6)
plt.imshow(image_with_masks.permute(1, 2, 0))
```

The predicted masks have shape `(height, width)` and each value corresponds to a class
ID as defined in the `classes` dictionary in the dataset.

(semantic-segmentation-output)=

## Out

The `out` argument specifies the output directory where all training logs, model
exports, and checkpoints are saved. It looks like this after training:

```text
out/my_experiment
├── checkpoints
│   └── last.ckpt                                  # Last checkpoint
├── events.out.tfevents.1721899772.host.1839736.0  # TensorBoard logs
└── train.log                                      # Training logs
```

The final model checkpoint is saved to `out/my_experiment/checkpoints/last.ckpt`.

```{tip}
Create a new output directory for each experiment to keep training logs, model exports,
and checkpoints organized.
```

(semantic-segmentation-data)=

## Data

Lightly**Train** supports training semantic segmentation models with images and masks.
Every image must have a corresponding mask with the same filename except for the file
extension. The masks must be PNG images in grayscale integer format, where each pixel
value corresponds to a class ID.
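If you are unsure whether your masks follow this format, you can inspect one before
training. The snippet below is a minimal sketch, assuming Pillow and NumPy are
installed; the mask path is a placeholder matching the example directory layout shown
below:

```python
import numpy as np
from PIL import Image

# Placeholder path; point this at one of your own mask files.
mask = Image.open("my_data_dir/train/masks/image0.png")

# Expect a grayscale integer mode such as "L" (8-bit) or "I" (32-bit).
print(mask.mode)

# Each unique pixel value should be a class ID listed in the `classes` dictionary.
print(np.unique(np.array(mask)))
```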
The following image formats are supported:

- jpg
- jpeg
- png
- ppm
- bmp
- pgm
- tif
- tiff
- webp

The following mask formats are supported:

- png

Example of a directory structure with training and validation images and masks:

```bash
my_data_dir
├── train
│   ├── images
│   │   ├── image0.jpg
│   │   └── image1.jpg
│   └── masks
│       ├── image0.png
│       └── image1.png
└── val
    ├── images
    │   ├── image2.jpg
    │   └── image3.jpg
    └── masks
        ├── image2.png
        └── image3.png
```

To train with this folder structure, set the `data` argument like this:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov2/vitl14-eomt",
        data={
            "train": {
                "images": "my_data_dir/train/images",  # Path to training images
                "masks": "my_data_dir/train/masks",  # Path to training masks
            },
            "val": {
                "images": "my_data_dir/val/images",  # Path to validation images
                "masks": "my_data_dir/val/masks",  # Path to validation masks
            },
            "classes": {  # Classes in the dataset
                0: "background",
                1: "car",
                2: "bicycle",
                # ...
            },
            # Optional, classes that are in the dataset but should be ignored during
            # training.
            "ignore_classes": [0],
        },
    )
```

The classes in the dataset must be specified in the `classes` dictionary. The keys are
the class IDs and the values are the class names. The class IDs must be identical to
the values in the mask images. All possible class IDs must be specified; otherwise,
Lightly**Train** will raise an error if an unknown class ID is encountered.

If you would like to ignore some classes during training, you can specify their class
IDs in the `ignore_classes` argument. The trained model will then not predict these
classes.

(semantic-segmentation-model)=

## Model

The `model` argument defines the model used for semantic segmentation training. The
following models are available:

- `dinov2/vits14-eomt`
- `dinov2/vitb14-eomt`
- `dinov2/vitl14-eomt`
- `dinov2/vitg14-eomt`

All DINOv2 models are [pretrained by Meta](https://github.com/facebookresearch/dinov2?tab=readme-ov-file#pretrained-models).

(semantic-segmentation-logging)=

## Logging

Logging is configured with the `logger_args` argument. The following loggers are
supported:

- [`mlflow`](#mlflow): Logs training metrics to MLflow (disabled by default, requires
  MLflow to be installed)
- [`tensorboard`](#tensorboard): Logs training metrics to TensorBoard (enabled by
  default, requires TensorBoard to be installed)

(semantic-segmentation-mlflow)=

### MLflow

```{important}
MLflow must be installed with `pip install "lightly-train[mlflow]"`.
```

The MLflow logger can be configured with the following arguments:

````{tab} Python
```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov2/vitl14-eomt",
        data={
            # ...
        },
        logger_args={
            "mlflow": {
                "experiment_name": "my_experiment",
                "run_name": "my_run",
                "tracking_uri": "tracking_uri",
            },
        },
    )
```
````

(semantic-segmentation-tensorboard)=

### TensorBoard

TensorBoard logs are automatically saved to the output directory.
Run TensorBoard in a new terminal to visualize the training progress:

```bash
tensorboard --logdir out/my_experiment
```

Disable the TensorBoard logger with:

````{tab} Python
```python
logger_args={"tensorboard": None}
```
````

(semantic-segmentation-pretrain-finetune)=

## Pretrain and Fine-tune a Semantic Segmentation Model

To further improve the performance of your semantic segmentation model, you can first
pretrain a DINOv2 model on unlabeled data using self-supervised learning and then
fine-tune it on your segmentation dataset. This is especially useful if your dataset is
only partially labeled or if you have access to a large amount of unlabeled data.

The following example shows how to pretrain and fine-tune the model. Check out the page
on [DINOv2](#methods-dinov2) to learn more about pretraining DINOv2 models on unlabeled
data.

````{tab} Python
```python
import lightly_train

if __name__ == "__main__":
    # Pretrain a DINOv2 model.
    lightly_train.train(
        out="out/my_pretrain_experiment",
        data="my_pretrain_data_dir",
        model="dinov2/vitl14",
        method="dinov2",
        epochs=100,  # We recommend epochs = 125000 * batch size // dataset size
    )

    # Fine-tune the DINOv2 model for semantic segmentation.
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov2/vitl14-eomt",
        model_args={
            # Path to your pretrained DINOv2 model.
            "backbone_weights": "out/my_pretrain_experiment/exported_models/exported_last.pt",
        },
        data={
            "train": {
                "images": "my_data_dir/train/images",  # Path to training images
                "masks": "my_data_dir/train/masks",  # Path to training masks
            },
            "val": {
                "images": "my_data_dir/val/images",  # Path to validation images
                "masks": "my_data_dir/val/masks",  # Path to validation masks
            },
            "classes": {  # Classes in the dataset
                0: "background",
                1: "car",
                2: "bicycle",
                # ...
            },
            # Optional, classes that are in the dataset but should be ignored during
            # training.
            "ignore_classes": [0],
        },
    )
```
````
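After fine-tuning, the resulting checkpoint can be loaded for inference exactly as
shown earlier; the path below follows the `out` directory used in the fine-tuning step:

```python
import lightly_train

# Load the fine-tuned segmentation model and predict masks for a single image.
model = lightly_train.load_model_from_checkpoint(
    "out/my_experiment/checkpoints/last.ckpt"
)
masks = model.predict("path/to/image.jpg")
```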