(image-classification)=

# Image Classification

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/image_classification.ipynb)

```{note}
LightlyTrain supports training image classification models using any backbone. Both multiclass and multilabel classification are supported.
```

(image-classification-train)=

## Train an Image Classification Model

Training an image classification model with LightlyTrain is straightforward and only requires a few lines of code. See [data](#image-classification-data) for more details on how to prepare your dataset.

(image-classification-multiclass-train)=

### Multiclass Classification

In multiclass classification, each image is assigned to exactly one class. This is the default mode and does not need to be specified explicitly.

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_image_classification(
        out="out/my_experiment",
        model="dinov3/vitt16",
        data={
            "train": "my_data_dir/train/",
            "val": "my_data_dir/val/",
            "classes": {
                0: "cat",
                1: "car",
                2: "dog",
                # ...
            },
        },
    )
```

During training, both the

- best (with highest validation top-1 accuracy) and
- last (last validation round as determined by `save_checkpoint_args.save_every_num_steps`)

model weights are exported to `out/my_experiment/exported_models/`, unless disabled in `save_checkpoint_args`.
You can use these weights to continue fine-tuning on another task by loading the weights via the `model` parameter:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_image_classification(
        out="out/my_experiment",
        model="out/my_experiment/exported_models/exported_best.pt",  # Continue training from the best model
        data={...},
    )
```

(image-classification-inference)=

### Load the Trained Model from Checkpoint and Predict

After the training completes, you can load the best model checkpoint for inference like this:

```python
import lightly_train

model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
results = model.predict("path/to/image.jpg", topk=1, threshold=0.5)
results["labels"]  # Class labels, tensor of shape (topk,)
results["scores"]  # Confidence scores, tensor of shape (topk,)
```

The predicted label is the class ID as defined in the `classes` dictionary in the dataset configuration.

(image-classification-multilabel-train)=

## Multilabel Classification

In multilabel classification, each image can be assigned to multiple classes simultaneously. To enable multilabel classification, set `classification_task` to `"multilabel"`. This requires a CSV-based dataset where multiple labels per image can be specified. See [CSV-based Datasets](#csv-based-datasets-single-label-and-multi-label) for details on the data format.

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_image_classification(
        out="out/my_experiment",
        model="dinov3/vitt16",
        classification_task="multilabel",
        data={
            "train_csv": "my_data_dir/train.csv",
            "val_csv": "my_data_dir/val.csv",
            "classes": {
                0: "cat",
                1: "car",
                2: "dog",
                # ...
            },
        },
    )
```

For multilabel classification, the model is saved based on the best validation F1 score.
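
Because multilabel training expects CSV annotations, it can be handy to generate them programmatically. The following is a minimal sketch using only Python's standard `csv` module. The helper name `write_multilabel_csv` and the example rows are hypothetical, and the column names `image_path` and `label` are assumed to match the CSV column settings used for training. The `csv` writer quotes any field that contains the comma delimiter, so multi-label fields end up quoted.

```python
import csv
import io


def write_multilabel_csv(rows, fileobj):
    """Write (image_path, labels) pairs as CSV with an image_path,label header.

    Hypothetical helper for illustration; not part of LightlyTrain.
    """
    writer = csv.writer(fileobj, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(["image_path", "label"])
    for image_path, labels in rows:
        # Join multiple labels with a comma; the writer quotes the field if needed.
        writer.writerow([image_path, ",".join(labels)])


# Example rows (made up for illustration): image path plus one or more class names.
rows = [
    ("train/cat/img1.jpg", ["cat"]),
    ("train/cat/img2.jpg", ["cat", "dog"]),
    ("train/car/img1.jpg", ["car"]),
]

buffer = io.StringIO()
write_multilabel_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("my_data_dir/train.csv", "w", newline="")` in place of the in-memory buffer.
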
When running inference, the model returns all labels with a confidence score above 0.5 by default:

```python
import lightly_train

model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")
results = model.predict("path/to/image.jpg")
results["labels"]  # Class labels, tensor of shape (num_labels,)
results["scores"]  # Confidence scores, tensor of shape (num_labels,)
```

(image-classification-output)=

## Out

The `out` argument specifies the output directory where all training logs, model exports, and checkpoints are saved. It looks like this after training:

```text
out/my_experiment
├── checkpoints
│   └── last.ckpt                                       # Last checkpoint
├── exported_models
│   ├── exported_last.pt                                # Last model exported (unless disabled)
│   └── exported_best.pt                                # Best model exported (unless disabled)
├── events.out.tfevents.1721899772.host.1839736.0       # TensorBoard logs
└── train.log                                           # Training logs
```

The final model checkpoint is saved to `out/my_experiment/checkpoints/last.ckpt`. The last and best model weights are exported to `out/my_experiment/exported_models/` unless disabled in `save_checkpoint_args`.

```{tip}
Create a new output directory for each experiment to keep training logs, model exports, and checkpoints organized.
```

(image-classification-data)=

## Data

LightlyTrain supports training image classification models using either a directory-based dataset structure or CSV annotation files. Both single-label and multi-label classification are supported.

### Image Formats

The following image formats are supported:

- jpg
- jpeg
- png
- ppm
- bmp
- pgm
- tif
- tiff
- webp
- dcm (DICOM)

### Folder-based Datasets (Single-label)

In the simplest setup, images are organized into subdirectories, where each subdirectory corresponds to one class. The directory name defines the class name, and all images inside that directory are assigned to that class.
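
Since class names come from directory names, a `classes` dictionary can be derived from the training directory rather than typed by hand. The sketch below uses only the standard library. The helper name `classes_from_directories` is hypothetical, and assigning IDs in alphabetical order is an arbitrary choice made here for reproducibility, not a LightlyTrain requirement.

```python
import tempfile
from pathlib import Path


def classes_from_directories(train_dir):
    """Map integer class IDs to the sorted subdirectory names of train_dir.

    Hypothetical helper for illustration; not part of LightlyTrain.
    """
    names = sorted(p.name for p in Path(train_dir).iterdir() if p.is_dir())
    return dict(enumerate(names))


# Demonstrate on a throwaway directory tree with one subdirectory per class.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "train"
    for class_name in ("cat", "car", "dog"):
        (train_dir / class_name).mkdir(parents=True)
    classes = classes_from_directories(train_dir)

print(classes)
```

The resulting dictionary can be passed as the `classes` entry of the `data` argument. Keep the mapping fixed across runs so that predicted class IDs stay comparable.
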
Your dataset directory should be organized like this:

```text
my_data_dir/
├── train
│   ├── cat
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   ├── car
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   └── ...
└── val
    ├── cat
    │   ├── img1.jpg
    │   └── ...
    ├── car
    │   ├── img1.jpg
    │   └── ...
    └── ...
```

To train with this directory structure, set the `data` argument like this:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_image_classification(
        out="out/my_experiment",
        model="dinov3/vitt16",
        data={
            "train": "my_data_dir/train/",
            "val": "my_data_dir/val/",
            "classes": {
                0: "cat",
                1: "car",
                2: "dog",
                # ...
            },
            # Optional, classes that are in the dataset but should be ignored during
            # training.
            "ignore_classes": [0],
        },
    )
```

In this setup:

- Each image belongs to exactly one class.
- Class names are taken from the directory names.
- Class IDs are assigned according to the passed `classes` dictionary.

### CSV-based Datasets (Single-label and Multi-label)

For more flexibility, LightlyTrain also supports CSV files that explicitly map image paths to labels. This is required for multi-label classification, and can also be used for single-label datasets.

Each split (train, val, optionally test) must have its own CSV file.

#### CSV format

A CSV file must contain:

- one column specifying the image path
- one column specifying the label(s)

The image path can be absolute or relative to the CSV file location.

For example, given the following dataset layout:

```text
my_data_dir/
├── train
│   ├── cat
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   ├── car
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   └── ...
├── val
│   ├── cat
│   │   ├── img1.jpg
│   │   └── ...
│   ├── car
│   │   ├── img1.jpg
│   │   └── ...
│   └── ...
├── train.csv
└── val.csv
```

A corresponding `train.csv` with class names could look like this:

```
image_path,label
train/cat/img1.jpg,"cat"
train/cat/img2.jpg,"cat,dog"
train/car/img1.jpg,"car"
```

and with class IDs:

```
image_path,label
train/cat/img1.jpg,"0"
train/cat/img2.jpg,"0,2"
train/car/img1.jpg,"1"
```

In this case, the image paths are interpreted relative to the directory containing the CSV file, i.e., `my_data_dir/`.

To train with this CSV-based structure, set the `data` argument like this:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_image_classification(
        out="out/my_experiment",
        model="dinov3/vitt16",
        data={
            "train_csv": "my_data_dir/train.csv",
            "val_csv": "my_data_dir/val.csv",
            "classes": {
                0: "cat",
                1: "car",
                2: "dog",
                # ...
            },
            # Optional, classes that are in the dataset but should be ignored during
            # training.
            "ignore_classes": [0],
        },
    )
```

Notes:

- Image paths must either be absolute or relative to the directory containing the CSV file.
- Multiple labels are separated by a delimiter (default: `","`).
- When using commas as label delimiters, the label field must be quoted.
- Labels can be specified either as class IDs or class names.

#### Supported CSV Options

The behavior of CSV parsing can be configured via the `data` argument:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_image_classification(
        out="out/my_experiment",
        model="dinov3/vitt16",
        data={
            "train_csv": "my_data_dir/train.csv",
            "val_csv": "my_data_dir/val.csv",
            "classes": {
                0: "cat",
                1: "car",
                2: "dog",
                # ...
            },
            # Optional, classes that are in the dataset but should be ignored during
            # training.
            "ignore_classes": [0],
            # Extra arguments for CSV-based datasets.
            "csv_image_column": "image_path",  # Name of the column storing image paths.
            "csv_label_column": "label",  # Name of the column storing labels.
            "csv_label_type": "name",  # Type of labels, either "name" or "id".
            "label_delimiter": ",",  # Delimiter used to separate the labels.
        },
    )
```

(image-classification-model)=

## Model

The `model` argument defines the backbone model used for image classification. All LightlyTrain models are supported as backbones. For example:

- `dinov3/vitt16`
- `dinov2/vitb14`
- `timm/resnet18`
- `torchvision/resnet50`

See [Models](pretrain_distill/models/index.md) for a full list of supported model backbones.

## Training Settings

See [](train-settings) on how to configure training settings.

(image-classification-logging)=

(image-classification-mlflow)=

(image-classification-tensorboard)=

(image-classification-wandb)=

## Logging

See [](train-settings-logging) on how to configure logging.

(image-classification-resume-training)=

## Resume Training

See [](train-settings-resume-training) on how to resume training.

(image-classification-transform-args)=

## Default Image Transform Arguments

The following are the default image transform arguments. See [](train-settings-transforms) on how to customize transform settings.

`````{dropdown} Image Classification Default Transform Arguments
````{dropdown} Train
```{include} _auto/imageclassificationtrain_train_transform_args.md
```
````
````{dropdown} Val
```{include} _auto/imageclassificationtrain_val_transform_args.md
```
````
`````

(image-classification-onnx)=

## Exporting a Checkpoint to ONNX

[Open Neural Network Exchange (ONNX)](https://en.wikipedia.org/wiki/Open_Neural_Network_Exchange) is a standard format for representing machine learning models in a framework-independent manner. In particular, it is useful for deploying our models on edge devices where PyTorch is not available.

### Requirements

Exporting to ONNX requires some additional packages to be installed, namely:

- [onnx](https://pypi.org/project/onnx/)
- [onnxruntime](https://pypi.org/project/onnxruntime/) if `verify` is set to `True`.
- [onnxslim](https://pypi.org/project/onnxslim/) if `simplify` is set to `True`.

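
Before exporting, it can be useful to check which of these optional packages are importable in the current environment. The snippet below is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and it assumes the import names match the package names listed above.

```python
import importlib.util


def missing_packages(module_names):
    """Return the names from module_names that cannot be imported.

    Hypothetical helper for illustration; not part of LightlyTrain.
    """
    return [
        name for name in module_names if importlib.util.find_spec(name) is None
    ]


# Assumed import names for the optional ONNX export dependencies.
onnx_modules = ["onnx", "onnxruntime", "onnxslim"]
missing = missing_packages(onnx_modules)

if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("All ONNX export packages are importable.")
```
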

You can install them with:

```bash
pip install "lightly-train[onnx,onnxruntime,onnxslim]"
```

The following example shows how to export a previously trained model to ONNX.

```python
import lightly_train

# Instantiate the model from a checkpoint.
model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")

# Export to ONNX.
model.export_onnx(
    out="out/my_experiment/exported_models/model.onnx",
    # precision="fp16",  # Export model with FP16 weights for smaller size and faster inference.
)
```

See {py:meth}`~.ImageClassification.export_onnx` for all available options when exporting to ONNX.

(image-classification-tensorrt)=

## Exporting a Checkpoint to TensorRT

TensorRT engines are built from an ONNX representation of the model. The `export_tensorrt` method internally exports the model to ONNX (see the ONNX export section above) before building a [TensorRT](https://developer.nvidia.com/tensorrt) engine for fast GPU inference.

### Requirements

TensorRT is not part of LightlyTrain's dependencies and must be installed separately. Installation depends on your OS, Python version, GPU, and NVIDIA driver/CUDA setup. See the [TensorRT documentation](https://docs.nvidia.com/deeplearning/tensorrt/latest/installing-tensorrt/installing.html) for more details.

On CUDA 12.x systems you can often install the Python package via:

```bash
pip install tensorrt-cu12
```

The following example shows how to export a previously trained model to TensorRT.

```python
import lightly_train

# Instantiate the model from a checkpoint.
model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")

# Export to TensorRT. The model is first exported to ONNX internally.
model.export_tensorrt(
    out="out/my_experiment/exported_models/model.trt",  # TensorRT engine destination.
    # precision="fp16",  # Export model with FP16 weights for smaller size and faster inference.
)
```

See {py:meth}`~.ImageClassification.export_tensorrt` for all available options when exporting to TensorRT.