# Train
The train command is a simple interface to pretrain a large number of models using different SSL methods. An example command looks like this:
Python:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",        # Output directory.
        data="my_data_dir",             # Directory with images.
        model="torchvision/resnet50",   # Model to train.
        method="distillation",          # Pretraining method.
        epochs=100,                     # Number of epochs to train.
        batch_size=128,                 # Batch size.
    )
```

Command line:

```bash
lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50" method="distillation" epochs=100 batch_size=128
```
Important
The default pretraining method `distillation` is recommended, as it consistently outperforms other methods in extensive experiments. Batch sizes between 128 and 1536 strike a good balance between training speed and performance. Moreover, long training runs, such as 2,000 epochs on COCO, significantly improve results.
This will pretrain a ResNet-50 model from TorchVision using images from `my_data_dir` and the DINOv2 distillation pretraining method. All training logs, model exports, and checkpoints are saved to the output directory at `out/my_experiment`.
Tip
See `lightly_train.train()` for a complete list of available arguments.
## Out
The `out` argument specifies the output directory where all training logs, model exports, and checkpoints are saved. It looks like this after training:
```
out/my_experiment
├── checkpoints
│   ├── epoch=99-step=123.ckpt                      # Intermediate checkpoint
│   └── last.ckpt                                   # Last checkpoint
├── events.out.tfevents.1721899772.host.1839736.0   # TensorBoard logs
├── exported_models
│   └── exported_last.pt                            # Final model exported
├── metrics.jsonl                                   # Training metrics
└── train.log                                       # Training logs
```
The final model checkpoint is saved to `out/my_experiment/checkpoints/last.ckpt`. The file `out/my_experiment/exported_models/exported_last.pt` contains the final model, exported in the default format (`package_default`) of the used library (see export format for more details).
Tip
Create a new output directory for each experiment to keep training logs, model exports, and checkpoints organized.
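One simple way to follow this tip is to derive a fresh, timestamped output directory for each run. A minimal sketch using only the Python standard library (the naming scheme is our own suggestion, not a LightlyTrain convention):

```python
from datetime import datetime
from pathlib import Path


def new_experiment_dir(root: str = "out") -> Path:
    """Create a unique, timestamped output directory for one training run."""
    # Example name: out/2024-05-01_13-37-00_my_experiment
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    out_dir = Path(root) / f"{stamp}_my_experiment"
    out_dir.mkdir(parents=True, exist_ok=False)  # Fail loudly if it already exists.
    return out_dir


out = new_experiment_dir()
# Pass `out` to lightly_train.train(out=out, ...) so each run keeps its own
# checkpoints, exports, and logs.
```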
## Data
The data directory `data="my_data_dir"` can have any structure, including nested subdirectories. LightlyTrain finds all images in the directory recursively.
The following image formats are supported:
- jpg
- jpeg
- png
- ppm
- bmp
- pgm
- tif
- tiff
- webp
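The recursive discovery described above can be reproduced with the standard library, for example to sanity-check how many images a directory will contribute before starting a long run (the helper name is ours; LightlyTrain performs this scan internally):

```python
from pathlib import Path

# Suffixes matching the supported formats listed above.
IMAGE_SUFFIXES = {
    ".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp",
}


def find_images(data_dir: str) -> list[Path]:
    """Recursively collect all image files under data_dir."""
    return sorted(
        p for p in Path(data_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Running `len(find_images("my_data_dir"))` before training is a cheap way to catch an empty or mistyped data path.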
## Model
See the Models page for a detailed list of all supported libraries, and their respective docs pages for all supported models.
## Method
See Methods for a list of all supported methods.
## Loggers
Logging is configured with the `loggers` argument. The following loggers are supported:

- `jsonl`: Logs training metrics to a .jsonl file (enabled by default)
- `tensorboard`: Logs training metrics to TensorBoard (enabled by default, requires TensorBoard to be installed)
- `wandb`: Logs training metrics to Weights & Biases (disabled by default, requires Weights & Biases to be installed)
### JSONL
The JSONL logger is enabled by default and logs training metrics to a .jsonl file at `out/my_experiment/metrics.jsonl`.

Disable the JSONL logger with:

Python:

```python
loggers={"jsonl": None}
```

Command line:

```bash
loggers.jsonl=null
```
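Because each line of a .jsonl file is a standalone JSON object, the metrics file can be inspected with the standard library alone. A minimal sketch (the metric keys in the usage note below are illustrative; the actual keys depend on the method and model):

```python
import json
from pathlib import Path


def read_metrics(path: str) -> list[dict]:
    """Parse a JSONL metrics file into a list of dicts, one per logged step."""
    records = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:  # Skip blank lines.
            records.append(json.loads(line))
    return records
```

For example, `read_metrics("out/my_experiment/metrics.jsonl")[-1]` returns the most recently logged record.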
### TensorBoard
TensorBoard logs are automatically saved to the output directory. Run TensorBoard in a new terminal to visualize the training progress:

```bash
tensorboard --logdir out/my_experiment
```

Disable the TensorBoard logger with:

Python:

```python
loggers={"tensorboard": None}
```

Command line:

```bash
loggers.tensorboard=null
```
### Weights & Biases
Important
Weights & Biases must be installed with `pip install "lightly-train[wandb]"`.
The Weights & Biases logger can be configured with the following arguments:
Python:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        loggers={
            "wandb": {
                "project": "my_project",
                "name": "my_experiment",
                "log_model": False,  # Set to True to upload model checkpoints.
            },
        },
    )
```

Command line:

```bash
lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50" loggers.wandb.project="my_project" loggers.wandb.name="my_experiment" loggers.wandb.log_model=False
```
More configuration options are available through the Weights & Biases environment variables. See the Weights & Biases documentation for more information.
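For instance, these environment variables can be set in Python before calling `lightly_train.train()`. The sketch below uses `WANDB_MODE` and `WANDB_DIR`, both documented Weights & Biases variables; `WANDB_MODE=offline` logs locally and defers uploads until a later `wandb sync`:

```python
import os

# Configure Weights & Biases through its environment variables before training.
os.environ["WANDB_MODE"] = "offline"   # Log locally; sync to the server later.
os.environ["WANDB_DIR"] = "out/wandb"  # Where W&B stores its local files.

# A subsequent lightly_train.train(..., loggers={"wandb": {...}}) call
# picks these settings up automatically.
```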
Disable the Weights & Biases logger with:

Python:

```python
loggers={"wandb": None}
```

Command line:

```bash
loggers.wandb=null
```
## Advanced Options
### Input Image Resolution
The input image resolution can be set with the `transform_args` argument. By default, a resolution of 224x224 pixels is used. A custom resolution can be set like this:

Python:

```python
transform_args={"image_size": (448, 448)}  # (height, width)
```

Command line:

```bash
transform_args.image_size="[448,448]" # (height, width)
```
Warning
Not all models support all image sizes.
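For example, ViT-style backbones typically require the height and width to be multiples of their patch size. A small helper to catch this before training starts (the default patch size of 14 matches DINOv2-style ViTs; treat it as an assumption and check your model's documentation):

```python
def check_image_size(image_size: tuple[int, int], patch_size: int = 14) -> None:
    """Raise early if a ViT-style model cannot handle the requested resolution."""
    height, width = image_size
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"Image size {image_size} is not divisible by patch size {patch_size}."
        )


check_image_size((448, 448))  # 448 = 32 * 14, so this passes.
```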
### Performance Optimizations
For performance optimizations, e.g. using accelerators, multi-GPU, multi-node, and half precision training, see the performance page.