Train

The train command is a simple interface for pretraining a wide range of models with different self-supervised learning (SSL) methods. An example command looks like this:

Python

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        method="distillation",
        epochs=100,
        batch_size=128,
    )

Command Line

lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50" method="distillation" epochs=100 batch_size=128

Important

The default pretraining method, distillation, is recommended because it consistently outperforms the other methods in extensive experiments. Batch sizes between 128 and 1536 strike a good balance between speed and performance. Long training runs, such as 2,000 epochs on COCO, also improve results significantly. See the Methods page for more details on why distillation is the best choice.

The example command above pretrains a ResNet-50 model from TorchVision on the images in my_data_dir using the DINOv2 distillation pretraining method. All training logs, model exports, and checkpoints are saved to the output directory at out/my_experiment.

Tip

See lightly_train.train() for a complete list of available arguments.

Out

The out argument specifies the output directory where all training logs, model exports, and checkpoints are saved. It looks like this after training:

out/my_experiment
├── checkpoints
│   ├── epoch=99-step=123.ckpt                          # Intermediate checkpoint
│   └── last.ckpt                                       # Last checkpoint
├── events.out.tfevents.1721899772.host.1839736.0       # TensorBoard logs
├── exported_models
│   └── exported_last.pt                                # Final model exported
├── metrics.jsonl                                       # Training metrics
└── train.log                                           # Training logs

The final model checkpoint is saved to out/my_experiment/checkpoints/last.ckpt. The file out/my_experiment/exported_models/exported_last.pt contains the final model, exported in the default format (package_default) of the library the model comes from (see export format for more details).

Tip

Create a new output directory for each experiment to keep training logs, model exports, and checkpoints organized.
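One way to follow this tip is to derive the output path from an experiment name plus a timestamp. The sketch below uses only the standard library. The helper name make_experiment_dir and the naming scheme are illustrative assumptions, not part of LightlyTrain.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_experiment_dir(base: str, name: str, now: Optional[datetime] = None) -> Path:
    """Return a timestamped path such as out/my_experiment_20240101_120000.

    Hypothetical helper: it only builds the path and does not create it.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(base) / f"{name}_{stamp}"


# A fixed timestamp keeps the example reproducible.
out_dir = make_experiment_dir("out", "my_experiment", now=datetime(2024, 1, 1, 12, 0, 0))
print(out_dir.as_posix())
```

The resulting path can then be passed as the out argument, for example out=str(out_dir), so that each run writes to its own directory.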

Data

The data directory data="my_data_dir" can have any structure, including nested subdirectories. LightlyTrain finds all images in the directory recursively.

The following image formats are supported:

jpg
jpeg
png
ppm
bmp
pgm
tif
tiff
webp
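Before starting a long run, it can help to check how many files in the data directory match the formats listed above. The following is a minimal sketch of such a check, not LightlyTrain's own discovery logic. Matching extensions case-insensitively is an assumption made here, and count_images is a hypothetical helper.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Extensions copied from the list of supported formats above.
SUPPORTED_EXTENSIONS = {"jpg", "jpeg", "png", "ppm", "bmp", "pgm", "tif", "tiff", "webp"}


def count_images(data_dir: str) -> Counter:
    """Count files per supported extension, searching subdirectories recursively."""
    counts: Counter = Counter()
    for path in Path(data_dir).rglob("*"):
        # Assumption: extensions are compared case-insensitively.
        ext = path.suffix.lower().lstrip(".")
        if path.is_file() and ext in SUPPORTED_EXTENSIONS:
            counts[ext] += 1
    return counts


# Small demonstration on a temporary directory with nested files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    for name in ["a.jpg", "nested/b.PNG", "nested/c.png", "notes.txt"]:
        (root / name).touch()
    demo_counts = count_images(tmp)

print(dict(demo_counts))
```

Calling count_images("my_data_dir") on a real dataset gives a quick sanity check that the directory contains the expected number of images.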

Model

See the supported libraries section of the Models page for a list of all supported libraries. Each library's docs page lists the models it supports.

Method

See Methods for a list of all supported methods.

Loggers

Logging is configured with the loggers argument. The following loggers are supported:

jsonl: Logs training metrics to a .jsonl file (enabled by default)
tensorboard: Logs training metrics to TensorBoard (enabled by default, requires TensorBoard to be installed)
wandb: Logs training metrics to Weights & Biases (disabled by default, requires Weights & Biases to be installed)
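The loggers argument is a plain dictionary, so it can be assembled programmatically. The sketch below builds one from a few flags, using only the keys documented on this page (jsonl, tensorboard, wandb, project, name, log_model). The helper build_loggers is hypothetical, and it assumes that a logger omitted from the dictionary keeps its default setting.

```python
from typing import Any, Dict, Optional


def build_loggers(
    wandb_project: Optional[str] = None,
    run_name: Optional[str] = None,
    jsonl: bool = True,
    tensorboard: bool = True,
) -> Dict[str, Any]:
    """Build a loggers dictionary; None disables a logger, as documented on this page."""
    loggers: Dict[str, Any] = {}
    if not jsonl:
        loggers["jsonl"] = None
    if not tensorboard:
        loggers["tensorboard"] = None
    if wandb_project is not None:
        wandb_config: Dict[str, Any] = {"project": wandb_project, "log_model": False}
        if run_name is not None:
            wandb_config["name"] = run_name
        loggers["wandb"] = wandb_config
    return loggers


print(build_loggers(wandb_project="my_project", run_name="my_experiment", tensorboard=False))
```

The returned dictionary is meant to be passed as loggers=... in the Python call.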

JSONL

The JSONL logger is enabled by default and logs training metrics to a .jsonl file at out/my_experiment/metrics.jsonl.

Disable the JSONL logger with:

Python

loggers={"jsonl": None}

Command Line

loggers.jsonl=null
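When the JSONL logger stays enabled, the metrics file can be read with the standard library, since each line is a separate JSON object. The sketch below makes no assumption about which metric names LightlyTrain writes. The keys step and train_loss in the demonstration data are placeholders, and read_metrics is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def read_metrics(path: str) -> List[Dict[str, Any]]:
    """Parse a .jsonl file into a list of dictionaries, one per non-empty line."""
    records: List[Dict[str, Any]] = []
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demonstration with placeholder content; real files live at out/my_experiment/metrics.jsonl.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "metrics.jsonl"
    demo_path.write_text(
        '{"step": 0, "train_loss": 1.5}\n{"step": 1, "train_loss": 1.2}\n',
        encoding="utf-8",
    )
    demo_records = read_metrics(str(demo_path))

print(len(demo_records), "records; keys of last record:", sorted(demo_records[-1]))
```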

TensorBoard

TensorBoard logs are automatically saved to the output directory. Run TensorBoard in a new terminal to visualize the training progress:

tensorboard --logdir out/my_experiment

Disable the TensorBoard logger with:

Python

loggers={"tensorboard": None}

Command Line

loggers.tensorboard=null

Weights & Biases

Important

Weights & Biases must be installed with pip install "lightly-train[wandb]".

The Weights & Biases logger can be configured with the following arguments:

Python

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        loggers={
            "wandb": {
                "project": "my_project",
                "name": "my_experiment",
                "log_model": False,              # Set to True to upload model checkpoints
            },
        },
    )

Command Line

lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50" loggers.wandb.project="my_project" loggers.wandb.name="my_experiment" loggers.wandb.log_model=False

More configuration options are available through the Weights & Biases environment variables. See the Weights & Biases documentation for more information.

Disable the Weights & Biases logger with:

Python

loggers={"wandb": None}

Command Line

loggers.wandb=null
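The paired Python and Command Line examples on this page suggest a simple mapping: nested dictionaries become dotted keys, and None becomes null. The sketch below automates that mapping for scalar values. It is inferred from the examples shown here rather than from a documented specification, and to_cli_overrides is a hypothetical helper. Lists and tuples are rejected because their command-line quoting is not covered by this sketch.

```python
from typing import Any, Dict, List


def to_cli_overrides(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten a nested config dictionary into key=value command-line arguments."""
    overrides: List[str] = []
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested dictionaries become dotted keys, e.g. loggers.wandb.project.
            overrides.extend(to_cli_overrides(value, prefix=f"{dotted}."))
        elif isinstance(value, (list, tuple)):
            raise TypeError(f"Sequence values are not handled in this sketch: {dotted}")
        elif value is None:
            # None in Python corresponds to null on the command line.
            overrides.append(f"{dotted}=null")
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


config = {
    "out": "out/my_experiment",
    "epochs": 100,
    "loggers": {
        "tensorboard": None,
        "wandb": {"project": "my_project", "log_model": False},
    },
}
overrides = to_cli_overrides(config)
print(" ".join(["lightly-train", "train", *overrides]))
```

Because the arguments are built as a list, they can be handed to subprocess.run(["lightly-train", "train", *overrides]) without extra shell quoting.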

Advanced Options

Input Image Resolution

The input image resolution is set with the transform_args argument. By default, a resolution of 224x224 pixels is used. A custom resolution can be set like this:

Python

transform_args={"image_size": (448, 448)}  # (height, width)

Command Line

transform_args.image_size="\[448,448\]"  # (height, width)

Warning

Not all models support all image sizes.
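Since an unsupported size may only surface as an error once training starts, a small check before launching can save time. The sketch below validates the (height, width) pair and builds the transform_args dictionary. The multiple_of parameter is a hypothetical convenience for models that may require sizes divisible by some value, such as a patch size. The actual constraints depend on the model and are not specified on this page.

```python
from typing import Dict, Optional, Tuple


def make_transform_args(
    height: int, width: int, multiple_of: Optional[int] = None
) -> Dict[str, Tuple[int, int]]:
    """Validate an image size and return it in the transform_args format."""
    for label, value in (("height", height), ("width", width)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{label} must be a positive integer, got {value!r}")
        # Assumed constraint: only checked when the caller supplies multiple_of.
        if multiple_of is not None and value % multiple_of != 0:
            raise ValueError(f"{label}={value} is not divisible by {multiple_of}")
    return {"image_size": (height, width)}  # (height, width)


transform_args = make_transform_args(448, 448, multiple_of=16)
print(transform_args)
```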

Performance Optimizations

For performance optimizations such as using accelerators, multi-GPU and multi-node training, and half-precision training, see the performance page.