Train

The train command is a simple interface for pretraining a large number of models with different SSL methods. An example looks like this:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        method="distillation",
        epochs=100,
        batch_size=128,
    )

Or from the command line:

lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50" method="distillation" epochs=100 batch_size=128

Important

The default pretraining method distillation is recommended, as it consistently outperforms the other methods in extensive experiments. Batch sizes between 128 and 1536 strike a good balance between speed and performance. Moreover, long training runs, such as 2,000 epochs on COCO, significantly improve results. Check the Methods page for more details on why distillation is the best choice.

This will pretrain a ResNet-50 model from TorchVision using images from my_data_dir and the DINOv2 distillation pretraining method. All training logs, model exports, and checkpoints are saved to the output directory at out/my_experiment.

Tip

See lightly_train.train() for a complete list of available arguments.

Out

The out argument specifies the output directory where all training logs, model exports, and checkpoints are saved. It looks like this after training:

out/my_experiment
├── checkpoints
│   ├── epoch=99-step=123.ckpt                          # Intermediate checkpoint
│   └── last.ckpt                                       # Last checkpoint
├── events.out.tfevents.1721899772.host.1839736.0       # TensorBoard logs
├── exported_models
│   └── exported_last.pt                                # Final model exported
├── metrics.jsonl                                       # Training metrics
└── train.log                                           # Training logs

The final model checkpoint is saved to out/my_experiment/checkpoints/last.ckpt. The file out/my_experiment/exported_models/exported_last.pt contains the final model, exported in the default format (package_default) of the library used (see export format for more details).
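
For TorchVision models, the exported file can typically be loaded as a regular state dict for downstream fine-tuning. A minimal sketch, assuming the default export format stores a state dict compatible with torchvision.models.resnet50:

import torch
from torchvision.models import resnet50

# Load the exported weights (assumed to be a plain state dict in the default export format).
state_dict = torch.load("out/my_experiment/exported_models/exported_last.pt", map_location="cpu")

# Re-create the pretrained architecture and load the weights for fine-tuning.
model = resnet50()
model.load_state_dict(state_dict)
model.eval()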

Tip

Create a new output directory for each experiment to keep training logs, model exports, and checkpoints organized.

Data

LightlyTrain expects a folder containing images, or a list of (possibly mixed) folders and image files. Every folder is traversed recursively, and all image files found within it are included (even in nested subdirectories).

The following image formats are supported:

  • jpg

  • jpeg

  • png

  • ppm

  • bmp

  • pgm

  • tif

  • tiff

  • webp

Example of passing a single folder my_data_dir:

my_data_dir
├── dir0
│   ├── image0.jpg
│   └── image1.jpg
└── dir1
    └── image0.jpg
lightly_train.train(
    out="out/my_experiment",            # Output directory
    data="my_data_dir",                 # Directory with images
    model="torchvision/resnet18",       # Model to train
)

Or from the command line:

lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet18"

Example of passing a (mixed) list of files and folders:

├── image2.jpg
├── image3.jpg
└── my_data_dir
    ├── dir0
    │   ├── image0.jpg
    │   └── image1.jpg
    └── dir1
        └── image0.jpg
lightly_train.train(
    out="out/my_experiment",            # Output directory
    data=["image2.jpg", "image3.jpg", "my_data_dir"],  # List of image files and folders
    model="torchvision/resnet18",       # Model to train
)

Or from the command line:

lightly-train train out="out/my_experiment" data='["image2.jpg", "image3.jpg", "my_data_dir"]' model="torchvision/resnet18"

Model

See the Models page for a detailed list of all supported libraries, and the respective documentation pages for the models each library provides.

Method

See Methods for a list of all supported methods.

Loggers

Logging is configured with the loggers argument. The following loggers are supported:

  • jsonl: Logs training metrics to a .jsonl file (enabled by default)

  • mlflow: Logs training metrics to MLflow (disabled by default, requires MLflow to be installed)

  • tensorboard: Logs training metrics to TensorBoard (enabled by default, requires TensorBoard to be installed)

  • wandb: Logs training metrics to Weights & Biases (disabled by default, requires Weights & Biases to be installed)
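
Loggers are configured by passing a dictionary to loggers, where each key is a logger name and the value is either None (to disable it) or a dictionary of logger-specific arguments, as shown in the sections below. A minimal sketch combining several loggers in one call:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        loggers={
            "tensorboard": None,        # Disable the default TensorBoard logger
            "wandb": {                  # Enable Weights & Biases
                "project": "my_project",
                "name": "my_experiment",
            },
        },
    )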

JSONL

The JSONL logger is enabled by default and logs training metrics to a .jsonl file at out/my_experiment/metrics.jsonl.
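
Each line in the file is a single JSON record of the metrics logged at that point in training. A minimal sketch for reading it (the exact metric names depend on the method and model used):

import json

# Read the JSONL metrics file; each line is one JSON object.
with open("out/my_experiment/metrics.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(record)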

Disable the JSONL logger with:

loggers={"jsonl": None}
loggers.jsonl=null

MLflow

Important

MLflow must be installed with pip install "lightly-train[mlflow]".

The mlflow logger can be configured with the following arguments:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        loggers={
            "mlflow": {
                "experiment_name": "my_experiment",
                "run_name": "my_run",
                "tracking_uri": "tracking_uri",
                # "run_id": "my_run_id",  # Use if resuming a training with resume=True
                # "log_model": True,      # Currently not supported
            },
        },
    )

Or from the command line:

lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50" loggers.mlflow.experiment_name="my_experiment" loggers.mlflow.run_name="my_run" loggers.mlflow.tracking_uri=tracking_uri

TensorBoard

TensorBoard logs are automatically saved to the output directory. Run TensorBoard in a new terminal to visualize the training progress:

tensorboard --logdir out/my_experiment

Disable the TensorBoard logger with:

loggers={"tensorboard": None}
loggers.tensorboard=null

Weights & Biases

Important

Weights & Biases must be installed with pip install "lightly-train[wandb]".

The Weights & Biases logger can be configured with the following arguments:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        loggers={
            "wandb": {
                "project": "my_project",
                "name": "my_experiment",
                "log_model": False,              # Set to True to upload model checkpoints
            },
        },
    )

Or from the command line:

lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50" loggers.wandb.project="my_project" loggers.wandb.name="my_experiment" loggers.wandb.log_model=False

More configuration options are available through Weights & Biases environment variables. See the Weights & Biases documentation for more information.
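
For example, offline logging can be enabled with the standard WANDB_MODE environment variable (a Weights & Biases setting, not a LightlyTrain argument). A minimal sketch:

import os

import lightly_train

# WANDB_MODE is a Weights & Biases environment variable; "offline" stores logs
# locally so they can be synced later with `wandb sync`.
os.environ["WANDB_MODE"] = "offline"

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        loggers={"wandb": {"project": "my_project"}},
    )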

Disable the Weights & Biases logger with:

loggers={"wandb": None}
loggers.wandb=null

Resume Training

There are two distinct ways to continue training, depending on your intention.

Resume Interrupted Training

Use resume=True to resume a previously interrupted or crashed training run. This will pick up exactly where the training left off; see the example below.

  • You must use the same out directory as the original run.

  • You must not change any training parameters (e.g., learning rate, batch size, data, etc.).

  • This is intended for continuing the same run without modification.
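
A minimal sketch of resuming an interrupted run; all arguments match the original call, with resume=True added:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",        # Same output directory as the interrupted run
        data="my_data_dir",             # Same data
        model="torchvision/resnet50",   # Same model
        method="distillation",          # Same method and other parameters
        epochs=100,
        batch_size=128,
        resume=True,                    # Continue exactly where training left off
    )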

Load Weights for a New Run

Use checkpoint="path/to/checkpoint.ckpt" to load model weights from a checkpoint, but start a new training run; see the example below.

  • You are free to change training parameters.

  • This is useful for continuing training with a different setup.
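
A minimal sketch of starting a new run from the weights of a previous one (the checkpoint path points to the last.ckpt written by that run):

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_new_experiment",                            # New output directory
        data="my_data_dir",
        model="torchvision/resnet50",
        checkpoint="out/my_experiment/checkpoints/last.ckpt",   # Load weights from the previous run
        epochs=50,                                              # Training parameters may differ from the original run
    )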

General Notes

Important

  • resume=True and checkpoint=... are mutually exclusive and cannot be used together.

  • If overwrite=True is set, training will start fresh, overwriting existing outputs or checkpoints in the specified output directory.
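
A minimal sketch of restarting from scratch in an existing output directory with overwrite=True:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        overwrite=True,     # Start fresh, overwriting existing outputs and checkpoints
    )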

Advanced Options

Input Image Resolution

The input image resolution can be set with the transform_args argument. By default, a resolution of 224x224 pixels is used. A custom resolution can be set like this:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",            # Output directory
        data="my_data_dir",                 # Directory with images
        model="torchvision/resnet18",       # Model to train
        transform_args={"image_size": (448, 448)}, # (height, width)
    )

Or from the command line:

lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet18" transform_args.image_size="[448,448]"

Warning

Not all models support all image sizes.

Image Transforms

See Configuring Image Transforms on how to configure image transformations.

Method Arguments

Warning

In 99% of cases, it is not necessary to modify the default method arguments in LightlyTrain. The default settings are carefully tuned to work well for most use cases.

The method arguments can be set with the method_args argument:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",            # Output directory
        data="my_data_dir",                 # Directory with images
        model="torchvision/resnet18",       # Model to train
        method="distillation",              # Pretraining method
        method_args={                       # Override the default teacher model
            "teacher": "dinov2_vit/vitl14_pretrain",
        },
    )

Or from the command line:

lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet18" method="distillation" method_args.teacher="dinov2_vit/vitl14_pretrain"

Each pretraining method has its own set of arguments that can be configured. LightlyTrain provides sensible defaults that are adjusted depending on the dataset and model used. The defaults for each method are listed in the respective Methods documentation pages.

Performance Optimizations

For performance optimizations, e.g. using accelerators, multi-GPU, multi-node, and half precision training, see the performance page.