lightly_train

Documentation of the public API of the lightly_train package.

Functions

lightly_train.embed(*, out: str | Path, data: str | Path | Sequence[str | Path], checkpoint: str | Path, format: str | EmbeddingFormat = 'torch', image_size: int | tuple[int, int] = (224, 224), batch_size: int = 128, num_workers: int | Literal['auto'] = 'auto', accelerator: str | Accelerator = 'auto', overwrite: bool = False, precision: Literal[64, 32, 16, 'transformer-engine', 'transformer-engine-float16', '16-true', '16-mixed', 'bf16-true', 'bf16-mixed', '32-true', '64-true', '64', '32', '16', 'bf16'] = '32-true') → None

Embed images from a model checkpoint.

See the documentation for more information: https://docs.lightly.ai/train/stable/embed.html

Args:
out:

Filepath where the embeddings will be saved. For example “embeddings.csv”.

data:

Directory containing the images to embed or a sequence of image directories and files.

checkpoint:

Path to the LightlyTrain checkpoint file used for embedding. The location of the checkpoint depends on the train command. If training was run with out="out/my_experiment", then the last LightlyTrain checkpoint is saved to out/my_experiment/checkpoints/last.ckpt.

format:

Format of the embeddings. Supported formats are [‘csv’, ‘lightly_csv’, ‘torch’]. ‘torch’ is the recommended and most efficient format. Torch embeddings can be loaded with torch.load(out, weights_only=True). Choose ‘lightly_csv’ if you want to use the embeddings as custom embeddings with the Lightly Worker.

image_size:

Size to which the images are resized before embedding. If a single integer is provided, the image is resized to a square with the given side length. If a (height, width) tuple is provided, the image is resized to the given height and width. Note that not all models support all image sizes.

batch_size:

Number of images per batch.

num_workers:

Number of workers for the dataloader. ‘auto’ automatically sets the number of workers based on the available CPU cores.

accelerator:

Hardware accelerator. Can be one of [‘cpu’, ‘gpu’, ‘tpu’, ‘ipu’, ‘hpu’, ‘mps’, ‘auto’]. ‘auto’ will automatically select the best accelerator available.

overwrite:

Overwrite the output file if it already exists.

precision:

Embedding precision. Select ‘32-true’ for full 32-bit precision, or ‘bf16-mixed’/’16-mixed’ for mixed precision.
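
Example: a minimal embed call, assuming a previous training run saved its checkpoint to out/my_experiment/checkpoints/last.ckpt (all paths and the data directory are placeholders):

import lightly_train
import torch

# Embed all images in "my_data_dir" with a LightlyTrain checkpoint.
lightly_train.embed(
    out="my_embeddings.pt",
    data="my_data_dir",
    checkpoint="out/my_experiment/checkpoints/last.ckpt",
    format="torch",  # recommended, most efficient format
)

# Torch embeddings can be loaded again with weights_only=True.
embeddings = torch.load("my_embeddings.pt", weights_only=True)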

lightly_train.export(*, out: str | Path, checkpoint: str | Path, part: str | ModelPart = 'model', format: str | ModelFormat = 'package_default', overwrite: bool = False) → None

Export a model from a checkpoint.

See the documentation for more information: https://docs.lightly.ai/train/stable/export.html

Args:
out:

Path where the exported model will be saved.

checkpoint:

Path to the LightlyTrain checkpoint file to export the model from. The location of the checkpoint depends on the train command. If training was run with out="out/my_experiment", then the last LightlyTrain checkpoint is saved to out/my_experiment/checkpoints/last.ckpt.

part:

Part of the model to export. Valid options are ‘model’ and ‘embedding_model’. ‘model’ is the default option and exports the model that was passed as model argument to the train function. ‘embedding_model’ exports the embedding model. This includes the model passed with the model argument in the train function and an extra embedding layer if the embed_dim argument was set during training. This is useful if you want to use the exported model for embedding images.

format:

Format to save the model in. Valid options are [‘package_default’, ‘torch_model’, ‘torch_state_dict’]. ‘package_default’ is the default option and exports the model in the default format of the package that was used for training. This ensures compatibility with the package and is the most flexible option. ‘torch_state_dict’ exports the model’s state dict which can be loaded with model.load_state_dict(torch.load(out, weights_only=True)). ‘torch_model’ exports the model as a torch module which can be loaded with model = torch.load(out). This requires that the same LightlyTrain version is installed when the model is exported and when it is loaded again.

overwrite:

Overwrite the output file if it already exists.
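
Example: exporting the trained model as a plain torch state dict (paths are placeholders from a previous training run):

import lightly_train

lightly_train.export(
    out="my_exported_model.pt",
    checkpoint="out/my_experiment/checkpoints/last.ckpt",
    part="model",
    format="torch_state_dict",
)

The exported weights can then be loaded into a matching model instance with model.load_state_dict(torch.load("my_exported_model.pt", weights_only=True)).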

lightly_train.export_onnx(*, out: str | Path, checkpoint: str | Path, batch_size: int = 1, height: int | None = None, width: int | None = None, precision: Literal['32-true', '16-true'] = '32-true', simplify: bool = True, verify: bool = True, overwrite: bool = False, format_args: dict[str, Any] | None = None) → None

Export a model as ONNX from a checkpoint.

Args:
out:

Path where the exported model will be saved.

checkpoint:

Path to the LightlyTrain checkpoint file to export the model from.

batch_size:

Batch size of the input tensor.

height:

Height of the input tensor.

width:

Width of the input tensor.

precision:

“32-true” for float32 precision or “16-true” for float16 precision. Choosing “16-true” can reduce memory consumption and speed up inference on GPUs, but may slightly reduce accuracy. Default is “32-true”.

simplify:

Simplify the ONNX model with onnxslim after the export. Default is True.

verify:

Check the exported model for errors. We recommend enabling this.

overwrite:

Overwrite the output file if it already exists.

format_args:

Arguments that are passed to torch.onnx.export. Only use this if you know what you are doing.
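
Example: exporting a checkpoint to ONNX with a fixed input shape (the 224x224 size and the paths are placeholders; note that not all models support all sizes):

import lightly_train

lightly_train.export_onnx(
    out="my_model.onnx",
    checkpoint="out/my_experiment/checkpoints/last.ckpt",
    batch_size=1,
    height=224,
    width=224,
    precision="32-true",
)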

lightly_train.list_methods() → list[str]

Lists all available self-supervised learning methods.

See the documentation for more information: https://docs.lightly.ai/train/stable/methods/

lightly_train.list_models() → list[str]

Lists all models in <package_name>/<model_name> format.

See the documentation for more information: https://docs.lightly.ai/train/stable/models/
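
Example: inspecting the available methods and models before starting a run:

import lightly_train

print(lightly_train.list_methods())  # names accepted by the `method` argument of train()
print(lightly_train.list_models())   # names in '<package_name>/<model_name>' format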

lightly_train.load_model(model: str | Path, device: Literal['cpu', 'cuda', 'mps'] | device | None = None) → TaskModel

Load a model either from an exported model file (.pt format), from a checkpoint file (.ckpt format), or by downloading it from the Lightly model repository.

First checks whether model points to a valid file. If not, and model is a str, tries to match the name to one of the models in the Lightly model repository and downloads it. Downloaded models are cached under the location specified by the environment variable LIGHTLY_TRAIN_MODEL_CACHE_DIR.

Args:
model:

Either a path to the exported model/checkpoint file or the name of a model in the Lightly model repository.

device:

Device to load the model on. If None, the model will be loaded onto a GPU (“cuda” or “mps”) if available, and otherwise fall back to CPU.

Returns:

The loaded model.
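
Example: loading a model from a local file or by name from the model repository (the file path is a placeholder; valid repository names are listed in the model docs):

import lightly_train

# From a local exported model (.pt) or checkpoint (.ckpt) file.
model = lightly_train.load_model("out/my_experiment/exported_models/exported_last.pt")

# Or by name from the Lightly model repository; the model is downloaded and
# cached under LIGHTLY_TRAIN_MODEL_CACHE_DIR. The name below is a placeholder.
# model = lightly_train.load_model("<model_name>", device="cuda")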

lightly_train.train(*, out: str | Path, data: str | Path | Sequence[str | Path], model: str | Module | ModelWrapper | Any, method: str = 'distillation', method_args: dict[str, Any] | None = None, embed_dim: int | None = None, epochs: int | Literal['auto'] = 'auto', batch_size: int = 128, num_workers: int | Literal['auto'] = 'auto', devices: int | str | list[int] = 'auto', num_nodes: int = 1, resume_interrupted: bool = False, checkpoint: str | Path | None = None, overwrite: bool = False, accelerator: str | Accelerator = 'auto', strategy: str | Strategy = 'auto', precision: Literal[64, 32, 16, 'transformer-engine', 'transformer-engine-float16', '16-true', '16-mixed', 'bf16-true', 'bf16-mixed', '32-true', '64-true', '64', '32', '16', 'bf16', 'auto'] = 'auto', float32_matmul_precision: Literal['auto', 'highest', 'high', 'medium'] = 'auto', seed: int = 0, loggers: dict[str, dict[str, Any] | None] | None = None, callbacks: dict[str, dict[str, Any] | None] | None = None, optim: str = 'auto', optim_args: dict[str, Any] | None = None, transform_args: dict[str, Any] | None = None, loader_args: dict[str, Any] | None = None, trainer_args: dict[str, Any] | None = None, model_args: dict[str, Any] | None = None, resume: bool | None = None) → None

Train a self-supervised model.

See the documentation for more information: https://docs.lightly.ai/train/stable/train.html

The training process can be monitored with TensorBoard:

tensorboard --logdir out

After training, the model is exported in the library default format to out/exported_models/exported_last.pt. It can be exported to different formats using the lightly_train.export command.

Args:
out:

Output directory to save logs, checkpoints, and other artifacts.

data:

Path to a directory containing images or a sequence of image directories and files.

model:

Model name or instance to use for training.

method:

Self-supervised learning method name.

method_args:

Arguments for the self-supervised learning method. The available arguments depend on the method parameter.

embed_dim:

Embedding dimension. Set this if you want to train an embedding model with a specific dimension. If None, the output dimension of model is used.

epochs:

Number of training epochs. Set to “auto” to automatically determine the number of epochs based on the dataset size and batch size.

batch_size:

Global batch size. The batch size per device/GPU is inferred from this value and the number of devices and nodes.

num_workers:

Number of workers for the dataloader per device/GPU. ‘auto’ automatically sets the number of workers based on the available CPU cores.

devices:

Number of devices/GPUs for training. ‘auto’ automatically selects all available devices. The device type is determined by the accelerator parameter.

num_nodes:

Number of nodes for distributed training.

checkpoint:

Use this parameter to further pretrain a model from a previous run. The checkpoint must be a path to a checkpoint file created by a previous training run, for example “out/my_experiment/checkpoints/last.ckpt”. This will only load the model weights from the previous run. All other training state (e.g. optimizer state, epochs) from the previous run are not loaded. Instead, a new run is started with the model weights from the checkpoint.

If you want to resume training from an interrupted or crashed run, use the resume_interrupted parameter instead. See https://docs.lightly.ai/train/stable/train/index.html#resume-training for more information.

resume_interrupted:

Set this to True if you want to resume training from an interrupted or crashed training run. This will pick up exactly where the training left off, including the optimizer state and the current epoch.

  • You must use the same out directory as the interrupted run.

  • You must NOT change any training parameters (e.g., learning rate, batch size, data, etc.).

  • This is intended for continuing the same run without modification.

If you want to further pretrain a model or change the training parameters, use the checkpoint parameter instead. See https://docs.lightly.ai/train/stable/train/index.html#resume-training for more information.

overwrite:

Overwrite the output directory if it already exists. Warning, this might overwrite existing files in the directory!

accelerator:

Hardware accelerator. Can be one of [‘cpu’, ‘gpu’, ‘tpu’, ‘ipu’, ‘hpu’, ‘mps’, ‘auto’]. ‘auto’ will automatically select the best accelerator available.

strategy:

Training strategy. For example ‘ddp’ or ‘auto’. ‘auto’ automatically selects the best strategy available.

precision:

Training precision. Select ‘16-mixed’ for mixed 16-bit precision, ‘32-true’ for full 32-bit precision, or ‘bf16-mixed’ for mixed bfloat16 precision.

float32_matmul_precision:

Precision for float32 matrix multiplication. Can be one of [‘auto’, ‘highest’, ‘high’, ‘medium’]. See https://docs.pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision for more information.

seed:

Random seed for reproducibility.

loggers:

Loggers for training. Either None or a dictionary of logger names to either None or a dictionary of logger arguments. None uses the default loggers. To disable a logger, set it to None: loggers={"tensorboard": None}. To configure a logger, pass the respective arguments: loggers={"wandb": {"project": "my_project"}}.

callbacks:

Callbacks for training. Either None or a dictionary of callback names to either None or a dictionary of callback arguments. None uses the default callbacks. To disable a callback, set it to None: callbacks={"model_checkpoint": None}. To configure a callback, pass the respective arguments: callbacks={"model_checkpoint": {"every_n_epochs": 5}}.

optim:

Optimizer name. Must be one of [‘auto’, ‘adamw’, ‘sgd’]. ‘auto’ automatically selects the optimizer based on the method.

optim_args:

Optimizer arguments. Available arguments depend on the optimizer.

AdamW:

optim_args={"lr": float, "betas": (float, float), "weight_decay": float}

SGD:

optim_args={"lr": float, "momentum": float, "weight_decay": float}

transform_args:

Arguments for the image transform. The available arguments depend on the method parameter. The following arguments are always available:

transform_args={
    "image_size": (int, int),
    "random_resize": {
        "min_scale": float,
        "max_scale": float,
    },
    "random_flip": {
        "horizonal_prob": float,
        "vertical_prob": float,
    },
    "random_rotation": {
        "prob": float,
        "degrees": int,
    },
    "random_gray_scale": float,
    "normalize": {
        "mean": (float, float, float),
        "std": (float, float, float),
    }
}

loader_args:

Arguments for the PyTorch DataLoader. Should only be used in special cases as default values are automatically set. Prefer to use the batch_size and num_workers arguments instead. For details, see: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

trainer_args:

Arguments for the PyTorch Lightning Trainer. Should only be used in special cases as default values are automatically set. For details, see: https://lightning.ai/docs/pytorch/stable/common/trainer.html

model_args:

Arguments for the model. The available arguments depend on the model parameter. For example, if model='torchvision/<model_name>', the arguments are passed to torchvision.models.get_model(model_name, **model_args).

resume:

Deprecated. Use resume_interrupted instead.
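
Example: a minimal pretraining run (the data directory is a placeholder; 'torchvision/resnet50' is one of the torchvision model names returned by list_models()):

import lightly_train

lightly_train.train(
    out="out/my_experiment",
    data="my_data_dir",
    model="torchvision/resnet50",
    method="distillation",
)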

lightly_train.train_object_detection(*, out: str | Path, data: dict[str, Any], model: str, steps: int | Literal['auto'] = 'auto', batch_size: int | Literal['auto'] = 'auto', num_workers: int | Literal['auto'] = 'auto', devices: int | str | list[int] = 'auto', num_nodes: int = 1, resume_interrupted: bool = False, checkpoint: str | Path | None = None, reuse_class_head: bool = False, overwrite: bool = False, accelerator: str = 'auto', strategy: str = 'auto', precision: Literal[64, 32, 16, 'transformer-engine', 'transformer-engine-float16', '16-true', '16-mixed', 'bf16-true', 'bf16-mixed', '32-true', '64-true', '64', '32', '16', 'bf16'] = 'bf16-mixed', float32_matmul_precision: Literal['auto', 'highest', 'high', 'medium'] = 'auto', seed: int | None = 0, logger_args: dict[str, Any] | None = None, model_args: dict[str, Any] | None = None, transform_args: dict[str, Any] | None = None, loader_args: dict[str, Any] | None = None, save_checkpoint_args: dict[str, Any] | None = None) → None

Train an object detection model.

See the documentation for more information: https://docs.lightly.ai/train/stable/object_detection.html

The training process can be monitored with TensorBoard:

tensorboard --logdir out

After training, the last model checkpoint is saved to out/checkpoints/last.ckpt and also exported to out/exported_models/exported_last.pt.

Args:
out:

The output directory where the model checkpoints and logs are saved.

data:

The dataset configuration. See the documentation for more information: https://docs.lightly.ai/train/stable/object_detection.html#data

model:

The model to train. For example, “dinov3/convnext-tiny-ltdetr-coco”, “dinov2/vits14-ltdetr”, or a path to a local model checkpoint.

If you want to resume training from an interrupted or crashed run, use the resume_interrupted parameter.

steps:

The number of training steps.

batch_size:

Global batch size. The batch size per device/GPU is inferred from this value and the number of devices and nodes.

num_workers:

Number of workers for the dataloader per device/GPU. ‘auto’ automatically sets the number of workers based on the available CPU cores.

devices:

Number of devices/GPUs for training. ‘auto’ automatically selects all available devices. The device type is determined by the accelerator parameter.

num_nodes:

Number of nodes for distributed training.

checkpoint:

Use this parameter to further fine-tune a model from a previous fine-tuned checkpoint. The checkpoint must be a path to a checkpoint file, for example “checkpoints/model.ckpt”. This will only load the model weights from the previous run. All other training state (e.g. optimizer state, epochs) from the previous run are not loaded.

This option is equivalent to setting model="<path_to_checkpoint>".

If you want to resume training from an interrupted or crashed run, use the resume_interrupted parameter instead.

reuse_class_head:

Set this to True if you want to keep the class head from the provided checkpoint. The default behavior removes the class head before loading so that a new head can be initialized for the current task.

resume_interrupted:

Set this to True if you want to resume training from an interrupted or crashed training run. This will pick up exactly where the training left off, including the optimizer state and the current step.

  • You must use the same out directory as the interrupted run.

  • You must NOT change any training parameters (e.g., learning rate, batch size, data, etc.).

  • This is intended for continuing the same run without modification.

overwrite:

Overwrite the output directory if it already exists. Warning, this might overwrite existing files in the directory!

accelerator:

Hardware accelerator. Can be one of [‘cpu’, ‘gpu’, ‘mps’, ‘auto’]. ‘auto’ will automatically select the best accelerator available.

strategy:

Training strategy. For example ‘ddp’ or ‘auto’. ‘auto’ automatically selects the best strategy available.

precision:

Training precision. Select ‘16-mixed’ for mixed 16-bit precision, ‘32-true’ for full 32-bit precision, or ‘bf16-mixed’ for mixed bfloat16 precision.

float32_matmul_precision:

Precision for float32 matrix multiplication. Can be one of [‘auto’, ‘highest’, ‘high’, ‘medium’]. See https://docs.pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision for more information.

seed:

Random seed for reproducibility.

logger_args:

Logger arguments. Either None or a dictionary of logger names to either None or a dictionary of logger arguments. None uses the default loggers. To disable a logger, set it to None: logger_args={"tensorboard": None}. To configure a logger, pass the respective arguments: logger_args={"mlflow": {"experiment_name": "my_experiment", ...}}. See https://docs.lightly.ai/train/stable/semantic_segmentation.html#logging for more information.

model_args:

Model training arguments. Either None or a dictionary of model arguments.

transform_args:

Transform arguments. Either None or a dictionary of transform arguments. The image size and normalization parameters can be set with transform_args={"image_size": (height, width), "normalize": {"mean": (r, g, b), "std": (r, g, b)}}

loader_args:

Arguments for the PyTorch DataLoader. Should only be used in special cases as default values are automatically set. Prefer to use the batch_size and num_workers arguments instead. For details, see: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

save_checkpoint_args:

Arguments to configure the saving of checkpoints. The checkpoint frequency can be set with save_checkpoint_args={"save_every_num_steps": 100}.
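
Example: a minimal fine-tuning call, sketched only. The authoritative schema of the data dict is defined in the object detection docs linked above; the YOLO-style layout below is an assumption, and all paths and class names are placeholders. The model name is taken from the model argument description above.

import lightly_train

lightly_train.train_object_detection(
    out="out/my_detection_experiment",
    data={
        # Assumed YOLO-style dataset layout; see the data docs for the real schema.
        "path": "my_data_dir",
        "train": "images/train",
        "val": "images/val",
        "names": {0: "person", 1: "car"},
    },
    model="dinov3/convnext-tiny-ltdetr-coco",
)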

lightly_train.train_semantic_segmentation(*, out: str | Path, data: dict[str, Any], model: str, steps: int | Literal['auto'] = 'auto', batch_size: int | Literal['auto'] = 'auto', num_workers: int | Literal['auto'] = 'auto', devices: int | str | list[int] = 'auto', num_nodes: int = 1, resume_interrupted: bool = False, checkpoint: str | Path | None = None, reuse_class_head: bool = False, overwrite: bool = False, accelerator: str = 'auto', strategy: str = 'auto', precision: Literal[64, 32, 16, 'transformer-engine', 'transformer-engine-float16', '16-true', '16-mixed', 'bf16-true', 'bf16-mixed', '32-true', '64-true', '64', '32', '16', 'bf16'] = 'bf16-mixed', float32_matmul_precision: Literal['auto', 'highest', 'high', 'medium'] = 'auto', seed: int | None = 0, logger_args: dict[str, Any] | None = None, model_args: dict[str, Any] | None = None, transform_args: dict[str, Any] | None = None, loader_args: dict[str, Any] | None = None, save_checkpoint_args: dict[str, Any] | None = None) → None

Train a semantic segmentation model.

See the documentation for more information: https://docs.lightly.ai/train/stable/semantic_segmentation.html

The training process can be monitored with TensorBoard:

tensorboard --logdir out

After training, the last model checkpoint is saved to out/checkpoints/last.ckpt and also exported to out/exported_models/exported_last.pt.

Args:
out:

The output directory where the model checkpoints and logs are saved.

data:

The dataset configuration. See the documentation for more information: https://docs.lightly.ai/train/stable/semantic_segmentation.html#data

model:

The model to train. For example, “dinov2/vits14-eomt”, “dinov3/vits16-eomt-coco”, or a path to a local model checkpoint.

If you want to resume training from an interrupted or crashed run, use the resume_interrupted parameter.

steps:

The number of training steps.

batch_size:

Global batch size. The batch size per device/GPU is inferred from this value and the number of devices and nodes.

num_workers:

Number of workers for the dataloader per device/GPU. ‘auto’ automatically sets the number of workers based on the available CPU cores.

devices:

Number of devices/GPUs for training. ‘auto’ automatically selects all available devices. The device type is determined by the accelerator parameter.

num_nodes:

Number of nodes for distributed training.

checkpoint:

Use this parameter to further fine-tune a model from a previous fine-tuned checkpoint. The checkpoint must be a path to a checkpoint file, for example “checkpoints/model.ckpt”. This will only load the model weights from the previous run. All other training state (e.g. optimizer state, epochs) from the previous run are not loaded.

This option is equivalent to setting model="<path_to_checkpoint>".

If you want to resume training from an interrupted or crashed run, use the resume_interrupted parameter instead.

reuse_class_head:

Set this to True if you want to keep the class head from the provided checkpoint. The default behavior removes the class head before loading so that a new head can be initialized for the current task.

resume_interrupted:

Set this to True if you want to resume training from an interrupted or crashed training run. This will pick up exactly where the training left off, including the optimizer state and the current step.

  • You must use the same out directory as the interrupted run.

  • You must NOT change any training parameters (e.g., learning rate, batch size, data, etc.).

  • This is intended for continuing the same run without modification.

overwrite:

Overwrite the output directory if it already exists. Warning, this might overwrite existing files in the directory!

accelerator:

Hardware accelerator. Can be one of [‘cpu’, ‘gpu’, ‘mps’, ‘auto’]. ‘auto’ will automatically select the best accelerator available.

strategy:

Training strategy. For example ‘ddp’ or ‘auto’. ‘auto’ automatically selects the best strategy available.

precision:

Training precision. Select ‘16-mixed’ for mixed 16-bit precision, ‘32-true’ for full 32-bit precision, or ‘bf16-mixed’ for mixed bfloat16 precision.

float32_matmul_precision:

Precision for float32 matrix multiplication. Can be one of [‘auto’, ‘highest’, ‘high’, ‘medium’]. See https://docs.pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision for more information.

seed:

Random seed for reproducibility.

logger_args:

Logger arguments. Either None or a dictionary of logger names to either None or a dictionary of logger arguments. None uses the default loggers. To disable a logger, set it to None: logger_args={"tensorboard": None}. To configure a logger, pass the respective arguments: logger_args={"mlflow": {"experiment_name": "my_experiment", ...}}. See https://docs.lightly.ai/train/stable/semantic_segmentation.html#logging for more information.

model_args:

Model training arguments. Either None or a dictionary of model arguments.

transform_args:

Transform arguments. Either None or a dictionary of transform arguments. The image size and normalization parameters can be set with transform_args={"image_size": (height, width), "normalize": {"mean": (r, g, b), "std": (r, g, b)}}

loader_args:

Arguments for the PyTorch DataLoader. Should only be used in special cases as default values are automatically set. Prefer to use the batch_size and num_workers arguments instead. For details, see: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

save_checkpoint_args:

Arguments to configure the saving of checkpoints. The checkpoint frequency can be set with save_checkpoint_args={"save_every_num_steps": 100}.
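
Example: a minimal fine-tuning call, sketched only. The authoritative schema of the data dict is defined in the semantic segmentation docs linked above; the keys, paths, and class names below are assumptions and placeholders. The model name is taken from the model argument description above.

import lightly_train

lightly_train.train_semantic_segmentation(
    out="out/my_segmentation_experiment",
    data={
        # Assumed layout with image/mask directories and a class mapping;
        # see the data docs for the real schema.
        "train": {"images": "train/images", "masks": "train/masks"},
        "val": {"images": "val/images", "masks": "val/masks"},
        "classes": {0: "background", 1: "road"},
    },
    model="dinov2/vits14-eomt",
)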

Models

class lightly_train._task_models.dinov2_eomt_semantic_segmentation.task_model.DINOv2EoMTSemanticSegmentation
predict(image: str | Path | Image | Tensor) → Tensor

Returns the predicted mask for the given image.

Args:
image:

The input image as a path, URL, PIL image, or tensor. Tensors must have shape (C, H, W).

Returns:

The predicted mask as a tensor of shape (H, W). The values represent the class IDs as defined in the classes argument of your dataset. These classes are also stored in the classes attribute of the model. The model will always predict the pixels as one of the known classes even when your dataset contains ignored classes defined by the ignore_classes argument.

class lightly_train._task_models.dinov3_eomt_semantic_segmentation.task_model.DINOv3EoMTSemanticSegmentation
predict(image: str | Path | Image | Tensor) → Tensor

Returns the predicted mask for the given image.

Args:
image:

The input image as a path, URL, PIL image, or tensor. Tensors must have shape (C, H, W).

Returns:

The predicted mask as a tensor of shape (H, W). The values represent the class IDs as defined in the classes argument of your dataset. These classes are also stored in the classes attribute of the model. The model will always predict the pixels as one of the known classes even when your dataset contains ignored classes defined by the ignore_classes argument.
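
Example: loading a trained segmentation model and predicting a mask for one image (the file paths are placeholders):

import lightly_train

model = lightly_train.load_model("out/my_segmentation_experiment/exported_models/exported_last.pt")
mask = model.predict("image.jpg")  # tensor of shape (H, W) with class IDs
print(model.classes)               # the classes stored on the model
print(mask.unique())               # class IDs present in the prediction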

class lightly_train._task_models.dinov2_ltdetr_object_detection.task_model.DINOv2LTDETRObjectDetection
predict(image: str | Path | Image | Tensor, threshold: float = 0.6) → dict[str, Tensor]

Returns predictions for the given image.

Args:
image:

The input image as a path, URL, PIL image, or tensor. Tensors must have shape (C, H, W).

threshold:

Minimum confidence score a prediction must reach to be included in the output.

class lightly_train._task_models.dinov3_ltdetr_object_detection.task_model.DINOv3LTDETRObjectDetection
predict(image: str | Path | Image | Tensor, threshold: float = 0.6) → dict[str, Tensor]

Returns predictions for the given image.

Args:
image:

The input image as a path, URL, PIL image, or tensor. Tensors must have shape (C, H, W).

threshold:

Minimum confidence score a prediction must reach to be included in the output.
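
Example: loading a trained detection model and running prediction on one image. The file paths are placeholders, and since the keys of the returned dict are not documented here, the sketch simply inspects them:

import lightly_train

model = lightly_train.load_model("out/my_detection_experiment/exported_models/exported_last.pt")
predictions = model.predict("image.jpg", threshold=0.6)
for name, tensor in predictions.items():
    print(name, tuple(tensor.shape))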