lightly.cli

The lightly.cli module provides a console interface for training self-supervised models and for embedding and filtering datasets.

.lightly_cli

Lightly Magic: Train and embed in one command.

This module contains the entrypoint for the lightly-magic command-line interface.

lightly.cli.lightly_cli.lightly_cli(cfg)

Train a self-supervised model and use it to embed your dataset.

Parameters: cfg – The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:

input_dir:: Path to the input directory where images are stored.

Examples

>>> # train model and embed images with default settings
>>> lightly-magic input_dir=data/
>>>
>>> # train model for 10 epochs and embed images
>>> lightly-magic input_dir=data/ trainer.max_epochs=10

.train_cli

Lightly SSL Train: Train a self-supervised model from the command-line.

This module contains the entrypoint for the lightly-ssl-train command-line interface.

lightly.cli.train_cli.train_cli(cfg)

Train a self-supervised model from the command-line.

Parameters: cfg – The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:

input_dir:: Path to the input directory where images are stored.

Examples

>>> # train model with default settings
>>> lightly-ssl-train input_dir=data/
>>>
>>> # train model with batches of size 128
>>> lightly-ssl-train input_dir=data/ loader.batch_size=128
>>>
>>> # train model for 10 epochs
>>> lightly-ssl-train input_dir=data/ trainer.max_epochs=10
>>>
>>> # print a full summary of the model
>>> lightly-ssl-train input_dir=data/ trainer.weights_summary=full

.embed_cli

Lightly Embed: Embed images with one command.

This module contains the entrypoint for the lightly-embed command-line interface.

lightly.cli.embed_cli.embed_cli(cfg) → str

Embed images from the command-line.

Parameters: cfg – The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:

input_dir:: Path to the input directory where images are stored.
checkpoint:: Path to the checkpoint of a pretrained model. If left empty, a pretrained model by lightly is used.

Returns: The path to the created embeddings file.

Examples

>>> # embed images with default settings and a lightly model
>>> lightly-embed input_dir=data/
>>>
>>> # embed images with default settings and a custom checkpoint
>>> lightly-embed input_dir=data/ checkpoint=my_checkpoint.ckpt
>>>
>>> # embed images with custom settings
>>> lightly-embed input_dir=data/ model.num_ftrs=32
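
Once lightly-embed has written the embeddings file, the returned path can be read back in Python. The sketch below assumes the CSV has a filenames column followed by embedding_0, embedding_1, ... columns. This layout and the helper name parse_embeddings are assumptions made for illustration, so check the header of your own file first.

```python
import csv
from typing import Dict, Iterable, List


def parse_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each filename to its embedding vector.

    Assumption (not confirmed by this page): the CSV header contains a
    'filenames' column and one 'embedding_<i>' column per dimension.
    Any other columns are ignored.
    """
    embeddings: Dict[str, List[float]] = {}
    # skipinitialspace tolerates headers written as "filenames, embedding_0".
    for row in csv.DictReader(lines, skipinitialspace=True):
        # Sort numerically so that embedding_10 comes after embedding_2.
        columns = sorted(
            (name for name in row if name.startswith("embedding_")),
            key=lambda name: int(name.rsplit("_", 1)[1]),
        )
        embeddings[row["filenames"]] = [float(row[name]) for name in columns]
    return embeddings


# Usage with a file on disk (the path is a placeholder):
# with open("path/to/embeddings.csv", newline="") as f:
#     vectors = parse_embeddings(f)
```

The function accepts any iterable of lines, so it works on an open file handle as well as on a list of strings.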

.download_cli

Lightly Download: Download images from the Lightly platform.

This module contains the entrypoint for the lightly-download command-line interface.

lightly.cli.download_cli.download_cli(cfg)

Download images from the Lightly platform.

Parameters: cfg – The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:

tag_name:: Download all images from the requested tag. Use initial-tag to get all images from the dataset.
token:: User access token to the Lightly platform. Required, together with dataset_id, to access the dataset on the platform.
dataset_id:: Identifier of the dataset on the Lightly platform. Required, together with token, to access the dataset on the platform.
input_dir:: If input_dir and output_dir are specified, lightly will copy all images belonging to the tag from the input_dir to the output_dir.
output_dir:: If input_dir and output_dir are specified, lightly will copy all images belonging to the tag from the input_dir to the output_dir.

Examples

>>> # download list of all files in the dataset from the Lightly platform
>>> lightly-download token='123' dataset_id='XYZ'
>>>
>>> # download list of all files in tag 'my-tag' from the Lightly platform
>>> lightly-download token='123' dataset_id='XYZ' tag_name='my-tag'
>>>
>>> # download all images in tag 'my-tag' from the Lightly platform
>>> lightly-download token='123' dataset_id='XYZ' tag_name='my-tag' output_dir='my_data/'
>>>
>>> # copy all files in 'my-tag' to a new directory
>>> lightly-download token='123' dataset_id='XYZ' tag_name='my-tag' input_dir='data/' output_dir='my_data/'

.version_cli

Lightly Version: Show the version of the installed package.

Example

>>> # show the version of the installed package
>>> lightly-version

.crop_cli

Lightly Crop: Crop images into one sub-image for each object.

This module contains the entrypoint for the lightly-crop command-line interface.

lightly.cli.crop_cli.crop_cli(cfg)

Crops images into one sub-image for each object.

Parameters: cfg – The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:

input_dir:: Path to the input directory where images are stored.
label_dir:: Path to the directory where the labels are stored. There must be one label file for each image. The label file must have the same name as the image file but with the extension .txt, for example img_123.txt for img_123.jpg. The label file must be in YOLO format.
output_dir:: Path to the directory where the cropped images are stored. They are stored in one directory per input image.
crop_padding:: Optional. The additional padding around the bounding box, so that the crops include the context of the object. The padding is relative and is added to the width and height.
label_names_file:: Optional. A YAML file containing the names of the classes. If it is given, the filenames of the cropped images include the class names instead of the class IDs. This file is usually included with datasets in YOLO format. Example contents of such a label_names_file.yaml: "names: ['class_name_a', 'class_name_b']"

Examples

>>> # Crop images and add 20% padding around each bounding box
>>> lightly-crop input_dir=data/images label_dir=data/labels output_dir=data/cropped_images crop_padding=0.2
>>>
>>> # Crop images and use the class names in the filename
>>> lightly-crop input_dir=data/images label_dir=data/labels output_dir=data/cropped_images label_names_file=data/data.yaml
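
To make the interplay of YOLO labels and crop_padding concrete, here is a minimal sketch of how one label line could be turned into a padded pixel crop box. It is not the implementation used by lightly-crop. The helper name yolo_to_crop_box is hypothetical, and the sketch assumes that the relative padding enlarges the box width and height by a factor of (1 + crop_padding) and that boxes are clamped to the image.

```python
from typing import Tuple


def yolo_to_crop_box(
    label_line: str,
    image_width: int,
    image_height: int,
    crop_padding: float = 0.1,
) -> Tuple[int, Tuple[int, int, int, int]]:
    """Convert one YOLO label line into (class_id, (left, top, right, bottom)).

    A YOLO line reads "class_id x_center y_center width height", where the
    four box values are normalized to [0, 1] relative to the image size.
    """
    class_id, *box = label_line.split()
    x_center, y_center, width, height = (float(value) for value in box)

    # Assumption: relative padding scales width and height by (1 + padding).
    width *= 1.0 + crop_padding
    height *= 1.0 + crop_padding

    # Convert to pixel coordinates and clamp the box to the image borders.
    left = max(0, round((x_center - width / 2) * image_width))
    top = max(0, round((y_center - height / 2) * image_height))
    right = min(image_width, round((x_center + width / 2) * image_width))
    bottom = min(image_height, round((y_center + height / 2) * image_height))
    return int(class_id), (left, top, right, bottom)


# A centered box covering half of a 200x100 image, padded by 20%.
print(yolo_to_crop_box("0 0.5 0.5 0.5 0.5", 200, 100, crop_padding=0.2))
```

The returned box uses the (left, top, right, bottom) convention, which is the order expected by, for example, PIL's Image.crop.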

.config.config.yaml

The default settings for all command line tools in the LightlySSL Python package are stored in a YAML config file. The config file is distributed along with the Python package and can be adapted to fit custom requirements.

The arguments are grouped into namespaces. For example, everything related to the embedding model is grouped under the namespace “model”. See the config file listed below for an overview of the different namespaces.

Overwrites

The default settings can (and sometimes must) be overwritten. For example, when using any command-line tool, it is necessary to specify an input directory where images are stored. The default setting of “input_dir” is an empty string, so it must be overwritten:

# train the default model on my data
lightly-ssl-train input_dir='path/to/my/data'

An argument which is grouped under a certain namespace can be accessed by specifying the namespace and the argument, separated by a dot. For example the argument “name” in the namespace “model” can be accessed like so:

# train a ResNet-34 on my data
lightly-ssl-train input_dir='path/to/my/data' model.name='resnet-34'

Additional Arguments

Some of the grouped arguments are passed directly to the constructor of the respective class. For example, all arguments under the namespace “optimizer” are passed directly to the PyTorch constructor of the optimizer. If you take a look at the default settings below, you can see that the momentum of the optimizer is not specified in the config file. In order to train a self-supervised model with momentum, an additional argument needs to be passed. This can be done by adding a + right before the argument:

# train a ResNet-34 with momentum on my data
lightly-ssl-train input_dir='path/to/my/data' model.name='resnet-34' +optimizer.momentum=0.9
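
The difference between overwriting an existing argument and adding a new one with + can be illustrated with a small, self-contained sketch. This is not Hydra's parser, which handles far more syntax. The function apply_overrides is a hypothetical helper that only mimics the two rules described above on a plain nested dictionary: a dotted path selects a namespace, and a key that is absent from the defaults has to be introduced with +.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'namespace.key=value' overrides to a nested dict (illustration only)."""
    for item in overrides:
        path, raw_value = item.split("=", 1)
        is_addition = path.startswith("+")
        keys = path.lstrip("+").split(".")
        name = ".".join(keys)

        # Walk down to the namespace that holds the final key.
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {}) if is_addition else node[key]

        exists = keys[-1] in node
        if exists and is_addition:
            raise KeyError(f"'{name}' already exists; drop the '+' to overwrite it")
        if not exists and not is_addition:
            raise KeyError(f"'{name}' is not in the config; prefix it with '+' to add it")

        # Parse numbers, booleans, and quoted strings; keep anything else as text.
        try:
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            value = raw_value
        node[keys[-1]] = value
    return config


defaults = {"model": {"name": "resnet-18"}, "optimizer": {"lr": 1.0}}
apply_overrides(defaults, ["model.name=resnet-34", "+optimizer.momentum=0.9"])
print(defaults)
```

In this sketch, passing optimizer.momentum=0.9 without the + raises a KeyError, because momentum is not among the defaults of the optimizer namespace.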

Default Settings

### i/o
# The following arguments specify input and output locations
# of images, embeddings, and checkpoints.
input_dir: ''                 # Path to input directory which holds images.
output_dir: ''                # Path to directory which should store downloads.
embeddings: ''                # Path to csv file which holds embeddings.
checkpoint: ''                # Path to a model checkpoint. If left empty, a pre-trained model
                              # will be used.
label_dir: ''                 # Path to the input directory which holds the labels.
label_names_file: ''          # Path to a yaml file having the label names under the value 'names'
custom_metadata: ''           # Path to a json file in COCO format containing additional metadata

### Lightly platform
# The following arguments are required for requests to the
# Lightly platform.
token: ''                     # User access token to the Lightly platform.
dataset_id: ''                # Identifier of the dataset on the Lightly platform.
new_dataset_name: ''          # Name of the new dataset to be created on the Lightly platform
upload: 'full'                # Whether to upload full images, thumbnails only, or metadata only.
                              # Must be one of ['full', 'thumbnails', 'none']
append: False                 # Must be True if you want to append samples to an existing dataset.
resize: -1                    # Resize images before uploading: -1 (no resizing), x, or [x, y].
embedding_name: 'default'     # Name of the embedding to be used on the Lightly platform.
emb_upload_bsz: 32            # Number of embeddings which are uploaded in a single batch.
tag_name: 'initial-tag'       # Name of the requested tag on the Lightly platform.
exclude_parent_tag: False     # If true, only the samples in the defined tag, but without the parent tag, are taken.

### training and embeddings
pre_trained: True             # Whether to use a pre-trained model or not
crop_padding: 0.1             # The padding to use when cropping

# model namespace: Passed to lightly.models.ResNetGenerator.
model:
  name: 'resnet-18'           # Name of the model, currently supports popular variants:
                              # resnet-18, resnet-34, resnet-50, resnet-101, resnet-152.
  out_dim: 128                # Dimensionality of output on which self-supervised loss is calculated.
  num_ftrs: 32                # Dimensionality of feature vectors (embedding size).
  width: 1                    # Width of the resnet.

# criterion namespace: Passed to lightly.loss.NTXentLoss.
criterion:            
  temperature: 0.5            # Number by which logits are divided.
  memory_bank_size: 0         # Size of the memory bank to use (e.g. for MoCo). 0 means no memory bank.
                              # ^ slight abuse of notation, MoCo paper calls it momentum encoder

# optimizer namespace: Passed to torch.optim.SGD.
optimizer:
  lr: 1.                      # Learning rate of the optimizer.
  weight_decay: 0.00001       # L2 penalty.

# collate namespace: Passed to lightly.data.ImageCollateFunction.
collate:
  input_size: 64              # Size of the input images in pixels.
  cj_prob: 0.8                # Probability that color jitter is applied.
  cj_bright: 0.7              # Color_jitter intensity for brightness,
  cj_contrast: 0.7            # contrast,
  cj_sat: 0.7                 # saturation,
  cj_hue: 0.2                 # and hue.
  min_scale: 0.15             # Minimum size of random crop relative to input_size.
  random_gray_scale: 0.2      # Probability of converting image to gray scale.
  gaussian_blur: 0.5          # Probability of Gaussian blur.
  sigmas: [0.2, 2]            # Sigmas of Gaussian blur 
  kernel_size: null           # Will be deprecated in favor of `sigmas` argument. If set, the old behavior
                              # applies and `sigmas` is ignored. Used to calculate sigma of gaussian blur 
                              # with kernel_size * input_size.
  vf_prob: 0.0                # Probability that vertical flip is applied.
  hf_prob: 0.5                # Probability that horizontal flip is applied.
  rr_prob: 0.0                # Probability that random rotation is applied.
  rr_degrees: null            # Range of degrees to select from for random rotation.
                              # If rr_degrees is null, images are rotated by 90 degrees.
                              # If rr_degrees is a [min, max] list, images are rotated
                              # by a random angle in [min, max]. If rr_degrees is a
                              # single number, images are rotated by a random angle in
                              # [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.

# loader namespace: Passed to torch.utils.data.DataLoader.
loader:
  batch_size: 16              # Batch size for training / inference.
  shuffle: True               # Whether to reshuffle data each epoch.
  num_workers: -1             # Number of workers pre-fetching batches.
                              # -1 == number of available cores,
                              # if -1, minimum of 8, maximum of 32 workers for upload.
  drop_last: True             # Whether to drop the last batch during training.

# trainer namespace: Passed to pytorch_lightning.Trainer.
trainer:
  gpus: 1                     # Number of gpus to use for training.
  max_epochs: 100             # Number of epochs to train for.
  precision: 32               # If set to 16, will use half-precision.
  enable_model_summary: True  # Whether to enable model summarisation.
  weights_summary:            # [deprecated] Use enable_model_summary
                              # and summary_callback.max_depth.

# checkpoint_callback namespace: Modify the checkpoint callback
checkpoint_callback:
  save_last: True             # Whether to save the checkpoint from the last epoch.
  save_top_k: 1               # Save the top k checkpoints.
  dirpath:                    # Where to store the checkpoints (empty field resolves to None).
                              # If not set, checkpoints are stored in the hydra output dir.

# summary_callback namespace: Modify the summary callback
summary_callback:
  max_depth: 1                # The maximum depth of layer nesting that the summary will include.

# environment variable namespace for saving artifacts
environment_variable_names:
  lightly_last_checkpoint_path: LIGHTLY_LAST_CHECKPOINT_PATH
  lightly_last_embedding_path: LIGHTLY_LAST_EMBEDDING_PATH
  lightly_last_dataset_id: LIGHTLY_LAST_DATASET_ID

# seed
seed: 1

### hydra
# The arguments below are built-ins from the hydra-core Python package.
hydra:
  run:
    dir: lightly_outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  help:
    header: |
      == Description ==
      The lightly Python package is a command-line tool for self-supervised learning.

    footer: |
      == Examples ==

      Use a pre-trained resnet-18 to embed your images
      > lightly-embed input_dir='path/to/image/folder' collate.input_size=224

      Load a model from a custom checkpoint to embed your images
      > lightly-embed input_dir='path/to/image/folder' collate.input_size=224 checkpoint='path/to/checkpoint.ckpt'

      Train a self-supervised model on your image dataset from scratch
      > lightly-ssl-train input_dir='path/to/image/folder' loader.batch_size=128 collate.input_size=224 pre_trained=False

      Train a self-supervised model starting from the pre-trained checkpoint
      > lightly-ssl-train input_dir='path/to/image/folder' loader.batch_size=128 collate.input_size=224

      Train a self-supervised model starting from a custom checkpoint
      > lightly-ssl-train input_dir='path/to/image/folder' loader.batch_size=128 collate.input_size=224 checkpoint='path/to/checkpoint.ckpt'

      Train using half-precision
      > lightly-ssl-train input_dir='path/to/image/folder' trainer.precision=16
      
      Download a list of files in a given tag from the Lightly Platform
      > lightly-download tag_name='my-tag' dataset_id='your_dataset_id' token='your_access_token'

      Download a list of files in a given tag without filenames from the parent tag from the Lightly Platform
      > lightly-download tag_name='my-tag' dataset_id='your_dataset_id' token='your_access_token' exclude_parent_tag=True

      Copy all files in a given tag from a source directory to a target directory
      > lightly-download tag_name='my-tag' dataset_id='your_dataset_id' token='your_access_token' input_dir='data/' output_dir='new_data/'

      == Additional Information ==

      Use self-supervised methods to understand and filter raw image data:

      Website: https://www.lightly.ai
      Documentation: https://docs.lightly.ai/self-supervised-learning/getting_started/command_line_tool.html
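
The comments on rr_degrees in the collate namespace of the default settings describe three cases: null, a single number, and a [min, max] list. The following sketch spells out that case distinction in Python. The function sample_rotation_angle is a hypothetical helper written for this page, not part of the lightly API, and it only covers how the angle is chosen once a rotation is applied (whether a rotation is applied at all is governed by rr_prob).

```python
import random
from typing import Optional, Sequence, Union


def sample_rotation_angle(
    rr_degrees: Union[None, float, Sequence[float]],
    rng: Optional[random.Random] = None,
) -> float:
    """Pick a counter-clockwise rotation angle following the rr_degrees comments."""
    rng = rng or random.Random()
    if rr_degrees is None:
        # null in the config: rotate by exactly 90 degrees.
        return 90.0
    if isinstance(rr_degrees, (int, float)):
        # Single number: random angle in [-rr_degrees, +rr_degrees].
        return rng.uniform(-rr_degrees, rr_degrees)
    # [min, max] list: random angle in [min, max].
    low, high = rr_degrees
    return rng.uniform(low, high)


print(sample_rotation_angle(None))
print(sample_rotation_angle(30))
print(sample_rotation_angle([10, 20]))
```

Passing a seeded random.Random instance as rng makes the sampled angles reproducible.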