lightly.cli

The lightly.cli module provides a console interface for training self-supervised models, embedding, and filtering datasets.

.lightly_cli

Lightly Magic: Train, embed, and upload in one command.

This module contains the entrypoint for the lightly-magic command-line interface.

lightly.cli.lightly_cli.lightly_cli(cfg)

Train a self-supervised model and use it to embed your dataset.

Args:
cfg:

The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:
input_dir:

Path to the input directory where images are stored.

token:

User access token to the Lightly platform. If dataset_id and token are specified, the images and embeddings are uploaded to the platform. (Required for upload)

dataset_id:

Identifier of the dataset on the Lightly platform. If dataset_id and token are specified, the images and embeddings are uploaded to the platform. (Required for upload)

custom_metadata:

Path to a .json file containing custom metadata. The file must follow the COCO annotations format (the annotations themselves can be empty) and contain an additional field metadata storing a list of metadata entries. The metadata entries are matched with the images via image_id.

Examples:
>>> # train model and embed images with default settings
>>> lightly-magic input_dir=data/
>>>
>>> # train model for 10 epochs and embed images
>>> lightly-magic input_dir=data/ trainer.max_epochs=10
>>>
>>> # train model, embed images, and upload to the Lightly platform
>>> lightly-magic input_dir=data/ token='123' dataset_id='XYZ'
>>>
>>> # upload images, embeddings, and custom metadata
>>> lightly-magic input_dir=data/ token='123' dataset_id='XYZ' custom_metadata=custom_metadata.json
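
The custom_metadata file described above can be generated with a few lines of Python. The following is a minimal sketch, not the official schema: the "images" entries with "id" and "file_name" follow the usual COCO layout, the "weather" field is a hypothetical metadata attribute, and the helper name build_custom_metadata is made up for illustration.

```python
import json


def build_custom_metadata(filenames, entries):
    """Build a COCO-style dict with an additional 'metadata' list.

    Assumption: image ids are simply the positions in `filenames`, and
    each metadata entry refers to its image via 'image_id'.
    """
    images = [{"id": i, "file_name": name} for i, name in enumerate(filenames)]
    metadata = [{"image_id": i, **entry} for i, entry in enumerate(entries)]
    # The annotations list may stay empty according to the description above.
    return {"images": images, "annotations": [], "metadata": metadata}


custom_metadata = build_custom_metadata(
    ["img_0.jpg", "img_1.jpg"],
    [{"weather": "sunny"}, {"weather": "rainy"}],  # hypothetical fields
)

# Serialize to JSON; save this text as custom_metadata.json to use it
# with the custom_metadata argument.
print(json.dumps(custom_metadata, indent=2))
```

Each metadata entry carries the image_id of the image it belongs to, which is how the entries are matched with the images.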

.train_cli

Lightly Train: Train a self-supervised model from the command-line.

This module contains the entrypoint for the lightly-train command-line interface.

lightly.cli.train_cli.train_cli(cfg)

Train a self-supervised model from the command-line.

Args:
cfg:

The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:
input_dir:

Path to the input directory where images are stored.

Examples:
>>> # train model with default settings
>>> lightly-train input_dir=data/
>>>
>>> # train model with batches of size 128
>>> lightly-train input_dir=data/ loader.batch_size=128
>>>
>>> # train model for 10 epochs
>>> lightly-train input_dir=data/ trainer.max_epochs=10
>>>
>>> # print a full summary of the model
>>> lightly-train input_dir=data/ trainer.weights_summary=full

.embed_cli

Lightly Embed: Embed images with one command.

This module contains the entrypoint for the lightly-embed command-line interface.

lightly.cli.embed_cli.embed_cli(cfg)

Embed images from the command-line.

Args:
cfg:

The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:
input_dir:

Path to the input directory where images are stored.

checkpoint:

Path to the checkpoint of a pretrained model. If left empty, a pretrained model provided by lightly is used.

Examples:
>>> # embed images with default settings and a lightly model
>>> lightly-embed input_dir=data/
>>>
>>> # embed images with default settings and a custom checkpoint
>>> lightly-embed input_dir=data/ checkpoint=my_checkpoint.ckpt
>>>
>>> # embed images with custom settings
>>> lightly-embed input_dir=data/ model.num_ftrs=32

.upload_cli

Lightly Upload: Upload images to the Lightly platform.

This module contains the entrypoint for the lightly-upload command-line interface.

lightly.cli.upload_cli.upload_cli(cfg)

Upload images/embeddings from the command-line to the Lightly platform.

Args:
cfg:

The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:
input_dir:

Path to the input directory where images are stored.

embeddings:

Path to the csv file storing the embeddings generated by lightly.

token:

User access token to the Lightly platform. It needs to be specified to upload the images and embeddings to the platform.

dataset_id:

Identifier of the dataset on the Lightly platform. Either the dataset_id or the new_dataset_name needs to be specified.

new_dataset_name:

The name of the new dataset to create on the Lightly platform. Either the dataset_id or the new_dataset_name needs to be specified.

upload:

String to determine whether to upload the full images, thumbnails only, or metadata only.

Must be one of ['full', 'thumbnails', 'metadata'].

embedding_name:

Assign the embedding a name in order to identify it on the Lightly platform.

resize:

Desired size of the uploaded images. If negative, the default size is used. If size is a sequence like (h, w), the output size is matched to it. If size is an int, the smaller edge of the image is matched to this number. For example, if height > width, the image is rescaled to (size * height / width, size).

custom_metadata:

Path to a .json file containing custom metadata. The file must follow the COCO annotations format (the annotations themselves can be empty) and contain an additional field metadata storing a list of metadata entries. The metadata entries are matched with the images via image_id.

Examples:
>>> # create a new dataset on the Lightly platform and upload thumbnails to it
>>> lightly-upload input_dir=data/ token='123' new_dataset_name='new_dataset_name_xyz'
>>>
>>> # upload thumbnails to the Lightly platform to an existing dataset
>>> lightly-upload input_dir=data/ token='123' dataset_id='XYZ'
>>>
>>> # create a new dataset on the Lightly platform and upload full images to it
>>> lightly-upload input_dir=data/ token='123' new_dataset_name='new_dataset_name_xyz' upload='full'
>>>
>>> # upload metadata to the Lightly platform
>>> lightly-upload input_dir=data/ token='123' dataset_id='XYZ' upload='metadata'
>>>
>>> # upload embeddings to the Lightly platform (must have uploaded images beforehand)
>>> lightly-upload embeddings=embeddings.csv token='123' dataset_id='XYZ'
>>>
>>> # upload both, images and embeddings in a single command
>>> lightly-upload input_dir=data/ embeddings=embeddings.csv upload='full' \
>>>     token='123' dataset_id='XYZ'
>>>
>>> # create a new dataset on the Lightly platform and upload both, images and embeddings
>>> lightly-upload input_dir=data/ embeddings=embeddings.csv upload='full' \
>>>     token='123' new_dataset_name='new_dataset_name_xyz'
>>>
>>> # upload a dataset with custom metadata
>>> lightly-upload input_dir=data/ token='123' dataset_id='XYZ' custom_metadata=custom_metadata.json
>>>
>>> # upload custom metadata to an existing dataset
>>> lightly-upload token='123' dataset_id='XYZ' custom_metadata=custom_metadata.json
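
The behaviour of the resize argument described above can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not the code used by lightly-upload: the helper name resized_shape is hypothetical, results are truncated to integers, and a negative size returns None because the default size is not stated in this section.

```python
from collections.abc import Sequence


def resized_shape(height, width, size):
    """Return the (height, width) an image is resized to, or None for the default."""
    if isinstance(size, Sequence):
        # A sequence like (h, w) is used as the output size directly.
        h, w = size
        return int(h), int(w)
    if size < 0:
        # Negative values select the default size (not specified here).
        return None
    # An int is matched to the smaller edge; the aspect ratio is preserved.
    if height > width:
        return int(size * height / width), size
    return size, int(size * width / height)


print(resized_shape(400, 200, 100))       # portrait image, smaller edge is the width
print(resized_shape(200, 400, 100))       # landscape image, smaller edge is the height
print(resized_shape(400, 200, [64, 48]))  # explicit (h, w)
```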

.download_cli

Lightly Download: Download images from the Lightly platform.

This module contains the entrypoint for the lightly-download command-line interface.

lightly.cli.download_cli.download_cli(cfg)

Download images from the Lightly platform.

Args:
cfg:

The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:
tag_name:

Download all images from the requested tag. Use initial-tag to get all images from the dataset.

token:

User access token to the Lightly platform.

dataset_id:

Identifier of the dataset on the Lightly platform.

input_dir:

If input_dir and output_dir are specified, lightly will copy all images belonging to the tag from the input_dir to the output_dir.

output_dir:

If input_dir and output_dir are specified, lightly will copy all images belonging to the tag from the input_dir to the output_dir.

Examples:
>>> # download list of all files in the dataset from the Lightly platform
>>> lightly-download token='123' dataset_id='XYZ'
>>> 
>>> # download list of all files in tag 'my-tag' from the Lightly platform
>>> lightly-download token='123' dataset_id='XYZ' tag_name='my-tag'
>>>
>>> # download all images in tag 'my-tag' from the Lightly platform
>>> lightly-download token='123' dataset_id='XYZ' tag_name='my-tag' output_dir='my_data/'
>>>
>>> # copy all files in 'my-tag' to a new directory
>>> lightly-download token='123' dataset_id='XYZ' tag_name='my-tag' input_dir='data/' output_dir='my_data/'
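
The copy step performed when both input_dir and output_dir are given can be pictured as follows. This is only a sketch of the idea, assuming the list of filenames belonging to the tag is already available. The function copy_tagged_files and the demo filenames are hypothetical and are not part of the lightly package.

```python
import shutil
import tempfile
from pathlib import Path


def copy_tagged_files(filenames, input_dir, output_dir):
    """Copy every listed file from input_dir to output_dir, keeping subfolders."""
    copied = []
    for name in filenames:
        src = Path(input_dir) / name
        dst = Path(output_dir) / name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied


# Small demo in a temporary directory with made-up filenames.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    for name in ["a.jpg", "b.jpg", "c.jpg"]:
        (data_dir / name).write_bytes(b"")

    # Pretend that only a.jpg and c.jpg belong to the tag 'my-tag'.
    copied = copy_tagged_files(["a.jpg", "c.jpg"], data_dir, Path(tmp) / "my_data")
    copied_names = sorted(p.name for p in copied)
    all_exist = all(p.exists() for p in copied)

print(copied_names)
```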

.version_cli

Lightly Version: Show the version of the installed package.

Example:
>>> # show the version of the installed package
>>> lightly-version

.crop_cli

Lightly Crop: Crop images into one sub-image per object from the command-line.

This module contains the entrypoint for the lightly-crop command-line interface.

lightly.cli.crop_cli.crop_cli(cfg)

Crops images into one sub-image for each object.

Args:
cfg:

The default configs are loaded from the config file. To overwrite them please see the section on the config file (.config.config.yaml).

Command-Line Args:
input_dir:

Path to the input directory where images are stored.

label_dir:

Path to the directory where the labels are stored. There must be one label file for each image. The label file must have the same name as the image file, but the extension .txt. For example, img_123.txt for img_123.jpg. The label file must be in YOLO format.

output_dir:

Path to the directory where the cropped images are stored. They are stored in one directory per input image.

crop_padding: Optional

The additional padding around the bounding box. This makes the crops include the context of the object. The padding is relative and added to the width and height.

label_names_file: Optional

A yaml file including the names of the classes. If it is given, the filenames of the cropped images include the class names instead of the class id. This file is usually included when having a dataset in YOLO format. Example contents of such a label_names_file.yaml: "names: ['class_name_a', 'class_name_b']"

Examples:
>>> # Crop images and set the crop to be 20% around the bounding box
>>> lightly-crop input_dir=data/images label_dir=data/labels output_dir=data/cropped_images crop_padding=0.2
>>>
>>> # Crop images and use the class names in the filename
>>> lightly-crop input_dir=data/images label_dir=data/labels output_dir=data/cropped_images label_names_file=data/data.yaml
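
To make the effect of crop_padding concrete, the sketch below turns one line of a YOLO label file into a padded pixel box. It relies on the common YOLO layout "class_id x_center y_center width height" with coordinates normalized to [0, 1]. How lightly-crop applies the padding internally is not specified here, so the symmetric scaling of width and height, the clipping to the image, and the function name yolo_to_padded_box are assumptions made for illustration.

```python
def yolo_to_padded_box(line, img_width, img_height, crop_padding=0.1):
    """Convert one YOLO label line to (class_id, left, top, right, bottom) in pixels."""
    class_id, x_c, y_c, w, h = line.split()
    x_c, y_c, w, h = (float(v) for v in (x_c, y_c, w, h))

    # Relative padding: enlarge width and height by the given fraction.
    w *= 1.0 + crop_padding
    h *= 1.0 + crop_padding

    # Convert to pixel coordinates and clip to the image borders.
    left = max(0, round((x_c - w / 2) * img_width))
    top = max(0, round((y_c - h / 2) * img_height))
    right = min(img_width, round((x_c + w / 2) * img_width))
    bottom = min(img_height, round((y_c + h / 2) * img_height))
    return int(class_id), left, top, right, bottom


# A centered object covering 40% of the width and 20% of the height
# of a 100x200 image, cropped with 20% padding.
box = yolo_to_padded_box("0 0.5 0.5 0.4 0.2", img_width=100, img_height=200, crop_padding=0.2)
print(box)
```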

.config.config.yaml

The default settings for all command line tools in the lightly Python package are stored in a YAML config file. The config file is distributed along with the Python package and can be adapted to fit custom requirements.

The arguments are grouped into namespaces. For example, everything related to the embedding model is grouped under the namespace “model”. See the config file listed below for an overview of the different namespaces.

Overwrites

The default settings can (and sometimes must) be overwritten. For example, when using any command-line tool, it is necessary to specify an input directory where images are stored. The default setting of “input_dir” is an empty string, so it must be overwritten:

# train the default model on my data
lightly-train input_dir='path/to/my/data'

An argument which is grouped under a certain namespace can be accessed by specifying the namespace and the argument, separated by a dot. For example the argument “name” in the namespace “model” can be accessed like so:

# train a ResNet-34 on my data
lightly-train input_dir='path/to/my/data' model.name='resnet-34'

Additional Arguments

Some of the grouped arguments are passed directly to the constructor of the respective class. For example, all arguments under the namespace “optimizer” are passed directly to the PyTorch constructor of the optimizer. If you take a look at the default settings below, you can see that the momentum of the optimizer is not specified in the config file. In order to train a self-supervised model with momentum, an additional argument needs to be passed. This can be done by adding a + right before the argument:

# train a ResNet-34 with momentum on my data
lightly-train input_dir='path/to/my/data' model.name='resnet-34' +optimizer.momentum=0.9
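
The way dotted overwrites and + additions map onto the nested config can be sketched in plain Python. This is a simplified illustration and not Hydra's actual parser: the functions parse_value and apply_overrides are hypothetical, and the defaults dict is a small excerpt of the settings. As in Hydra, a plain key must already exist in the config, while a key prefixed with + is added.

```python
import ast
import copy


def parse_value(text):
    """Interpret numbers and booleans; fall back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(cfg, overrides):
    """Return a copy of cfg with 'namespace.key=value' overrides applied."""
    cfg = copy.deepcopy(cfg)
    for item in overrides:
        key, _, raw = item.partition("=")
        is_new = key.startswith("+")
        *namespaces, name = key.lstrip("+").split(".")
        node = cfg
        for ns in namespaces:
            # Walk into the namespace; only '+' keys may create new ones.
            node = node.setdefault(ns, {}) if is_new else node[ns]
        if not is_new and name not in node:
            raise KeyError(f"'{key}' is not in the config, use '+{key}' to add it")
        node[name] = parse_value(raw)
    return cfg


defaults = {
    "input_dir": "",
    "model": {"name": "resnet-18"},
    "optimizer": {"lr": 1.0},
}
cfg = apply_overrides(
    defaults,
    ["input_dir=path/to/my/data", "model.name=resnet-34", "+optimizer.momentum=0.9"],
)
print(cfg)
```

The defaults stay untouched because the overrides are applied to a deep copy.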

Default Settings

### i/o
# The following arguments specify input and output locations
# of images, embeddings, and checkpoints.
input_dir: ''                 # Path to input directory which holds images.
output_dir: ''                # Path to directory which should store downloads.
embeddings: ''                # Path to csv file which holds embeddings.
checkpoint: ''                # Path to a model checkpoint. If left empty, a pre-trained model
                              # will be used.
label_dir: ''                 # Path to the input directory which holds the labels.
label_names_file: ''          # Path to a yaml file having the label names under the value 'names'
custom_metadata: ''           # Path to a json file in COCO format containing additional metadata

### Lightly platform
# The following arguments are required for requests to the
# Lightly platform.
token: ''                     # User access token to the Lightly platform.
dataset_id: ''                # Identifier of the dataset on the Lightly platform.
new_dataset_name: ''          # Name of the new dataset to be created on the Lightly platform
upload: 'full'                # Whether to upload full images, thumbnails only, or metadata only.
                              # Must be one of ['full', 'thumbnails', 'metadata']
resize: -1                    # Allow resizing of the images before uploading, usage =-1, =x, =[x,y]
embedding_name: 'default'     # Name of the embedding to be used on the Lightly platform.
emb_upload_bsz: 32            # Number of embeddings which are uploaded in a single batch.
tag_name: 'initial-tag'       # Name of the requested tag on the Lightly platform.
exclude_parent_tag: False     # If true, only the samples in the defined tag, but without the parent tag, are taken.

### training and embeddings
pre_trained: True             # Whether to use a pre-trained model or not
crop_padding: 0.1             # The padding to use when cropping

# model namespace: Passed to lightly.models.ResNetGenerator.
model:
  name: 'resnet-18'           # Name of the model, currently supports popular variants:
                              # resnet-18, resnet-34, resnet-50, resnet-101, resnet-152.
  out_dim: 128                # Dimensionality of output on which self-supervised loss is calculated.
  num_ftrs: 32                # Dimensionality of feature vectors (embedding size).
  width: 1                    # Width of the resnet.

# criterion namespace: Passed to lightly.loss.NTXentLoss.
criterion:            
  temperature: 0.5            # Number by which logits are divided.
  memory_bank_size: 0         # Size of the memory bank to use (e.g. for MoCo). 0 means no memory bank.
                              # ^ slight abuse of notation, MoCo paper calls it momentum encoder

# optimizer namespace: Passed to torch.optim.SGD.
optimizer:
  lr: 1.                      # Learning rate of the optimizer.
  weight_decay: 0.00001       # L2 penalty.

# collate namespace: Passed to lightly.data.ImageCollateFunction.
collate:
  input_size: 64              # Size of the input images in pixels.
  cj_prob: 0.8                # Probability that color jitter is applied.
  cj_bright: 0.7              # Color_jitter intensity for brightness,
  cj_contrast: 0.7            # contrast,
  cj_sat: 0.7                 # saturation,
  cj_hue: 0.2                 # and hue.
  min_scale: 0.15             # Minimum size of random crop relative to input_size.
  random_gray_scale: 0.2      # Probability of converting image to gray scale.
  gaussian_blur: 0.5          # Probability of Gaussian blur.
  kernel_size: 0.1            # Kernel size of gaussian blur relative to input_size.
  vf_prob: 0.0                # Probability that vertical flip is applied.
  hf_prob: 0.5                # Probability that horizontal flip is applied.
  rr_prob: 0.0                # Probability that random (+-90 degree) rotation is applied.

# loader namespace: Passed to torch.utils.data.DataLoader.
loader:
  batch_size: 16              # Batch size for training / inference.
  shuffle: True               # Whether to reshuffle data each epoch.
  num_workers: -1             # Number of workers pre-fetching batches (-1 == number of available cores).
  drop_last: True             # Whether to drop the last batch during training.

# trainer namespace: Passed to pytorch_lightning.Trainer.
trainer:
  gpus: 1                     # Number of gpus to use for training.
  max_epochs: 100             # Number of epochs to train for.
  precision: 32               # If set to 16, will use half-precision.
  weights_summary: 'top'      # How to print the model architecture, one of {None, top, full},
                              # see https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#weights-summary

# checkpoint_callback namespace: Modify the checkpoint callback
checkpoint_callback:
  save_last: True             # Whether to save the checkpoint from the last epoch.
  save_top_k: 1               # Save the top k checkpoints.
  dirpath:                    # Where to store the checkpoints (empty field resolves to None).
                              # If not set, checkpoints are stored in the hydra output dir.

# seed
seed: 1

### hydra
# The arguments below are built-ins from the hydra-core Python package.
hydra:
  run:
    dir: lightly_outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  help:
    header: |
      == Description ==
      The lightly Python package is a command-line tool for self-supervised learning.

    footer: |
      == Examples ==

      Use a pre-trained resnet-18 to embed your images
      > lightly-embed input_dir='path/to/image/folder' collate.input_size=224

      Load a model from a custom checkpoint to embed your images
      > lightly-embed input_dir='path/to/image/folder' collate.input_size=224 checkpoint='path/to/checkpoint.ckpt'

      Train a self-supervised model on your image dataset from scratch
      > lightly-train input_dir='path/to/image/folder' loader.batch_size=128 collate.input_size=224 pre_trained=False

      Train a self-supervised model starting from the pre-trained checkpoint
      > lightly-train input_dir='path/to/image/folder' loader.batch_size=128 collate.input_size=224

      Train a self-supervised model starting from a custom checkpoint
      > lightly-train input_dir='path/to/image/folder' loader.batch_size=128 collate.input_size=224 checkpoint='path/to/checkpoint.ckpt'

      Train using half-precision
      > lightly-train input_dir='path/to/image/folder' trainer.precision=16

      Upload thumbnails to the Lightly web solution
      > lightly-upload input_dir='path/to/image/folder' dataset_id='your_dataset_id' token='your_access_token'

      Upload only metadata of the images to the Lightly web solution
      > lightly-upload input_dir='path/to/image/folder' dataset_id='your_dataset_id' token='your_access_token' upload='metadata'

      Upload full images to the Lightly web solution
      > lightly-upload input_dir='path/to/image/folder' dataset_id='your_dataset_id' token='your_access_token' upload='full'
    
      Upload images and embeddings to the Lightly web solution
      > lightly-upload input_dir='path/to/image/folder' embeddings='path/to/embeddings.csv' dataset_id='your_dataset_id' token='your_access_token'

      Upload embeddings to the Lightly web solution
      > lightly-upload embeddings='path/to/embeddings.csv' dataset_id='your_dataset_id' token='your_access_token'

      Download a list of files in a given tag from the Lightly web solution
      > lightly-download tag_name='my-tag' dataset_id='your_dataset_id' token='your_access_token'

      Download a list of files in a given tag without filenames from the parent tag from the Lightly web solution
      > lightly-download tag_name='my-tag' dataset_id='your_dataset_id' token='your_access_token' exclude_parent_tag=True

      Copy all files in a given tag from a source directory to a target directory
      > lightly-download tag_name='my-tag' dataset_id='your_dataset_id' token='your_access_token' input_dir='data/' output_dir='new_data/'

      == Additional Information ==

      Use self-supervised methods to understand and filter raw image data:

      Website: https://www.lightly.ai
      Documentation: https://docs.lightly.ai