Configuring Image Augmentations

Pretraining relies heavily on image augmentations such as:

  • Random Cropping and Resizing: Crops random parts of images and resizes them to fixed resolutions.

  • Random Horizontal and Vertical Flipping: Mirrors images across horizontal or vertical axes.

  • Random Rotation: Rotates images by random angles.

  • Color Jittering: Randomly modifies brightness, contrast, saturation, and hue.

  • Random Grayscaling: Converts images to grayscale with a certain probability.

  • Gaussian Blurring: Applies a Gaussian blur filter with a random \(\sigma\), smoothing the image.

  • Random Solarization: Inverts pixel values above a random threshold.

  • Normalization: Scales pixel values using predefined mean and standard deviation.

While the default settings in LightlyTrain should work well for most use cases, for some downstream tasks and image domains it might be beneficial to override the defaults and adjust the applied augmentations. This can be done as follows:

For the Python API, use a dictionary to override any augmentation settings and pass it to the lightly_train.train function through the transform_args argument. Many augmentations can also be turned off completely by setting them to None, as this example demonstrates with the color_jitter augmentation.

import lightly_train

my_transform_args = {
    "random_resize": {
        "min_scale": 0.1
    },
    "image_size": (128, 128),
    "color_jitter": None,
}

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",            # Output directory
        data="my_data_dir",                 # Directory with images
        model="torchvision/resnet18",       # Model to train
        transform_args=my_transform_args,   # Overrides of default augmentation parameters
    )

There are two ways to configure the augmentations on the command line:

  1. Dotted Notation

  2. Pass all arguments as a single JSON structure

Important

Make sure that any values that you pass through the command line are JSON-compatible. This means:

  • Strings inside JSON structures must use double quotes (wrap the whole structure in single quotes).

  • Tuples do not exist in JSON; use bracketed notation instead (like a Python list).

  • The JSON equivalent of Python’s None is null, which you must use to selectively turn off an augmentation.
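As a quick check of these rules, the following sketch uses Python's standard json module to serialize the dictionary from the Python API example. It only illustrates the mapping between Python and JSON values; it does not call LightlyTrain, and the variable names are chosen for illustration.

```python
import json

# Same overrides as in the Python API example above.
transform_args = {
    "random_resize": {"min_scale": 0.1},
    "image_size": (128, 128),  # tuple in Python
    "color_jitter": None,      # None in Python
}

# json.dumps emits double-quoted strings, turns tuples into
# bracketed lists, and turns None into null.
json_string = json.dumps(transform_args)
print(json_string)

# Parsing the string back yields a list instead of a tuple.
round_trip = json.loads(json_string)
print(round_trip["image_size"])
```

The printed string can be wrapped in single quotes and passed as the value of transform_args on the command line.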

An example of the dotted notation would be:

lightly-train train \
    out="out/my_experiment" \
    overwrite=True \
    data="my_data_dir" \
    model="torchvision/resnet18" \
    transform_args.image_size="[128,128]" \
    transform_args.random_resize.min_scale=0.1 \
    transform_args.color_jitter=null

And an example of using a single JSON structure would look as follows:

lightly-train train \
    out="out/my_experiment" \
    data="my_data_dir" \
    model="torchvision/resnet18" \
    transform_args='{"image_size": [128, 128], "random_resize": {"min_scale": 0.1}, "color_jitter": null}'
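The two command-line styles carry the same information. The sketch below flattens a nested Python dictionary into dotted key=value arguments. The helper to_dotted_args is hypothetical and not part of LightlyTrain; it only shows how nested keys map to dotted names and how leaf values are written as JSON.

```python
import json


def to_dotted_args(transform_args, prefix="transform_args"):
    """Flatten a nested dict into dotted key=value strings (hypothetical helper)."""
    args = []
    for key, value in transform_args.items():
        dotted_key = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Recurse into nested structures such as random_resize.
            args.extend(to_dotted_args(value, prefix=dotted_key))
        else:
            # Leaf values are written as compact JSON: tuples become
            # bracketed lists and None becomes null.
            args.append(f"{dotted_key}={json.dumps(value, separators=(',', ':'))}")
    return args


my_transform_args = {
    "random_resize": {"min_scale": 0.1},
    "image_size": (128, 128),
    "color_jitter": None,
}

dotted_args = to_dotted_args(my_transform_args)
for arg in dotted_args:
    print(arg)
```

Values that contain brackets may still need quoting in your shell, as in the dotted notation example above.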

The next sections will cover which arguments are available across all methods, and also the arguments unique to specific methods.

See also

Interested in the default augmentation settings for each method? Check the method pages for Distillation, DINO, and SimCLR.

Arguments available for all methods

The following arguments are available for all methods: Distillation (recommended 🚀), DINO, and SimCLR.

Random Cropping and Resizing

Can be disabled by setting to None.

"random_resize": {
    "min_scale": float,
    "max_scale": float,
}

Image Size

Cannot be disabled, required for all transforms.

"image_size": tuple[int, int]  # height, width

Random Horizontal and Vertical Flipping

Can be disabled by setting to None.

"random_flip": {
    "horizontal_prob": float, # probability of applying horizontal flip
    "vertical_prob": float,   # probability of applying vertical flip
}

Random Rotation

Can be disabled by setting to None.

"random_rotation": {
    "prob": float,   # probability of applying rotation
    "degrees": int,  # maximum rotation angle in degrees
}

Color Jittering

Can be disabled by setting to None.

"color_jitter": {
    "prob": float,       # probability of applying color jitter
    "strength": float,   # multiplier for all parameters below
    "brightness": float, # how much to jitter brightness (non-negative)
    "contrast": float,   # how much to jitter contrast (non-negative)
    "saturation": float, # how much to jitter saturation (non-negative)
    "hue": float,        # how much to jitter hue (non-negative)
}

Random Grayscaling

Can be disabled by setting to None.

"random_gray_scale": float  # probability of converting to grayscale

Gaussian Blurring

Can be disabled by setting to None.

"gaussian_blur": {
    "prob": float,                        # probability of applying blur
    "sigmas": tuple[float, float],        # range of sigma values
    "blur_limit": int | tuple[int, int],  # range of kernel size, either [0, high] or [low, high]
}

Random Solarization

Can be disabled by setting to None.

"solarize": {
    "prob": float,      # probability of applying solarization
    "threshold": float  # threshold value in range [0, 1]
}
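To make the threshold parameter concrete, here is a minimal sketch of solarization on pixel values scaled to [0, 1]. It is not LightlyTrain's implementation; the function name solarize and the choice to invert values at or above the threshold are assumptions made for illustration.

```python
def solarize(pixels, threshold):
    """Invert every pixel value at or above the threshold (illustrative sketch).

    Assumes pixel values are floats in the range [0, 1].
    """
    return [1.0 - p if p >= threshold else p for p in pixels]


row = [0.0, 0.25, 0.5, 0.75, 1.0]
solarized = solarize(row, threshold=0.5)
print(solarized)
```

A lower threshold inverts more of the image, while a threshold of 1 leaves almost every pixel unchanged.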

Normalization

Cannot be disabled, required for all transforms.

"normalize": {
    "mean": tuple[float, float, float],  # means of the three channels
    "std": tuple[float, float, float]    # standard deviations of the three channels
}
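The arguments in this section can be combined into a single transform_args dictionary. The sketch below shows one possible combination. All values are illustrative and are not LightlyTrain's defaults (the normalization values are the commonly used ImageNet statistics). The helper find_invalid_probs is hypothetical and only checks that probabilities lie in [0, 1] before the dictionary is passed on.

```python
# Illustrative values only; these are NOT LightlyTrain's defaults.
example_transform_args = {
    "image_size": (224, 224),  # height, width
    "random_resize": {"min_scale": 0.2, "max_scale": 1.0},
    "random_flip": {"horizontal_prob": 0.5, "vertical_prob": 0.0},
    "random_rotation": None,   # disabled
    "color_jitter": {"prob": 0.8, "strength": 0.5},
    "random_gray_scale": 0.2,
    "gaussian_blur": {"prob": 0.5, "sigmas": (0.1, 2.0)},
    "solarize": None,          # disabled
    "normalize": {
        "mean": (0.485, 0.456, 0.406),
        "std": (0.229, 0.224, 0.225),
    },
}


def find_invalid_probs(args, path=""):
    """Return dotted keys of probability values outside [0, 1] (hypothetical helper)."""
    invalid = []
    for key, value in args.items():
        full_key = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            invalid.extend(find_invalid_probs(value, full_key))
        elif key.endswith("prob") or key == "random_gray_scale":
            if value is not None and not 0.0 <= value <= 1.0:
                invalid.append(full_key)
    return invalid


print(find_invalid_probs(example_transform_args))
```

The resulting dictionary can then be passed to lightly_train.train through the transform_args argument, as in the Python API example at the top of this page.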

Arguments unique to methods

Distillation and SimCLR have no transform configuration options beyond the ones listed above, which are available for all methods.

DINO

DINO uses a multi-crop strategy with two full-resolution “global” views (which have slightly different augmentation parameters) and optional additional lower-resolution “local” views (default: 6 views).

In addition to the arguments available for all methods, the following DINO-specific arguments are available. Note that local_view itself can be disabled by setting it to None. Some augmentations within these structures can also be disabled by setting them to None:

"global_view_1": {                     # modifications for second global view (cannot be disabled)
    "gaussian_blur": {                 # can be disabled by setting to None
        "prob": float,
        "sigmas": tuple[float, float],
        "blur_limit": int | tuple[int, int]
    },
    "solarize": {                      # can be disabled by setting to None
        "prob": float,
        "threshold": float
    }
},
"local_view": {                        # configuration for local views (can be disabled by setting to None)
    "num_views": int,                  # number of local views to generate
    "view_size": tuple[int, int],      # size of local views
    "random_resize": {                 # can be disabled by setting to None
        "min_scale": float,
        "max_scale": float
    },
    "gaussian_blur": {                 # can be disabled by setting to None
        "prob": float,
        "sigmas": tuple[float, float],
        "blur_limit": int | tuple[int, int]
    }
}
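As a sketch of how these DINO-specific options might be used, the dictionaries below override a few of them. The numeric values are illustrative and are not LightlyTrain's defaults; only the keys come from the structure above.

```python
import json

# Illustrative overrides; the values are NOT LightlyTrain's defaults.
dino_transform_args = {
    "global_view_1": {
        "solarize": None,  # turn off solarization for the second global view
    },
    "local_view": {
        "num_views": 4,          # fewer local views than the default of 6
        "view_size": (96, 96),
        "random_resize": {"min_scale": 0.05, "max_scale": 0.2},
    },
}

# Alternatively, disable the local views entirely.
no_local_views = {"local_view": None}

# The JSON forms, for use on the command line.
dino_json = json.dumps(dino_transform_args)
no_local_json = json.dumps(no_local_views)
print(dino_json)
print(no_local_json)
```

Either dictionary can be passed through transform_args when training with DINO, and the printed JSON strings can be used on the command line after wrapping them in single quotes.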