Configuring Image Augmentations¶
Pretraining relies strongly on image augmentations such as:
Random Cropping and Resizing: Crops random parts of images and resizes them to fixed resolutions.
Random Horizontal and Vertical Flipping: Mirrors images across horizontal or vertical axes.
Random Rotation: Rotates images by random angles.
Color Jittering: Randomly modifies brightness, contrast, saturation, and hue.
Random Grayscaling: Converts images to grayscale with certain probability.
Gaussian Blurring: Applies Gaussian blur filter of random \(\sigma\), smoothing the image.
Random Solarization: Inverts pixel values above a random threshold.
Normalization: Scales pixel values using predefined mean and standard deviation.
While the default settings in LightlyTrain should work well for most use cases, for some downstream tasks and image domains it might be beneficial to override the defaults and adjust the applied augmentations. This can be done as follows:
For the Python API, use a dictionary structure to override any augmentations settings and pass it to the lightly_train.train
function through the transform_args
argument. Many augmentations can also be selectively turned off completely by setting them to None
, as is demonstrated in this example with the color_jitter
augmentation.
import lightly_train
my_transform_args = {
"random_resize": {
"min_scale": 0.1
},
"image_size": (128, 128),
"color_jitter": None,
}
if __name__ == "__main__":
lightly_train.train(
out="out/my_experiment", # Output directory
data="my_data_dir", # Directory with images
model="torchvision/resnet18", # Model to train
transform_args=my_transform_args, # Overrides of default augmentation parameters
)
There are two options on how you can configure the augmentations on the command line:
Dotted Notation
Pass all arguments as a single JSON structure
Important
Make sure that any values that you pass through the command line are JSON-compatible. This means:
Strings inside JSON structures must have double quotes (wrap the whole structure by single quotes).
Tuples do not exist, use bracketed notation (like a Python list).
JSON’s correspondence to Python’s
None
isnull
, which you will have to use in order to selectively turn off an augmentation.
An example of how you can use the bracketed notation, would be:
lightly-train train \
out="out/my_experiment" \
overwrite=True \
data="my_data_dir" \
model="torchvision/resnet18" \
transform_args.image_size="[128,128]" \
transform_args.random_resize.min_scale=0.1 \
transform_args.color_jitter=null
And an example of using a single JSON structure would look as follows:
lightly-train train \
out="out/my_experiment" \
data="my_data_dir" model="torchvision/resnet18" \
transform_args='{"image_size": [128, 128], "random_resize": {"min_scale": 0.1}, "color_jitter": null}'
The next sections will cover which arguments are available across all methods, and also the arguments unique to specific methods.
See also
Interested in the default augmentation settings for each method? Check the method pages:
Arguments available for all methods¶
The following arguments are available for all methods Distillation (recommended 🚀), DINO and SimCLR.
Random Cropping and Resizing¶
Can be disabled by setting to None
.
"random_resize": {
"min_scale": float,
"max_scale": float,
}
Image Size¶
Cannot be disabled, required for all transforms.
"image_size": tuple[int, int] # height, width
Random Horizontal and Vertical Flipping¶
Can be disabled by setting to None
.
"random_flip": {
"horizontal_prob": float, # probability of applying horizontal flip
"vertical_prob": float, # probability of applying vertical flip
}
Random Rotation¶
Can be disabled by setting to None
.
"random_rotation": {
"prob": float, # probability of applying rotation
"degrees": int, # maximum rotation angle in degrees
}
Color Jittering¶
Can be disabled by setting to None
.
"color_jitter": {
"prob": float, # probability of applying color jitter
"strength": float, # multiplier for all parameters below
"brightness": float, # how much to jitter brightness (non-negative)
"contrast": float, # how much to jitter contrast (non-negative)
"saturation": float, # how much to jitter saturation (non-negative)
"hue": float, # how much to jitter hue (non-negative)
}
Random Grayscaling¶
Can be disabled by setting to None
.
"random_gray_scale": float # probability of converting to grayscale
Gaussian Blurring¶
Can be disabled by setting to None
.
"gaussian_blur": {
"prob": float, # probability of applying blur
"sigmas": tuple[float, float], # range of sigma values
"blur_limit": int | tuple[int, int], # range of kernel size, either [0, high] or [low, high]
}
Random Solarization¶
Can be disabled by setting to None
.
"solarize": {
"prob": float, # probability of applying solarization
"threshold": float # threshold value in range [0, 1]
}
Normalization¶
Cannot be disabled, required for all transforms.
"normalize": {
"mean": tuple[float, float, float], # means of the three channels
"std": tuple[float, float, float] # standard deviations of the three channels
}
Arguments unique to methods¶
The methods Distillation and SimCLR have no transform configuration options beyond the globally available ones, which were listed above.
DINO¶
DINO uses a multi-crop strategy with two full-resolution “global” views (which have slightly different augmentation parameters) and optional additional smaller resolution “local” views (default: 6 views).
Besides the default arguments, the following DINO-specific arguments are available. Note that local_view
itself can be disabled by setting it to None
. Additionally, some augmentations within these structures can be disabled by setting them to None
:
"global_view_1": { # modifications for second global view (cannot be disabled)
"gaussian_blur": { # can be disabled by setting to None
"prob": float,
"sigmas": tuple[float, float],
"blur_limit": int | tuple[int, int]
},
"solarize": { # can be disabled by setting to None
"prob": float,
"threshold": float
}
},
"local_view": { # configuration for local views (can be disabled by setting to None)
"num_views": int, # number of local views to generate
"view_size": tuple[int, int], # size of local views
"random_resize": { # can be disabled by setting to None
"min_scale": float,
"max_scale": float
},
"gaussian_blur": { # can be disabled by setting to None
"prob": float,
"sigmas": tuple[float, float],
"blur_limit": int | tuple[int, int]
}
}