Configuration Options

Run Configuration

The following configuration options are available when scheduling a run:

from lightly.api import ApiWorkflowClient

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# Schedule a run. The options below show the default values.
client.schedule_compute_worker_run(
    worker_config={
        # Enable the datapool. See the datapool documentation for details.
        "use_datapool": False,

        # Enable training of the self-supervised model.
        "enable_training": False,
        "training": {
            # Prediction task name. For example "lightly_pretagging" or "my_predictions".
            # Set to train the embedding model on object crops. Must be an object
            # detection or keypoint detection task.
            "task_name": "",
        },

        # Reuse a self-supervised learning checkpoint of a previous run by providing its
        # run_id. Go to the runs overview in the Lightly Platform and click on the run
        # whose checkpoint you want to reuse. The run must have the checkpoint available
        # as an artifact. Then use the run_id of that run. See the artifacts
        # documentation for details.
        # This option is incompatible with the 'checkpoint' option.
        # Requires Lightly Worker v2.9.2+ and Lightly Python Client v1.4.19+.
        "checkpoint_run_id": "",

        # WARNING: We recommend using the checkpoint_run_id option above instead.
        # Provide a checkpoint URL for the self-supervised model.
        # For example "https://my-checkpoint-read-url".
        # The URL must be a normal URL, i.e. pasting it into any browser must download
        # the checkpoint. Only when using local storage can the parameter also point to
        # a local checkpoint file. In that case, it must be a path relative to the
        # .lightly directory in the Lightly datasource.
        # This option is incompatible with the 'checkpoint_run_id' option.
        "checkpoint": "",

        # Enable pretagging. This detects objects in the input images/videos and makes
        # them available for selection. See the pretagging documentation for details.
        "pretagging": False,

        # Path to a file containing filenames to run the Lightly Worker on a subset of
        # the files in the input datasource. See the relevant filenames documentation
        # on how to specify the filenames.
        "relevant_filenames_file": "",

        # Sequence length for sequence selection on videos.
        # See the sequence selection documentation for details.
        "selected_sequence_length": 1,

        # Datasource settings. See the datasource documentation for details on how to
        # configure a datasource.
        "datasource": {
            # If False, only new samples in the datasource are processed that were not
            # yet processed by an earlier Lightly Worker run. Set to True to reprocess
            # all samples in the datasource.
            "process_all": False,

            # Set to False to disable uploading the selected samples to the Lightly
            # Platform. This keeps your dataset unchanged and can be useful for
            # dry runs of the Lightly Worker.
            "enable_datapool_update": True,

            # Bypass the verification of read/write access to the datasource.
            # Set this to True if you are using restrictive policies for your cloud
            # bucket. See the datasource documentation for details.
            "bypass_verify": False,

            # Input expiration settings. Only applied to input datasources with a
            # bucket lifecycle/retention policy.
            # This option currently only supports AWS S3 buckets.
            "input_expiration": {
                # Minimum number of days the input images or videos must continue to
                # exist in the bucket after the Lightly Worker run is started. Inputs
                # that expire before this period are handled based on the
                # handling_strategy option (see below).
                # Requires that the handling_strategy option is set.
                "min_days_to_expiration": None,

                # What to do if an input is encountered that expires before the
                # min_days_to_expiration period (see above). Valid strategies are
                # "SKIP" and "ABORT". "SKIP" marks expiring inputs as corrupt and does
                # not process them. "ABORT" stops the Lightly Worker run if an
                # expiring input is encountered.
                # Requires that the min_days_to_expiration option is set.
                "handling_strategy": None,
            },
        },

        # Maximum cache size in bytes for saving lightweight data such as metadata and
        # predictions on disk to speed up the worker during a run. By default, 20% of
        # free disk space or a maximum of 100GB, whichever is smaller, is used. Caching
        # can be deactivated by setting the value to <= 0.
        "cache_size": None,

        # Image format for selected video frames that are uploaded to the bucket.
        "output_image_format": "png",

        # Number of data loading processes. If -1, one process per CPU core is
        # created. Set to 0 to load data in the main process. Set to a low number to
        # reduce memory usage at the cost of slower processing.
        "num_processes": -1,

        # Number of data loading threads. If -1, two threads per CPU core are created.
        # Always at least one.
        "num_threads": -1,

        # Path to a file containing custom embeddings for the images in your input
        # datasource. The file must be stored in the .lightly/embeddings/ directory in
        # your Lightly datasource. The path in the config must be relative to the
        # .lightly/embeddings directory. See the embeddings documentation for details.
        "embeddings": "",

        # Whether the worker should shut down after processing this job. If False, it
        # continues running and listens for new available jobs.
        "shutdown_when_job_finished": False,
    },
    # Selection settings. See the selection documentation for details.
    selection_config={
        # Absolute number of samples to select. When using a datapool, n_samples
        # additional samples are added to the datapool independent of the datapool
        # size.
        "n_samples": None,

        # Number of samples to select relative to the number of input samples. If set
        # to 0.1, then 10% of the input samples are selected. When using a datapool,
        # proportion_samples of the new input samples are added to the datapool
        # independent of the datapool size.
        "proportion_samples": None,

        # List of selection strategy configurations.
        "strategies": [
            {
                # See the selection documentation on how to set the input
                # configuration.
                "input": {
                    # Input type. For example "EMBEDDINGS".
                    "type": None,

                    # Prediction task name. For example "lightly_pretagging" or
                    # "my_predictions". Only used if input type is "EMBEDDINGS",
                    # "PREDICTIONS" or "SCORES".
                    "task": None,

                    # Active learning score name. For example "uncertainty_entropy".
                    # Only used if input type is "SCORES".
                    "score": None,

                    # Metadata key. For example "lightly.sharpness" or
                    # "weather.temperature". Only used if input type is "METADATA".
                    "key": None,

                    # Must be set to "CLASS_DISTRIBUTION" if input type is
                    # "PREDICTIONS". Otherwise unused.
                    "name": None,

                    # Dataset id from which similarity search query embeddings are
                    # loaded. Only used if input type is "EMBEDDINGS".
                    "dataset_id": None,

                    # Tag name from which similarity search query embeddings are
                    # loaded. Only used if input type is "EMBEDDINGS".
                    "tag_name": None,
                },

                # See the selection documentation on how to set the strategy
                # configuration.
                "strategy": {
                    # Strategy type. For example "DIVERSITY".
                    "type": None,

                    # Minimum distance between chosen samples. For example 0.1.
                    # Only used if strategy type is "DIVERSITY". Value should be
                    # between 0 and 2. Increasing the distance results in fewer
                    # selected samples.
                    "stopping_condition_minimum_distance": None,

                    # Selection threshold. For example 20. Only used if strategy type
                    # is "THRESHOLD".
                    "threshold": None,

                    # Threshold operation. For example "BIGGER_EQUAL". Only used if
                    # strategy type is "THRESHOLD".
                    "operation": None,

                    # Balancing target. Must be a dictionary from target name to
                    # target ratio. For example {"Ambulance": 0.4, "Bus": 0.6}. Only
                    # used if strategy type is "BALANCE".
                    "target": None,

                    # Strength of this strategy relative to other strategies. Value
                    # must be in [-1e9, 1e9].
                    "strength": 1.0,
                },
            },
        ],
    },
    lightly_config={
        # Dataloader settings.
        "loader": {
            # The number of processes and threads to use for data loading.
            # Deprecated, please use "num_processes" and "num_threads" instead.
            "num_workers": -1,

            # Batch size used by the Lightly Worker. Reduce to lower memory usage.
            # We recommend not reducing the batch size if training is enabled.
            "batch_size": 16,

            # Whether to reshuffle data after each epoch.
            "shuffle": True,
        },

        # Trainer settings.
        "trainer": {
            # Number of GPUs to use for training. Set to 0 to use the CPU instead.
            # Using more than one GPU is not yet supported.
            "gpus": 1,

            # Number of training epochs.
            "max_epochs": 100,

            # Floating point precision. Set to 16 for faster processing with
            # half-precision.
            "precision": 32,
        },

        # Model settings.
        "model": {
            # Name of the model. Currently supports popular variants:
            # resnet-18, resnet-34, resnet-50, resnet-101, resnet-152.
            "name": "resnet-18",

            # Dimensionality of the output on which the self-supervised loss is
            # calculated.
            "out_dim": 128,

            # Dimensionality of the feature vectors (embedding size).
            "num_ftrs": 32,

            # Width of the resnet.
            "width": 1,
        },

        # Training loss settings.
        "criterion": {
            # Temperature by which logits are divided in the self-supervised loss.
            "temperature": 0.5,
        },

        # Training optimizer settings.
        "optimizer": {
            # Learning rate of the optimizer.
            "lr": 1.0,

            # L2 penalty.
            "weight_decay": 0.00001,
        },

        # Training augmentation settings.
        "collate": {
            # Size of the input images in pixels.
            "input_size": 64,

            # Probability that color jitter is applied.
            "cj_prob": 0.8,

            # How much to jitter brightness.
            "cj_bright": 0.7,

            # How much to jitter contrast.
            "cj_contrast": 0.7,

            # How much to jitter saturation.
            "cj_sat": 0.7,

            # How much to jitter hue.
            "cj_hue": 0.2,

            # Minimum size of the random crop relative to input_size.
            "min_scale": 0.15,

            # Probability that the image is converted to grayscale.
            "random_gray_scale": 0.2,

            # Probability that gaussian blur is applied.
            "gaussian_blur": 0.5,

            # Kernel size of the gaussian blur relative to input_size.
            "kernel_size": 0.1,

            # Probability that a vertical flip is applied.
            "vf_prob": 0.0,

            # Probability that a horizontal flip is applied.
            "hf_prob": 0.5,

            # Probability that a random rotation is applied.
            "rr_prob": 0.0,

            # Range of degrees to select from for the random rotation.
            # If rr_degrees is None, images are rotated by 90 degrees.
            # If rr_degrees is a [min, max] list, images are rotated by a random
            # angle in [min, max]. All rotations are counter-clockwise.
            "rr_degrees": None,
        },

        "checkpoint_callback": {
            # If True, the checkpoint from the last epoch is saved.
            "save_last": True,
        },

        # Random seed.
        "seed": 1,
    },
)

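The BALANCE target can be read as per-class ratios of the selection budget. The hypothetical helper below (not part of the Lightly API) shows how such ratios translate into per-class sample counts for a given n_samples:

```python
def balance_counts(target, n_samples):
    """Hypothetical helper: translate BALANCE target ratios into
    per-class sample counts for a selection budget of n_samples."""
    return {name: round(ratio * n_samples) for name, ratio in target.items()}

# With n_samples=100, the selection aims for 40 "Ambulance" and 60 "Bus" samples.
print(balance_counts({"Ambulance": 0.4, "Bus": 0.6}, 100))
# {'Ambulance': 40, 'Bus': 60}
```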
Lightly Worker Start Configuration

The following configuration options can be passed when starting the Lightly Worker docker image:

docker run --shm-size="1024m" --gpus all --rm -it \
    -e LIGHTLY_TOKEN="MY_LIGHTLY_TOKEN" \
    -e LIGHTLY_WORKER_ID="MY_WORKER_ID" \
    -e LIGHTLY_CA_CERTS="/etc/ssl/certs/mycert.crt" \
    -e HTTPS_PROXY="https://user:password@proxyIP:proxyPort" \
    -v "MY_PATH_TO_INPUT_DIRECTORY":/input_mount:ro \
    -v "MY_PATH_TO_LIGHTLY_DIRECTORY":/lightly_mount \
    -v "MY_PATH_TO_OUTPUT_DIRECTORY":/output_dir \
    lightly/worker:latest \
    worker.force_start=True
  • LIGHTLY_WORKER_ID: See Multiple Workers on how to get a worker id.
  • LIGHTLY_UID/LIGHTLY_GID: See running the Lightly Worker with a custom user and group.
  • LIGHTLY_CA_CERTS/HTTPS_PROXY: See running the Lightly Worker behind a proxy.
  • worker.force_start: If True, the worker registers itself as online even if another worker with the same worker_id is already online. This can be useful if the other worker is actually offline but was not able to shut down properly. If False, the new worker will not start if another worker with the same worker_id is already online.
  • sanity_check: Set to True to verify the installation of the Lightly Worker. The worker shuts down once the installation is verified. See Sanity Check for more information.
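For example, a sanity check can be run with a minimal invocation (a sketch; it omits the proxy settings and mounts shown above, and the worker shuts down once the installation is verified):

```shell
docker run --shm-size="1024m" --rm -it \
    lightly/worker:latest \
    sanity_check=True
```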


The following directories can be mounted when starting the Lightly Worker docker image: