Configuration Options

Run Configuration

The following configuration options are available when scheduling a run:

from lightly.api import ApiWorkflowClient

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

client.schedule_compute_worker_run(
    worker_config={
        # Set to True to enable training of the self-supervised model.
        "enable_training": False,

        # Provide a checkpoint for the self-supervised model.
        "checkpoint": "https://my-checkpoint-read-url",

        # Enable pretagging. This detects objects in the input images/videos and makes
        # them available for selection. See https://docs.lightly.ai/docs/lightly-pretagging for details.
        "pretagging": False,

        # Path to a file containing filenames to run the Lightly Worker on a subset of the
        # files in the input bucket. See https://docs.lightly.ai/docs/relevant-filenames
        # on how to specify relevant filenames.
        "relevant_filenames_file": "",

        # Sequence length for sequence selection on videos.
        # See https://docs.lightly.ai/docs/sequence-selection for details.
        "selected_sequence_length": 1,

        # Datasource settings. See https://docs.lightly.ai/docs/input-files-and-folder-struture
        # for details on how to configure a datasource.
        "datasource": {
            # If False, only new samples in the datasource that were not yet processed
            # by an earlier Lightly Worker run are processed. Set to True to reprocess
            # all samples in the datasource.
            "process_all": False,

            # Set to False to disable uploading the selected samples to the Lightly
            # Platform. This will keep your dataset unchanged and can be useful for
            # dry-runs of the Lightly Worker.
            "enable_datapool_update": True,

            # Bypass the verification of read/write access to the datasource.
            "bypass_verify": False,
        },

        "corruptness_check": {
            # Threshold in [0, 1] which determines the sensitivity of the corruptness check
            # for video frames. Every frame whose internally computed corruptness
            # score is larger than the specified threshold is classified as corrupted.
            "corruption_threshold": 0.1,
        },

        # Image format for selected video frames that are uploaded to the bucket.
        "output_image_format": "png",

        # Path to a file containing custom embeddings for the images in your Input datasource.
        # The file must be stored in the .lightly/embeddings/ directory in your Lightly
        # datasource. The path in the config must be relative to the .lightly/embeddings
        # directory. See https://docs.lightly.ai/docs/custom-embeddings for details.
        "embeddings": "",
    },

    # Selection settings. See https://docs.lightly.ai/docs/selection for details.
    selection_config={
        # Absolute number of samples to select.
        "n_samples": None,

        # Number of samples to select relative to the number of input samples. If set to
        # 0.1 then 10% of the input samples are selected.
        "proportion_samples": None,

        # List of selection strategy configurations.
        "strategies": [
            {
                # See https://docs.lightly.ai/docs/selection#selection-input on how to
                # set the input configuration.
                "input": {
                    # Input type. For example "EMBEDDINGS".
                    "type": None,

                    # Prediction task name. For example "lightly_pretagging" or "my_predictions".
                    # Only used if input type is "EMBEDDINGS", "PREDICTIONS" or "SCORES".
                    "task": None,

                    # Active learning score name. For example "uncertainty_entropy".
                    # Only used if input type is "SCORES".
                    "score": None,

                    # Metadata key. For example "lightly.sharpness" or "weather.temperature".
                    # Only used if input type is "METADATA".
                    "key": None,

                    # Must be set to "CLASS_DISTRIBUTION" if input type is "PREDICTIONS".
                    # Otherwise unused.
                    "name": None,

                    # Dataset id from which similarity search query embeddings are loaded.
                    # Only used if input type is "EMBEDDINGS".
                    "dataset_id": None,

                    # Tag name from which similarity search query embeddings are loaded.
                    # Only used if input type is "EMBEDDINGS".
                    "tag_name": None,
                },

                # See https://docs.lightly.ai/docs/selection#selection-strategy on how
                # to set the strategy configuration.
                "strategy": {
                    # Strategy type. For example "DIVERSITY".
                    "type": None,

                    # Minimum distance between chosen samples. For example 0.1. 
                    # Only used if strategy type is "DIVERSITY". Value should be between
                    # 0 and 2. Increasing the distance results in fewer selected samples.
                    "stopping_condition_minimum_distance": None,

                    # Selection threshold. For example 20. Only used if strategy type is
                    # "THRESHOLD".
                    "threshold": None,

                    # Threshold operation. For example "BIGGER_EQUAL". Only used if
                    # strategy type is "THRESHOLD".
                    "operation": None,

                    # Balancing target. Must be a dictionary from target name to target
                    # ratio. For example {"Ambulance": 0.4, "Bus": 0.6}. Only used if
                    # strategy type is "BALANCE".
                    "target": None,
                },
            },
        ],
    },
    lightly_config={
        # Size of uploaded object crops and video frames. If negative, default size is 
        # used. If size is a sequence like [h, w], output size will be matched to this. 
        # If size is an int, smaller edge of the image will be matched to this number. 
        # For example, if height > width, then image will be rescaled to
        # (size * height / width, size).
        "resize": -1,

        # Dataloader Settings.
        "loader": {
            # Number of data loading worker processes. If -1, then one worker process 
            # per CPU core is created. Set to 0 to load data in the main process.
            # Set to a low number to reduce memory usage at the cost of slower processing.
            "num_workers": -1,

            # Batch size used by the Lightly Worker. Reduce to lower memory usage.
            # We recommend not reducing the batch size if training is enabled.
            "batch_size": 16,

            # Whether to reshuffle data after each epoch.
            "shuffle": True,
        },

        # Trainer Settings.
        "trainer": {
            # Number of GPUs to use for training. Set to 0 to use CPU instead.
            # Using more than one GPU is not yet supported.
            "gpus": 1,

            # Number of training epochs.
            "max_epochs": 100,

            # Floating point precision. Set to 16 for faster processing with half-precision.
            "precision": 32,
        },

        # Model Settings.
        "model": {
            # Name of the model, currently supports popular variants:
            # resnet-18, resnet-34, resnet-50, resnet-101, resnet-152.
            "name": "resnet-18",

            # Dimensionality of output on which self-supervised loss is calculated.
            "out_dim": 128,

            # Dimensionality of feature vectors (embedding size).
            "num_ftrs": 32,

            # Width of the resnet.
            "width": 1,
        },

        # Training Loss Settings.
        "criterion": {
            # Temperature by which logits are divided in self-supervised loss.
            "temperature": 0.5,
        },

        # Training Optimizer Settings.
        "optimizer": {
            # Learning rate of the optimizer.
            "lr": 1.0,

            # L2 penalty.
            "weight_decay": 0.00001,
        },

        # Training Augmentation Settings.
        "collate": {
            # Size of the input images in pixels.
            "input_size": 64,

            # Probability that color jitter is applied.           
            "cj_prob": 0.8,

            # How much to jitter brightness.
            "cj_bright": 0.7,

            # How much to jitter contrast.
            "cj_contrast": 0.7,

            # How much to jitter saturation.
            "cj_sat": 0.7,

            # How much to jitter hue.
            "cj_hue": 0.2,

            # Minimum size of random crop relative to input_size.
            "min_scale": 0.15,

            # Probability that image is converted to grayscale.
            "random_gray_scale": 0.2,

            # Probability that gaussian blur is applied.
            "gaussian_blur": 0.5,

            # Kernel size of gaussian blur relative to input_size.
            "kernel_size": 0.1,

            # Probability that vertical flip is applied.
            "vf_prob": 0.0,

            # Probability that horizontal flip is applied.
            "hf_prob": 0.5,

            # Probability that random rotation is applied.
            "rr_prob": 0.0,

            # Range of degrees to select from for random rotation.
            # If rr_degrees is None, images are rotated by 90 degrees.
            # If rr_degrees is a [min, max] list, images are rotated
            # by a random angle in [min, max]. If rr_degrees is a
            # single number, images are rotated by a random angle in
            # [-rr_degrees, +rr_degrees]. All rotations are counter-clockwise.
            "rr_degrees": None,
        },

        "checkpoint_callback": {
            # If True, then the checkpoint from the last epoch is saved.
            "save_last": True,
        },

        # Random Seed.
        "seed": 1,
    }
)
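In practice, most runs only set a handful of these options; anything left out falls back to the defaults listed above. As a minimal sketch, the following builds a worker and selection configuration that reprocesses all files in the bucket and selects 100 visually diverse samples based on their embeddings. The specific values (100 samples, diversity on embeddings) are illustrative choices, not recommendations.

```python
# Select 100 visually diverse samples based on their embeddings.
selection_config = {
    "n_samples": 100,
    "strategies": [
        {
            "input": {"type": "EMBEDDINGS"},
            "strategy": {"type": "DIVERSITY"},
        },
    ],
}

# Reprocess all samples in the bucket, not only new ones.
worker_config = {
    "datasource": {"process_all": True},
}
```

These dictionaries are then passed as the `worker_config` and `selection_config` arguments of `client.schedule_compute_worker_run`, exactly as in the full listing above.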

Lightly Worker Start Configuration

The following configuration options can be passed when starting the Lightly Worker docker image:

docker run --shm-size="1024m" --gpus all --rm -it \
    -e LIGHTLY_TOKEN={MY_LIGHTLY_TOKEN} \
    -e LIGHTLY_WORKER_ID={MY_WORKER_ID} \
    lightly/worker:latest \
    worker.force_start=True \
    sanity_check=False
  • worker_id: See Install Lightly on how to get a worker id. Alternatively, the worker id can be provided as LIGHTLY_WORKER_ID environment variable.
  • worker.force_start: If True, the worker notifies that it is online even if another worker with the same worker_id is already online. This can be useful if the other worker is actually offline but was not able to properly shut down. If False, the new worker will not start if another worker with the same id already exists.
  • sanity_check: Set to True to verify the installation of the Lightly Worker. The worker shuts down once the installation is verified. See Sanity Check for more information.
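For example, to only verify the installation without processing any data, the same invocation can be run with sanity_check=True (the token and worker id are placeholders, as above). The worker shuts down once the check completes.

```shell
docker run --shm-size="1024m" --gpus all --rm -it \
    -e LIGHTLY_TOKEN={MY_LIGHTLY_TOKEN} \
    -e LIGHTLY_WORKER_ID={MY_WORKER_ID} \
    lightly/worker:latest \
    sanity_check=True
```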