Configuration Options

Run Configuration

The following configuration options are available when scheduling a run:

from lightly.api import ApiWorkflowClient

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

# Schedule a run with the configuration options below.
client.schedule_compute_worker_run(
    worker_config={
        # Enable training of the self-supervised model.
        "enable_training": False,

        # Enable pretagging. This detects objects in the input images/videos
        # and makes them available for selection. See the pretagging
        # documentation for details.
        "pretagging": False,

        # Path to a file containing filenames to run the Lightly Worker on a
        # subset of the files in the input bucket. See the relevant filenames
        # documentation on how to specify relevant filenames.
        "relevant_filenames_file": "",

        # Object level workflow settings. See the object level documentation
        # for details.
        "object_level": {
            # Task name of the object predictions to use for the object level
            # workflow.
            "task_name": "",

            # Padding relative to the bbox size. A padding of 0.1 will extend
            # both width and height of all bounding boxes by 10%.
            "padding": 0.0,
        },

        # Sequence length for sequence selection on videos. See the sequence
        # selection documentation for details.
        "selected_sequence_length": 1,

        # Datasource settings. See the datasource documentation for details on
        # how to configure a datasource.
        "datasource": {
            # If False, then only those new samples in the datasource are
            # processed that were not yet processed by an earlier Lightly
            # Worker run. Set to True to reprocess all samples in the
            # datasource.
            "process_all": False,

            # Set to False to disable uploading the selected samples to the
            # Lightly Platform. This will keep your dataset unchanged and can
            # be useful for dry runs of the Lightly Worker.
            "enable_datapool_update": True,

            # Bypass the verification of read/write access to the datasource.
            "bypass_verify": False,

            # This feature flag enables runs which take longer than 7 days by
            # bypassing the limitation of signed read URLs of S3, GCS, and
            # Azure. The tradeoff is that it takes longer to fully read and
            # process all the data stored in the bucket configured as your
            # datasource, resulting in a longer total duration. Only enable
            # this if you are certain that your run will take longer than
            # 7 days to complete. This feature is always enabled when an S3
            # datasource with delegated access is configured.
            "use_redirected_read_url": False,
        },

        "corruptness_check": {
            # Threshold in [0, 1] which determines the sensitivity of the
            # corruptness check for video frames. Every frame with an
            # internally computed corruptness score larger than the specified
            # threshold is classified as corrupted.
            "corruption_threshold": 0.1,
        },

        # Image format for selected video frames which are uploaded to the
        # bucket.
        "output_image_format": "png",
    },

    # Selection settings. See the selection documentation for details.
    selection_config={
        # Absolute number of samples to select.
        "n_samples": None,

        # Number of samples to select relative to the number of input samples.
        # If set to 0.1, then 10% of the input samples are selected.
        "proportion_samples": None,

        # List of selection strategy configurations.
        "strategies": [
            {
                # See the selection documentation on how to set the input
                # configuration.
                "input": {
                    # Input type. For example "EMBEDDINGS".
                    "type": None,

                    # Prediction task name. For example "lightly_pretagging"
                    # or "my_predictions". Only used if input type is
                    # "PREDICTIONS" or "SCORES".
                    "task": None,

                    # Active learning score name. For example
                    # "uncertainty_entropy". Only used if input type is
                    # "SCORES".
                    "score": None,

                    # Metadata key. For example "lightly.sharpness" or
                    # "weather.temperature". Only used if input type is
                    # "METADATA".
                    "key": None,

                    # Must be set to "CLASS_DISTRIBUTION" if input type is
                    # "PREDICTIONS". Otherwise unused.
                    "name": None,

                    # Dataset id from which similarity search query embeddings
                    # are loaded. Only used if input type is "EMBEDDINGS".
                    "dataset_id": None,

                    # Tag name from which similarity search query embeddings
                    # are loaded. Only used if input type is "EMBEDDINGS".
                    "tag_name": None,
                },

                # See the selection documentation on how to set the strategy
                # configuration.
                "strategy": {
                    # Strategy type. For example "DIVERSITY".
                    "type": None,

                    # Minimum distance between chosen samples. For example
                    # 0.1. Only used if strategy type is "DIVERSITY". Value
                    # should be between 0 and 2. Increasing the distance
                    # results in fewer selected samples.
                    "stopping_condition_minimum_distance": None,

                    # Selection threshold. For example 20. Only used if
                    # strategy type is "THRESHOLD".
                    "threshold": None,

                    # Threshold operation. For example "BIGGER_EQUAL". Only
                    # used if strategy type is "THRESHOLD".
                    "operation": None,

                    # Balancing target. Must be a dictionary from target name
                    # to target ratio. For example
                    # {"Ambulance": 0.4, "Bus": 0.6}. Only used if strategy
                    # type is "BALANCE".
                    "target": None,
                },
            },
        ],
    },

    lightly_config={
        # Size of uploaded object crops and video frames. If negative, the
        # default size is used. If size is a sequence like [h, w], the output
        # size will be matched to it. If size is an int, the smaller edge of
        # the image will be matched to this number. For example, if
        # height > width, then the image will be rescaled to
        # (size * height / width, size).
        "resize": -1,

        # Dataloader settings.
        "loader": {
            # Number of data loading worker processes. If -1, then one worker
            # process per CPU core is created. Set to 0 to load data in the
            # main process. Set to a low number to reduce memory usage at the
            # cost of slower processing.
            "num_workers": -1,

            # Batch size used by the Lightly Worker. Reduce to lower memory
            # usage. We recommend not reducing the batch size if training is
            # enabled.
            "batch_size": 16,

            # Whether to reshuffle data after each epoch.
            "shuffle": True,
        },

        # Trainer settings.
        "trainer": {
            # Number of GPUs to use for training. Set to 0 to use the CPU
            # instead. Using more than one GPU is not yet supported.
            "gpus": 1,

            # Number of training epochs.
            "max_epochs": 100,

            # Floating point precision. Set to 16 for faster processing with
            # half precision.
            "precision": 32,
        },

        # Model settings.
        "model": {
            # Name of the model; currently supports the popular variants
            # resnet-18, resnet-34, resnet-50, resnet-101, and resnet-152.
            "name": "resnet-18",

            # Dimensionality of the output on which the self-supervised loss
            # is calculated.
            "out_dim": 128,

            # Dimensionality of the feature vectors (embedding size).
            "num_ftrs": 32,

            # Width of the resnet.
            "width": 1,
        },

        # Training loss settings.
        "criterion": {
            # Temperature by which logits are divided in the self-supervised
            # loss.
            "temperature": 0.5,
        },

        # Training optimizer settings.
        "optimizer": {
            # Learning rate of the optimizer.
            "lr": 1.0,

            # L2 penalty.
            "weight_decay": 0.00001,
        },

        # Training augmentation settings.
        "collate": {
            # Size of the input images in pixels.
            "input_size": 64,

            # Probability that color jitter is applied.
            "cj_prob": 0.8,

            # How much to jitter brightness.
            "cj_bright": 0.7,

            # How much to jitter contrast.
            "cj_contrast": 0.7,

            # How much to jitter saturation.
            "cj_sat": 0.7,

            # How much to jitter hue.
            "cj_hue": 0.2,

            # Minimum size of the random crop relative to input_size.
            "min_scale": 0.15,

            # Probability that the image is converted to grayscale.
            "random_gray_scale": 0.2,

            # Probability that gaussian blur is applied.
            "gaussian_blur": 0.5,

            # Kernel size of the gaussian blur relative to input_size.
            "kernel_size": 0.1,

            # Probability that a vertical flip is applied.
            "vf_prob": 0.0,

            # Probability that a horizontal flip is applied.
            "hf_prob": 0.5,

            # Probability that a random (+-90 degree) rotation is applied.
            "rr_prob": 0.0,
        },

        "checkpoint_callback": {
            # If True, then the checkpoint from the last epoch is saved.
            "save_last": True,
        },

        # Random seed.
        "seed": 1,
    },
)

Lightly Worker Start Configuration

The following configuration options can be passed when starting the Lightly Worker docker image:

docker run --shm-size="1024m" --gpus all --rm -it \
    lightly/worker:latest \
    worker.worker_id={MY_WORKER_ID} \
    worker.force_start=True \
    sanity_check=False
  • worker.worker_id: Must be set to an existing worker id. See Install Lightly on how to get a worker id.
  • worker.force_start: If True, the worker notifies that it is online even if another worker with the same worker_id is already online. This can be useful if the other worker is actually offline but was not able to properly shut down. If False, the new worker will not start if another worker with the same id already exists.
  • sanity_check: Set to True to verify the installation of the Lightly Worker. The worker shuts down once the installation is verified. See Sanity Check for more information.
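For example, to only verify the installation before scheduling real runs, the image can be started with the sanity check enabled. This is a sketch using the same image tag as above; the `--gpus` flag is omitted since no GPU is needed for the check:

```shell
docker run --shm-size="1024m" --rm -it \
    lightly/worker:latest \
    sanity_check=True
```

The worker exits as soon as the check completes, so this command is safe to run on a machine where another worker is already registered.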