Corruptness Check Information

The corruptness check information artifact is a JSON file with two entries: corrupt_samples and corrupt_videos. The corrupt_samples entry is an object of key-value pairs, where each key is the filename of an image or frame and each value is the reason why the Lightly Worker flagged this sample as corrupt. Similarly, the corrupt_videos entry maps video filenames to the reason each video was flagged.

In the following example, two frames from video_1.mp4 were flagged as corrupt. One of them raised an OSError when the Lightly Worker tried to load it, and the other was flagged by internal checks (see Corruptness Check to learn more). Additionally, the videos video_2.mp4 and video_3.mp4 each raised an exception and are therefore flagged as corrupt, too. This means that the Lightly Worker was not able to load these videos at all.

{
    "corrupt_samples": {
        "video_1_mp4_frame_0.jpg": "OSError",
        "video_1_mp4_frame_2.jpg": "built-in"
    },
    "corrupt_videos": {
        "video_2.mp4": "Exception",
        "video_3.mp4": "Exception"
    }
}
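
To illustrate the structure, here is a minimal sketch that tallies the flagged entries by reason. It uses only the Python standard library and the example values from this section; the helper name `count_reasons` is hypothetical and not part of the Lightly API.

```python
import json
from collections import Counter

# Example artifact content, using the values from the example in this section.
artifact_text = """
{
    "corrupt_samples": {
        "video_1_mp4_frame_0.jpg": "OSError",
        "video_1_mp4_frame_2.jpg": "built-in"
    },
    "corrupt_videos": {
        "video_2.mp4": "Exception",
        "video_3.mp4": "Exception"
    }
}
"""


def count_reasons(artifact: dict) -> dict:
    """Count how often each reason appears, separately for each entry."""
    # Hypothetical helper: assumes every entry maps filenames to reason strings.
    return {
        entry: Counter(reasons.values())
        for entry, reasons in artifact.items()
    }


artifact = json.loads(artifact_text)
reason_counts = count_reasons(artifact)
for entry, counts in reason_counts.items():
    print(entry, dict(counts))
```

A tally like this can help tell apart a few unreadable frames from a video that could not be loaded at all.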

You can download the corruptness check information for a scheduled run like this:

from lightly.api import ApiWorkflowClient

# Create the Lightly client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

scheduled_run_id = client.schedule_compute_worker_run(...)

# Download the corruptness check information as a JSON file from a scheduled run.
run = client.get_compute_worker_run_from_scheduled_run(
    scheduled_run_id=scheduled_run_id
)
client.download_compute_worker_run_corruptness_check_information(
    run=run, output_path="my_run/artifacts/corruptness_check_information.json"
)

# Download the corruptness check information as a JSON file from a dataset_id.
runs = client.get_compute_worker_runs(dataset_id=client.dataset_id)
run = runs[-1]  # get the latest run
client.download_compute_worker_run_corruptness_check_information(
    run=run, output_path="my_run/artifacts/corruptness_check_information.json"
)
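
Once downloaded, the file can be read like any other JSON file. The following sketch assumes the file contains the corrupt_samples and corrupt_videos entries described in this section, and collects all flagged filenames so they can be excluded from later processing. The function name `load_corrupt_filenames` is hypothetical and not part of the Lightly API.

```python
import json
from pathlib import Path


def load_corrupt_filenames(path):
    """Return the set of all filenames flagged in the artifact at `path`."""
    artifact = json.loads(Path(path).read_text())
    flagged = set()
    # Assumption: both entries map filenames to reasons;
    # an entry that is missing from the file is treated as empty.
    for entry in ("corrupt_samples", "corrupt_videos"):
        flagged.update(artifact.get(entry, {}))
    return flagged
```

For example, calling `load_corrupt_filenames("my_run/artifacts/corruptness_check_information.json")` on the file downloaded above would return the flagged frame and video filenames as one set.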