ImageNet
Warning
The Docker Archive documentation is deprecated
The old workflow described in these docs will not be supported with new Lightly Worker versions above 2.6. Please switch to our new documentation page instead.
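Let’s have a look at how to run the Docker container to analyze and filter the famous ImageNet dataset. The command below skips training (lightly.trainer.max_epochs=0) and selects 500,000 samples (stopping_condition.n_samples=500000). You can reproduce the sample report using the following command.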
docker run --gpus all --rm -it \
-v /datasets/imagenet/train/:/home/input_dir:ro \
-v /datasets/docker_imagenet_500k:/home/output_dir \
--ipc="host" \
lightly/worker:latest \
token=MYAWESOMETOKEN \
lightly.collate.input_size=64 \
lightly.loader.batch_size=256 \
lightly.loader.num_workers=8 \
lightly.trainer.max_epochs=0 \
stopping_condition.n_samples=500000 \
remove_exact_duplicates=True \
enable_corruptness_check=False
The complete processing time was 04h 37m 02s. The machine used for this experiment is a cloud instance with 8 cores, 30GB of RAM, and a V100 GPU. The dataset was stored on an SSD drive.
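As a rough sanity check, the reported run time can be converted into a throughput figure. The sketch below assumes the input directory holds the full ILSVRC-2012 training split (1,281,167 images). The text does not state the exact image count, so treat the result as an estimate only.

```shell
#!/usr/bin/env bash
# Reported processing time: 04h 37m 02s
hours=4; minutes=37; seconds=2
total_seconds=$(( hours * 3600 + minutes * 60 + seconds ))

# ASSUMPTION: full ILSVRC-2012 training split; not stated in the text above.
num_images=1281167

# Integer division, so the result is rounded down.
images_per_second=$(( num_images / total_seconds ))

echo "total seconds: ${total_seconds}"
echo "approx. images per second: ${images_per_second}"
```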
You can also use the direct link for the ImageNet report.
Combining Cityscapes with Kitti
Using Lightly Docker and the datapool feature, we can combine two datasets and keep only the unique samples.
docker run --shm-size="512m" --gpus all --rm -it \
-v /datasets/cityscapes/leftImg8bit/train/:/home/input_dir:ro \
-v /datasets/docker_out_cityscapes:/home/output_dir \
-v /datasets/docker_out_cityscapes:/home/shared_dir \
--ipc="host" --network="host" lightly/worker:latest \
token=MYAWESOMETOKEN lightly.loader.num_workers=8 \
stopping_condition.min_distance=0.2 remove_exact_duplicates=True \
enable_corruptness_check=False enable_training=True \
lightly.trainer.max_epochs=20 lightly.optimizer.lr=1.0 \
lightly.trainer.precision=32 lightly.loader.batch_size=256 \
lightly.collate.input_size=64 datapool.name=autonomous_driving
The report for running the command can be found here:
Cityscapes.pdf
Since the Cityscapes dataset has one subfolder per city, Lightly Docker uses the subfolder names as weak labels for the embedding plot, as shown below.
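To see which weak labels a dataset would yield, it can help to list the top-level subfolders of the input directory together with their file counts. The function below is a hypothetical helper, not part of Lightly. It only inspects the folder layout and counts every file, not just images.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Lightly).
# Prints "<subfolder name> <file count>" for each top-level subfolder of $1.
# The subfolder name plays the role of the weak label (e.g. the city name).
count_per_label() {
  local root="$1" dir n
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue          # skip if there are no subfolders
    n=$(find "$dir" -type f | wc -l)   # all files below this subfolder
    printf '%s %d\n' "$(basename "$dir")" "$((n))"
  done
}

# Usage (path taken from the command above):
# count_per_label /datasets/cityscapes/leftImg8bit/train
```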
Now we can use the datapool and pre-trained model to select the interesting frames from Kitti and add them to Cityscapes:
docker run --shm-size="512m" --gpus all --rm -it \
-v /datasets/kitti/training/image_2/:/home/input_dir:ro \
-v /datasets/docker_out_cityscapes:/home/output_dir \
-v /datasets/docker_out_cityscapes:/home/shared_dir \
--ipc="host" --network="host" lightly/worker:latest \
token=MYAWESOMETOKEN lightly.loader.num_workers=8 \
stopping_condition.min_distance=0.2 remove_exact_duplicates=True \
enable_corruptness_check=False enable_training=False \
lightly.trainer.max_epochs=20 lightly.optimizer.lr=1.0 \
lightly.trainer.precision=32 lightly.loader.batch_size=256 \
lightly.collate.input_size=64 datapool.name=autonomous_driving
Because of the datapool, the report now contains additional plots. They show the embeddings and highlight in blue the samples that were added from the new dataset. In our experiment, Lightly Docker added several new samples that lie outside the previous embedding distribution. This is a good sign: it shows that Cityscapes and Kitti contain different data, so combining the two datasets is worthwhile.
The report for running the command can be found here:
kitti_with_min_distance=0.2.pdf
And the report for a stopping condition with a minimum distance of 0.05:
kitti_with_min_distance=0.05.pdf
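The runs in this section share most of their worker options. Apart from the input directory, they differ in enable_training and, for the last report, in stopping_condition.min_distance. The sketch below makes that explicit. The function lightly_args is hypothetical (it is not part of Lightly) and only assembles the options copied from the commands above. It assumes the 0.05 run used the same options as the 0.2 run apart from the threshold, which the text does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Lightly): prints the worker options used in
# this section, one per line.
#   $1: enable_training (True for the Cityscapes run, False for the Kitti runs)
#   $2: stopping_condition.min_distance (defaults to 0.2)
lightly_args() {
  local enable_training="$1" min_distance="${2:-0.2}"
  printf '%s\n' \
    "token=MYAWESOMETOKEN" \
    "lightly.loader.num_workers=8" \
    "stopping_condition.min_distance=${min_distance}" \
    "remove_exact_duplicates=True" \
    "enable_corruptness_check=False" \
    "enable_training=${enable_training}" \
    "lightly.trainer.max_epochs=20" \
    "lightly.optimizer.lr=1.0" \
    "lightly.trainer.precision=32" \
    "lightly.loader.batch_size=256" \
    "lightly.collate.input_size=64" \
    "datapool.name=autonomous_driving"
}

# Usage: append the options to the docker invocation shown earlier, e.g.
#   docker run <docker flags and volumes as above> lightly/worker:latest \
#     $(lightly_args False 0.05)
```

Since each run presumably updates the datapool stored in the shared directory, comparing several thresholds fairly would likely require starting each Kitti run from a copy of the Cityscapes-only datapool.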