Docker
LightlyTrain is available as a Docker image on Docker Hub for containerized deployment.
Installation
Install Docker with NVIDIA GPU support (docs). Then, pull the latest version of the LightlyTrain Docker image from Docker Hub:
docker pull lightly/train:latest
You can verify that the image is working correctly by running the following command:
docker run --rm --gpus=all lightly/train:latest lightly-train --help
This should print the LightlyTrain help message.
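The check above can also be scripted, for example as a first step in a setup script. The following is a minimal sketch, not part of LightlyTrain: the function name `check_image` and the `DOCKER` override variable are hypothetical, and it assumes that `lightly-train --help` exits with status 0 when the image works.

```shell
# Hypothetical helper: run the help command from above and report the result.
# DOCKER can be overridden to point at a different docker binary (assumption).
check_image() {
  if "${DOCKER:-docker}" run --rm --gpus=all lightly/train:latest \
      lightly-train --help >/dev/null 2>&1; then
    echo "LightlyTrain image OK"
  else
    echo "LightlyTrain image check failed" >&2
    return 1
  fi
}
```

Calling `check_image` after `docker pull` returns a non-zero status if the container cannot start, for instance when GPU support is not configured on the host.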
Usage
Start a LightlyTrain Docker container in interactive mode:
docker run -it --rm --gpus=all --shm-size=4gb --user $(id -u):$(id -g) -v /my_output_dir:/out -v /my_data_dir:/data lightly/train:latest
Flags:
-it
: Starts the container in interactive mode.
--rm
: Removes the container after it has been stopped.
--gpus=all
: Enables GPU support.
--shm-size=4gb
: Sets the shared memory size to 4 GB. Increase this for large datasets.
--user $(id -u):$(id -g)
: Runs the container as the same user as on the host. This makes sure that all files created by the container (checkpoints, logs, etc.) are owned by the user running the container.
-v /my_output_dir:/out
: Mounts the host directory /my_output_dir to the container directory /out. All files created by the container (checkpoints, logs, etc.) will be saved in this directory.
-v /my_data_dir:/data
: Mounts the host directory /my_data_dir to the container directory /data. This directory must contain your training data. See the Data docs for more information on how to structure your data.
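The flags above can be collected in a small wrapper so that the host directories are set in one place. This is a sketch under stated assumptions rather than an official script: the function name `print_run_cmd` and the variables `OUTPUT_DIR` and `DATA_DIR` are hypothetical, and the function only prints the command so it can be inspected before it is run.

```shell
# Hypothetical host paths; override them via the environment.
OUTPUT_DIR="${OUTPUT_DIR:-/my_output_dir}"
DATA_DIR="${DATA_DIR:-/my_data_dir}"

# Print the docker run command from the Usage section with the current
# user ID, group ID, and host directories filled in.
# Note: paths containing spaces are not quoted in the printed command.
print_run_cmd() {
  echo "docker run -it --rm --gpus=all --shm-size=4gb" \
    "--user $(id -u):$(id -g)" \
    "-v ${OUTPUT_DIR}:/out -v ${DATA_DIR}:/data" \
    "lightly/train:latest"
}

print_run_cmd
```

It is usually worth creating the output directory with `mkdir -p` before starting the container. If the host path of a `-v` mount does not exist, Docker creates it, typically owned by root, and the non-root user set with `--user` may then be unable to write checkpoints and logs to it.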
Once the container is running, you can run LightlyTrain commands inside the container as if you had installed it locally. The only difference is that paths must be specified relative to the mounted directories /out and /data. For example, to train a model, run the following command inside the container:
lightly-train train out="/out/my_experiment" data="/data" model="torchvision/resnet50"
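The image also accepts a command after its name, as the `lightly-train --help` check in the Installation section shows, so the same training run can likely be started without an interactive shell. The sketch below assumes this works for `lightly-train train` as well. The `DRY_RUN` switch and the function names `run` and `train_once` are hypothetical and exist only so that the command can be printed before it is executed.

```shell
# Hypothetical switch: 1 prints the command, 0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One-shot training: same mounts as the interactive example, but without -it
# and with the lightly-train command appended after the image name.
train_once() {
  run docker run --rm --gpus=all --shm-size=4gb \
    --user "$(id -u):$(id -g)" \
    -v /my_output_dir:/out \
    -v /my_data_dir:/data \
    lightly/train:latest \
    lightly-train train out="/out/my_experiment" data="/data" model="torchvision/resnet50"
}

train_once
```

With `DRY_RUN=0` the container starts, trains, and is removed when the command finishes. The results remain on the host in /my_output_dir/my_experiment because /out is a mounted directory.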