(multi-gpu)=

# Multi-GPU

Set the `accelerator` and `devices` arguments to train on a single machine (node)
with multiple GPUs. By default, **Lightly Train** uses all available GPUs on the
current node for training. The following example shows how to train with two GPUs:

````{tab} Python
```{important}
Always run your code inside an `if __name__ == "__main__":` block when using
multiple GPUs!
```
```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        accelerator="gpu",  # Accelerator type
        devices=2,          # Number of GPUs
    )
```
````

````{tab} Command Line
```bash
lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50" accelerator="gpu" devices=2
```
````

```{tip}
Set `devices=[1, 3]` to train on GPUs 1 and 3 specifically. When using the command
line interface, set `devices="[1,3]"` instead.
```

(multi-gpu-adjusting-parameters)=

## Adjusting Parameters

Parameters such as the batch size (`batch_size`), learning rate (`optim_args.lr`),
and the number of dataloader workers (`num_workers`) are automatically adjusted
based on the number of GPUs. You do not need to modify these parameters manually
when changing the number of GPUs.

```{tip}
The batch size (`batch_size`) is the global batch size across all GPUs. Setting
`batch_size=256` with `devices=2` will result in a batch size of 128 per GPU.
```

```{tip}
The number of workers (`num_workers`) is the number of dataloader workers per GPU.
Setting `num_workers=8` with `devices=2` results in 16 dataloader workers in total.
The total number of dataloader workers should not exceed the number of CPU cores
on the node to avoid training slowdowns.
```

## SLURM

Use the following setup to train on a SLURM-managed cluster with multiple GPUs:

````{tab} Python
Create a SLURM script (`my_train_slurm.sh`) that looks as follows:

```bash
#!/bin/bash -l

#SBATCH --nodes=1             # Number of nodes
#SBATCH --gres=gpu:2          # Number of GPUs
#SBATCH --ntasks-per-node=2   # Must match the number of GPUs
#SBATCH --cpus-per-task=12    # Number of CPU cores per GPU; must be larger than the
                              # number of dataloader workers (num_workers) if num_workers
                              # is set in the training function. Otherwise, num_workers is
                              # automatically set to cpus-per-task - 1.
#SBATCH --mem=0               # Use all available memory

# IMPORTANT: Do not set --ntasks as it is automatically inferred from --ntasks-per-node.

# Activate your virtual environment.
# The command might differ depending on your setup.
# For conda environments, use `conda activate my_env`.
source .venv/bin/activate

# On your cluster you might need to set the network interface:
# export NCCL_SOCKET_IFNAME=^docker0,lo

# Might need to load the latest CUDA version:
# module load NCCL/2.4.7-1-cuda.10.0

# Run the training script.
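# srun starts one copy of the script per SLURM task, i.e. one process per GPU
# (--ntasks-per-node must match the number of GPUs). accelerator, devices, and
# num_workers are then derived automatically from the SLURM configuration
# (see the comments in my_train_script.py below).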
srun python my_train_script.py
```

Then create a Python script (`my_train_script.py`) that calls `lightly_train.train()`:

```python
# my_train_script.py
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        # The following arguments are automatically set based on the SLURM configuration:
        # accelerator="gpu",
        # devices=2,
        # num_workers=11,
    )
```

Finally, submit the training job to the SLURM cluster with:

```bash
sbatch my_train_slurm.sh
```
````

````{tab} Command Line
Create a SLURM script (`my_train_slurm.sh`) that looks as follows:

```bash
#!/bin/bash -l

#SBATCH --nodes=1             # Number of nodes
#SBATCH --gres=gpu:2          # Number of GPUs
#SBATCH --ntasks-per-node=2   # Must match the number of GPUs
#SBATCH --cpus-per-task=12    # Number of CPU cores per GPU; must be larger than the
                              # number of dataloader workers (num_workers) if num_workers
                              # is set in the training function. Otherwise, num_workers is
                              # automatically set to cpus-per-task - 1.
#SBATCH --mem=0               # Use all available memory

# IMPORTANT: Do not set --ntasks as it is automatically inferred from --ntasks-per-node.

# Activate your virtual environment.
# The command might differ depending on your setup.
# For conda environments, use `conda activate my_env`.
source .venv/bin/activate

# On your cluster you might need to set the network interface:
# export NCCL_SOCKET_IFNAME=^docker0,lo

# Might need to load the latest CUDA version:
# module load NCCL/2.4.7-1-cuda.10.0

# Start the training.
srun lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet50"
# The following arguments are automatically set based on the SLURM configuration:
# accelerator="gpu"
# devices=2
# num_workers=11
```

Then submit the training job to the SLURM cluster with:

```bash
sbatch my_train_slurm.sh
```
````
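Whether you launch training directly or through SLURM, the automatic parameter
adjustment described in the Adjusting Parameters section above still applies. If you
prefer to pin the batch size or the number of dataloader workers yourself, you can
pass them explicitly in the training call. The following minimal sketch reuses the
example values from the tips above; the concrete numbers are illustrative, not
recommended defaults:

```python
# Sketch with explicit values; the numbers are illustrative.
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",
        data="my_data_dir",
        model="torchvision/resnet50",
        accelerator="gpu",
        devices=2,
        batch_size=256,  # Global batch size across GPUs: 128 per GPU with devices=2.
        num_workers=8,   # Dataloader workers per GPU: 16 in total with devices=2.
    )
```

When running under SLURM with an explicit `num_workers`, make sure `--cpus-per-task`
is larger than this value, as noted in the SLURM script above.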