The Lightly Worker is usually run on dedicated hardware or on a cloud compute instance that is spun up specifically to run the Lightly Worker standalone. Our hardware recommendations for this compute instance are based on three criteria:
- speed: The Lightly Worker should process your raw data as quickly as possible.
- cost-effectiveness: The compute instance should be economical.
- stability: The Lightly Worker should not crash because it runs out of memory.
Depending on the size of your raw data, we recommend one of the following machines:
- Up to 1 million images or video frames: Use the AWS EC2 instance g4dn.2xlarge or similar with 8 vCPUs, 32GB of system memory, one T4 GPU
- Up to 10 million images or video frames: Use the AWS EC2 instance g4dn.4xlarge or similar with 16 vCPUs, 64GB of system memory, one T4 GPU
- More than 10 million images or video frames: Use the AWS EC2 instance g4dn.8xlarge or similar with 32 vCPUs, 128GB of system memory, one T4 GPU
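The tiers above can be expressed as a small lookup. This is only a sketch: the function name recommend_instance is hypothetical, and treating the 1 million and 10 million limits as inclusive is an assumption.

```python
def recommend_instance(num_frames: int) -> str:
    """Return the suggested AWS EC2 instance type for a given number of
    images or video frames (hypothetical helper, limits assumed inclusive)."""
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if num_frames <= 1_000_000:
        return "g4dn.2xlarge"  # 8 vCPUs, 32GB system memory, one T4 GPU
    if num_frames <= 10_000_000:
        return "g4dn.4xlarge"  # 16 vCPUs, 64GB system memory, one T4 GPU
    return "g4dn.8xlarge"      # 32 vCPUs, 128GB system memory, one T4 GPU


print(recommend_instance(1_800_000))  # g4dn.4xlarge
```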
If you're creating a new cloud instance and can pick an operating system image, we recommend a deep learning image with GPU support that is optimized for PyTorch. On AWS, these images are regularly updated and have names like this:
Deep Learning AMI GPU PyTorch 1.13.1 (Amazon Linux 2) 20230104
At the moment, the Lightly Worker only supports Nvidia GPUs with CUDA support that are capable of running PyTorch models. Older GPUs we have used successfully include the Nvidia GTX 1080 Ti and the Nvidia P100.
You can compute the number of frames in your videos from their length and frame rate (fps). E.g., 100 videos of 600 seconds each at 30 fps have 100 × 600 × 30 = 1.8 million frames.
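The same arithmetic as a short sketch. The function name total_frames is hypothetical, and it assumes that all videos share the same length and frame rate.

```python
def total_frames(num_videos: int, seconds_per_video: float, fps: float) -> int:
    """Number of frames = number of videos x length in seconds x frames per second."""
    return round(num_videos * seconds_per_video * fps)


print(total_frames(100, 600, 30))  # 1800000
```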
If you want to train an embedding model for many epochs or want to further increase computing speed, we recommend switching to a V100 or A10 GPU or better.
If you stream the data from cloud storage (AWS S3, Google Cloud Storage, Azure) using the datasource feature, ensure that your cloud bucket and your compute instance are in the same region. This is very important: it improves performance and reduces costs (see section "Accessing services within the same AWS Region").
Keep the configuration option lightly.loader.num_workers at the default (-1), which sets it to the number of vCPUs on your compute instance.
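To illustrate what a default of -1 means, the sketch below resolves it to the CPU count reported by the operating system. It illustrates the idea only and is not the Lightly Worker's actual implementation; the function name resolve_num_workers is hypothetical.

```python
import os


def resolve_num_workers(num_workers: int = -1) -> int:
    """Map -1 to the number of logical CPUs (the vCPUs on a cloud instance);
    return any other non-negative value unchanged."""
    if num_workers == -1:
        # os.cpu_count() can return None if the count cannot be determined.
        return os.cpu_count() or 1
    if num_workers < 0:
        raise ValueError("num_workers must be -1 or non-negative")
    return num_workers


print(resolve_num_workers())  # e.g. 8 on a machine with 8 vCPUs
```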
AWS/GCP/Azure Quota
If you request GPU resources in the cloud on AWS, GCP, or Azure, it might take up to 72 hours until you can use the machine. We recommend requesting quota early (even if you won't use the machine soon) so that you have access when you need it.
Although not recommended, it is also possible to run the Lightly Worker on a local (non-cloud) machine. The recommended specs are similar. Make sure you have at least 4 cores and 16GB of system memory. A recent consumer GPU from Nvidia (1000 series or newer) should be sufficient.
We recommend using Linux, but some customers have also run the Lightly Worker successfully on Windows machines using WSL2. For the time being, you might need to modify our Docker image to make this work.
The performance and speed of the Lightly Worker could be limited by one of three potential bottlenecks. Different steps of the Lightly Worker use these resources to different extents. Thus the bottleneck changes throughout the run.
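The potential bottlenecks are:
- I/O (How fast can data be loaded from the cloud bucket?)
- CPU (How fast can the loaded data be processed?)
- GPU (How fast can the models be trained and run?)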
The GPU is used during three steps:
- training an embedding model (optional step)
- pretagging your data (optional step)
- embedding your data
I/O and CPUs are used during the three steps above and also during the following steps, which may take longer:
- initializing the dataset
- corruptness check
- dataset sync with the Lightly Platform
Before changing the hardware configuration of your compute instance, we recommend first determining the bottleneck by monitoring resource usage:
- You can see the ethernet usage using the terminal command
- You can find out your machine's current CPU and RAM usage using the terminal commands
- You can find out the current GPU usage (both compute and VRAM) using the terminal command watch
- Note that you might need to install these commands using your package manager.
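GPU usage can also be sampled from a script. The sketch below assumes that the nvidia-smi tool shipped with the Nvidia driver is installed and uses its CSV query mode. The function names and the dictionary keys are hypothetical.

```python
import subprocess

# Fields requested from nvidia-smi: GPU utilization (%), used and total VRAM (MiB).
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text: str) -> list[dict]:
    """Parse 'csv,noheader,nounits' output, one line per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(value) for value in line.split(","))
        stats.append(
            {"util_percent": util, "vram_used_mib": used, "vram_total_mib": total}
        )
    return stats


def query_gpu_stats() -> list[dict]:
    """Run nvidia-smi once and return the parsed stats (requires an Nvidia GPU)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)


# Parsing a sample line, so the example also runs on machines without a GPU.
print(parse_gpu_stats("45, 1024, 15360\n"))
```

On a GPU machine, calling query_gpu_stats() in a loop during a run shows whether utilization stays high (the GPU is busy) or mostly idle (the run is likely waiting on I/O or CPU).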
In addition to using these tools, you can compare the relative duration of the different steps to find the bottleneck. E.g., if the embedding step takes much longer than the corruptness check, then the GPU is the bottleneck. Otherwise, it is the I/O or the CPU.
When upgrading your compute instance to a different instance type, we recommend upgrading the resource that causes the bottleneck. After the upgrade, a different resource may have become the bottleneck.
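The duration comparison can be written down as a simple heuristic. This is a sketch: the function name guess_bottleneck is hypothetical, and interpreting "much longer" as at least twice as long is an assumption you may want to adjust.

```python
def guess_bottleneck(
    embedding_seconds: float,
    corruptness_check_seconds: float,
    factor: float = 2.0,  # assumed threshold for "much longer"
) -> str:
    """Guess the bottleneck from the durations of two steps of a run."""
    if embedding_seconds > factor * corruptness_check_seconds:
        return "GPU"
    return "I/O or CPU"


print(guess_bottleneck(embedding_seconds=3600, corruptness_check_seconds=600))  # GPU
```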
If there is not one obvious bottleneck, we recommend scaling up I/O, CPUs, and GPUs together.
To prevent the Lightly Worker from running out of system or GPU memory, we recommend 4GB of RAM and 2GB of VRAM for each vCPU.
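The per-vCPU rule of thumb translates into a quick sizing calculation. The function name recommended_memory_gb is hypothetical.

```python
def recommended_memory_gb(num_vcpus: int) -> dict:
    """Apply the rule of thumb: 4GB of RAM and 2GB of VRAM per vCPU."""
    if num_vcpus < 1:
        raise ValueError("num_vcpus must be at least 1")
    return {"ram_gb": 4 * num_vcpus, "vram_gb": 2 * num_vcpus}


print(recommended_memory_gb(8))  # {'ram_gb': 32, 'vram_gb': 16}
```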