Hardware Recommendations
The Lightly Worker should run on dedicated hardware to guarantee quick and stable data processing. Most cloud providers offer powerful instances that can be switched on and off depending on the workload. The table below shows hardware requirements and example instances of prominent cloud providers.
Input Images | System Memory | vCPUs | GPU | EC2 Instance | GCP Instance | Azure Instance |
---|---|---|---|---|---|---|
< 1'000'000 | 32GB | 8 | T4 | g4dn.2xlarge | n1-standard-8 | Standard_NC8as_T4_v3 |
< 10'000'000 | 64GB | 16 | T4 | g4dn.4xlarge | n1-standard-16 | Standard_NC16as_T4_v3 |
≥ 10'000'000 | 128GB | 32 | T4 | g4dn.8xlarge | n1-standard-32 | Standard_NC64as_T4_v3 |
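The table can also be read as a simple lookup from dataset size to a hardware tier. The following is a minimal sketch of that lookup, not part of the Lightly Worker itself: the names `HardwareTier` and `recommend_hardware` are hypothetical, and the handling of exactly 10'000'000 images is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareTier:
    """One row of the hardware recommendation table."""
    memory_gb: int
    vcpus: int
    gpu: str
    ec2: str
    gcp: str
    azure: str


# Values copied from the table above.
SMALL = HardwareTier(32, 8, "T4", "g4dn.2xlarge", "n1-standard-8", "Standard_NC8as_T4_v3")
MEDIUM = HardwareTier(64, 16, "T4", "g4dn.4xlarge", "n1-standard-16", "Standard_NC16as_T4_v3")
LARGE = HardwareTier(128, 32, "T4", "g4dn.8xlarge", "n1-standard-32", "Standard_NC64as_T4_v3")


def recommend_hardware(num_input_images: int) -> HardwareTier:
    """Return the recommended tier for a given number of input images."""
    if num_input_images < 0:
        raise ValueError("num_input_images must not be negative")
    if num_input_images < 1_000_000:
        return SMALL
    if num_input_images < 10_000_000:
        return MEDIUM
    # ASSUMPTION: everything from 10'000'000 images upward maps to the largest row.
    return LARGE


tier = recommend_hardware(2_500_000)
print(tier.ec2, tier.memory_gb, tier.vcpus)  # g4dn.4xlarge 64 16
```

Treat the result as a starting point. The actual requirements depend on the worker configuration.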
For training self-supervised models or improved inference speed, we recommend a V100, an A10 GPU, or better.
Cloud Resource Quota
Requesting GPU resources in the cloud on AWS, GCP, or Azure can take up to 72 hours. We recommend requesting a quota increase early, even if the resource will only be used later.
Operating System
For cloud instances we recommend a deep learning image with GPU support optimized for PyTorch. For example, on AWS, we recommend Deep Learning AMI GPU PyTorch 1.13.1 (Amazon Linux 2) 20230104 (or similar). These images are regularly updated.
Cost Estimates
Streaming data from cloud storage (AWS S3, Google Cloud Storage, Azure) using the datasource feature can lead to high egress costs and slow down data loading. To prevent this, we highly recommend the cloud storage and compute instance be in the same region. Most cloud providers offer a price calculator to estimate costs based on instance uptime and expected egress.
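For a rough first estimate before opening a provider's price calculator, the egress volume can be approximated from the dataset size. The sketch below is illustrative only: `estimate_egress_cost` is a hypothetical helper, it assumes every image is transferred exactly once, and the average image size and price per GB are placeholder inputs that you must replace with your own numbers.

```python
def estimate_egress_cost(num_images: int, avg_image_size_mb: float, price_per_gb: float) -> float:
    """Estimate the egress cost for transferring every image once.

    ASSUMPTION: each image leaves the storage region exactly once and
    1 GB = 1000 MB. Real runs may read data more or less often.
    """
    total_gb = num_images * avg_image_size_mb / 1000
    return total_gb * price_per_gb


# Placeholder inputs, not real prices: 1'000'000 images of 0.5 MB each
# at a hypothetical rate of 0.10 currency units per GB.
cost = estimate_egress_cost(1_000_000, 0.5, 0.10)
print(f"Estimated egress cost: {cost:.2f}")
```

When storage and compute are in the same region, providers typically do not bill this transfer as cross-region egress, which is why co-locating them is recommended.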
Runtime Estimates
The expected runtime of Lightly Worker depends on the exact configuration. The table below shows the times measured on a g4dn.2xlarge instance on AWS EC2 for a few example datasets and configurations.
Dataset | Crop Objects | Input | Input Images | Selected Images | EC2 Instance | Expected Runtime |
---|---|---|---|---|---|---|
Openimages | No | Images | 100'000 | 10'000 | g4dn.2xlarge | < 60min |
Berkeley DeepDrive | No | Videos | 150'160 | 14'966 | g4dn.2xlarge | < 60min |
Comma10k | Yes | Images | 10'000 | 1'977 | g4dn.2xlarge | < 20min |
GPU
Supported GPUs
Lightly Worker supports all Nvidia GPUs with CUDA support.
CPU only
Although not recommended, it's possible to run Lightly Worker without a GPU. This may be especially useful for dry runs and quick iteration, for example when configuring the Lightly Worker for the first time. Processing will be significantly slower, especially for large datasets. To start the worker on a CPU-only machine, remove the --gpus all config option from the docker run command.
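If you start the worker from a script, the same command can serve both GPU and CPU-only machines by dropping the flag programmatically. The following is a minimal sketch: `without_gpu_flag` is a hypothetical helper, and the image name in the example is a placeholder, not the real worker image or its full argument list.

```python
def without_gpu_flag(docker_args: list[str]) -> list[str]:
    """Return a copy of a docker argument list with the --gpus option removed.

    Handles both the two-token form (--gpus all) and the --gpus=all form.
    """
    cleaned = []
    i = 0
    while i < len(docker_args):
        arg = docker_args[i]
        if arg == "--gpus" and i + 1 < len(docker_args):
            i += 2  # skip the flag and its value
            continue
        if arg.startswith("--gpus="):
            i += 1  # skip the combined flag=value token
            continue
        cleaned.append(arg)
        i += 1
    return cleaned


# Placeholder command: "example/worker-image:latest" is NOT the real image name.
gpu_command = ["docker", "run", "--gpus", "all", "--rm", "example/worker-image:latest"]
cpu_command = without_gpu_flag(gpu_command)
print(" ".join(cpu_command))  # docker run --rm example/worker-image:latest
```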
vCPUs
It's recommended to have at least eight vCPUs to make use of multiprocessing and multithreading. By default, Lightly Worker selects a reasonable number of processes and threads based on the number of available vCPUs. The number of processes and threads can be configured with the following arguments:
- num_processes (defaults to -1)
- num_threads (defaults to -1)
Maximum Number of Processes/Threads
When Lightly Worker selects the number of processes or threads based on the number of vCPUs, it will never go above 32 processes or 64 threads. To circumvent this, use the configuration options num_processes and num_threads to set the number of processes and threads explicitly.
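The interaction between the -1 defaults, the caps, and explicit values can be sketched as follows. This is not the worker's actual implementation: the caps of 32 processes and 64 threads come from the text above, while the per-vCPU heuristic (one process and two threads per vCPU) and the function name `resolve_workers` are assumptions made for illustration.

```python
import os

MAX_AUTO_PROCESSES = 32  # cap stated in the documentation
MAX_AUTO_THREADS = 64    # cap stated in the documentation


def resolve_workers(num_processes: int = -1, num_threads: int = -1, vcpus=None):
    """Return the (processes, threads) pair that would be used.

    A value of -1 means "choose automatically"; any other value is
    taken as-is and is not subject to the automatic caps.
    """
    if vcpus is None:
        vcpus = os.cpu_count() or 1
    if num_processes == -1:
        # ASSUMPTION: one process per vCPU, capped at 32.
        num_processes = min(vcpus, MAX_AUTO_PROCESSES)
    if num_threads == -1:
        # ASSUMPTION: two threads per vCPU, capped at 64.
        num_threads = min(2 * vcpus, MAX_AUTO_THREADS)
    return num_processes, num_threads


print(resolve_workers(vcpus=8))                                     # (8, 16)
print(resolve_workers(vcpus=96))                                    # (32, 64)
print(resolve_workers(num_processes=48, num_threads=96, vcpus=96))  # (48, 96)
```

The last call shows why explicit values matter on large machines: only explicitly set values can exceed the automatic caps.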
Running the Lightly Worker on Local Machines
Although not recommended, it is also possible to run the Lightly Worker on a local (non-cloud) machine. The recommended specs are similar. Make sure you have at least 4 cores and 16GB of system memory. A recent consumer GPU from Nvidia (1000 series or newer) should be sufficient.
We recommend using Linux, but customers have also successfully run the Lightly Worker on Windows machines using WSL2. For the time being, you might need to modify our Docker image to make this work.
Find the Compute Speed Bottleneck
The performance and speed of the Lightly Worker could be limited by one of three potential bottlenecks. Different steps of the Lightly Worker use these resources to different extents. Thus the bottleneck changes throughout the run.
The potential bottlenecks can be:
- GPU
- I/O (How fast can data be loaded from the cloud bucket?)
- CPU
The GPU is used during three steps:
- training an embedding model (optional step)
- pretagging your data (optional step)
- embedding your data
I/O and the CPUs are used during the three steps above and also during the following steps, which may take longer:
- initializing the dataset
- corruptness check
- dataset sync with the Lightly Platform
Before changing the hardware configuration of your compute instance, we recommend first determining the bottleneck by monitoring it:
- You can see the ethernet usage using the terminal command ifstat.
- You can find out your machine's current CPU and RAM usage using the terminal commands top or htop.
- You can find out the current GPU usage (both compute and VRAM) using the terminal command watch nvidia-smi.
- Note that you might need to install these commands using your package manager.
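Before starting a long run, it can help to check which of these commands are already installed. The following minimal sketch uses only the Python standard library; the helper name `missing_tools` is hypothetical.

```python
import shutil

# Commands mentioned in the list above.
MONITORING_TOOLS = ("ifstat", "top", "htop", "watch", "nvidia-smi")


def missing_tools(tools=MONITORING_TOOLS) -> list[str]:
    """Return the commands from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Install these with your package manager:", ", ".join(missing))
else:
    print("All monitoring tools are available.")
```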
In addition to using these tools, you can also compare the relative duration of the different steps to find the bottleneck. For example, if the embedding step takes much longer than the corruptness check, then the GPU is the bottleneck. Otherwise, it is the I/O or CPU.
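The duration comparison can be written down as a small rule of thumb. The sketch below is only an illustration: the step names used as dictionary keys and the function `likely_bottleneck` are hypothetical, and the factor of 2 that stands in for "much longer" is an assumption you should tune to your own runs.

```python
def likely_bottleneck(step_seconds: dict, ratio: float = 2.0) -> str:
    """Guess the bottleneck from measured step durations in seconds.

    ASSUMPTION: "much longer" is interpreted as more than `ratio` times
    the duration of the corruptness check.
    """
    embedding = step_seconds["embedding"]
    corruptness_check = step_seconds["corruptness_check"]
    if embedding > ratio * corruptness_check:
        return "GPU"
    return "I/O or CPU"


# Durations would come from your own run; these numbers are made up.
print(likely_bottleneck({"embedding": 1800, "corruptness_check": 300}))  # GPU
print(likely_bottleneck({"embedding": 400, "corruptness_check": 300}))   # I/O or CPU
```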