Hardware Recommendations
The LightlyOne Worker should run on dedicated hardware to guarantee quick and stable data processing. Most cloud providers offer powerful instances that can be switched on and off depending on the workload. The table below shows hardware requirements and example instances of prominent cloud providers.
| Input Images | System Memory | vCPUs | GPU | EC2 Instance | GCP Instance | Azure Instance |
|---|---|---|---|---|---|---|
| < 1'000'000 | 32GB | 8 | T4 | g4dn.2xlarge | n1-standard-8 | Standard_NC8as_T4_v3 |
| < 10'000'000 | 64GB | 16 | T4 | g4dn.4xlarge | n1-standard-16 | Standard_NC16as_T4_v3 |
| > 10'000'000 | 128GB | 32 | T4 | g4dn.8xlarge | n1-standard-32 | Standard_NC64as_T4_v3 |
For training self-supervised models or for improved inference speed, we recommend a V100, an A10 GPU, or better.
Cloud Resource Quota
Requesting GPU resources on AWS, GCP, or Azure can take up to 72 hours. We recommend increasing your quota early, even if the resource will only be used later.
Operating System
For cloud instances we recommend a deep learning image with GPU support optimized for PyTorch. For example, on AWS, we recommend Deep Learning AMI GPU PyTorch 1.13.1 (Amazon Linux 2) 20230104 (or similar). These images are regularly updated.
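To see which versions of this AMI are currently available in your region, you can query the AWS CLI. The snippet below is a sketch and assumes a configured AWS CLI; the name filter is an example pattern.

```bash
# List Amazon's "Deep Learning AMI GPU PyTorch" images in the configured
# region, showing the five most recently created ones.
aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=Deep Learning AMI GPU PyTorch*" \
  --query 'sort_by(Images, &CreationDate)[-5:].[ImageId,Name]' \
  --output table
```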
Runtime Estimates
The expected runtime of LightlyOne Worker depends on the exact configuration. The table below shows the times measured on a g4dn.2xlarge instance on AWS EC2 for a few example datasets and configurations.
| Dataset | Crop Objects | Input | Input Images | Selected Images | EC2 Instance | Expected Runtime |
|---|---|---|---|---|---|---|
| Openimages | No | Images | 100'000 | 10'000 | g4dn.2xlarge | < 60min |
| Berkeley DeepDrive | No | Videos | 150'160 | 14'966 | g4dn.2xlarge | < 60min |
| Comma10k | Yes | Images | 10'000 | 1'977 | g4dn.2xlarge | < 20min |
Cost Estimates
Most cloud providers offer a price calculator to estimate costs based on instance uptime and expected egress.
Instance cost estimate
As seen in the Runtime Estimates just above, the LightlyOne Worker needs less than 1h to process 100'000 images from the Openimages dataset. As the AWS EC2 instance costs about $0.75/h as a spot instance, processing the whole dataset costs less than $0.75.
Egress costs
Streaming data from cloud storage (AWS S3, Google Cloud Storage, Azure) using the datasource feature can lead to high egress costs and slow down data loading. To prevent this, we highly recommend the cloud storage and compute instance be in the same region.
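To check that the regions match, you can compare them, for example on AWS. This is a sketch; the bucket name is a placeholder.

```bash
# Region of the S3 bucket (bucket name is a placeholder):
aws s3api get-bucket-location --bucket my-bucket

# Region of the current EC2 instance, queried from the instance
# metadata service (newer instances may require an IMDSv2 token):
curl -s http://169.254.169.254/latest/meta-data/placement/region
```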
GPU
Supported GPUs
LightlyOne Worker supports all Nvidia GPUs with CUDA support.
CPU only
Although not recommended, it's possible to run the LightlyOne Worker without a GPU. This can be especially useful for dry runs and quick iteration, for example when configuring the LightlyOne Worker for the first time. Processing will be significantly slower, especially for large datasets. To start the worker on a CPU-only machine, remove the `--gpus all` option from the `docker run` command.
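For illustration, the two invocations might look as follows. This is a minimal sketch: the image name, token, and worker ID are placeholders, and your actual `docker run` command may carry additional options.

```bash
# With GPU:
docker run --rm -it --gpus all \
  lightly/worker:latest \
  token=YOUR_LIGHTLY_TOKEN worker.worker_id=YOUR_WORKER_ID

# CPU only: the same command without the --gpus all option.
docker run --rm -it \
  lightly/worker:latest \
  token=YOUR_LIGHTLY_TOKEN worker.worker_id=YOUR_WORKER_ID
```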
vCPUs
It's recommended to have at least eight vCPUs to make use of multiprocessing and multithreading. By default, the LightlyOne Worker selects a reasonable number of processes and threads based on the number of available vCPUs. The number of processes and threads can be configured with the following arguments:
- `num_processes` (defaults to `-1`)
- `num_threads` (defaults to `-1`)
Maximum Number of Processes/Threads
When the LightlyOne Worker selects the number of processes or threads based on the number of vCPUs, it will never go above 32 processes or 64 threads. To circumvent this, use the configuration options `num_processes` and `num_threads` to set the number of processes and threads explicitly.
RAM
We recommend having 4GB of memory per vCPU. If you have too little memory, we recommend reducing the number of processes and threads such that there is 4GB of memory per process and 2GB per thread. For example, if you have 32GB of RAM in total, set `num_processes=8` and `num_threads=16` when running the worker. See the section on vCPUs above for details.
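For a 32GB machine, this could look as follows. The sketch reuses the placeholder invocation from the CPU-only section above and assumes the options are accepted as key=value arguments as described here; depending on your version, they may instead belong in the scheduled run's configuration.

```bash
# 32GB of RAM: cap the worker at 8 processes (4GB each)
# and 16 threads (2GB each).
docker run --rm -it --gpus all \
  lightly/worker:latest \
  token=YOUR_LIGHTLY_TOKEN worker.worker_id=YOUR_WORKER_ID \
  num_processes=8 num_threads=16
```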
Running the LightlyOne Worker on Local Machines
Although not recommended, it is also possible to run the LightlyOne Worker on a local (non-cloud) machine. The recommended specs are similar: make sure you have at least 4 cores and 16GB of system memory. A consumer Nvidia GPU from one of the newer generations (1000 series or newer) should be sufficient.
We recommend using Linux, but customers have also successfully run the LightlyOne Worker on a Windows machine using WSL2. You might need to modify our Docker image to make things work for the time being.
Find the Compute Speed Bottleneck
The performance and speed of the LightlyOne Worker can be limited by one of three potential bottlenecks. Different steps of the LightlyOne Worker use these resources to different extents, so the bottleneck can change throughout the run.
The potential bottlenecks can be:
- GPU
- I/O (How fast can data be loaded from the cloud bucket?)
- CPU
The GPU is used during three steps:
- training an embedding model (optional step)
- pretagging your data (optional step)
- embedding your data
I/O and CPU are used during the three steps above, and also during the following steps, which may take longer:
- initializing the dataset
- corruptness check
- dataset sync with the LightlyOne Platform
Before changing the hardware configuration of your compute instance, we recommend first determining the bottleneck by monitoring it:
- You can see the ethernet usage with the terminal command `ifstat`.
- You can find out your machine's current CPU and RAM usage with the terminal commands `top` or `htop`.
- You can find out the current GPU usage (both compute and VRAM) with the terminal command `watch nvidia-smi`.
- Note that you might need to install these commands using your package manager (see the sketch below).
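A minimal monitoring session could look like this, assuming a Debian/Ubuntu machine; on other distributions, substitute your package manager.

```bash
# Install the monitoring tools (apt-based distributions):
sudo apt-get update && sudo apt-get install -y ifstat htop

# Network throughput, refreshed every second:
ifstat 1

# Per-process CPU and RAM usage:
htop

# GPU compute and VRAM usage, refreshed every 2 seconds (watch default):
watch nvidia-smi
```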
In addition to using these tools, you can also compare the relative durations of the different steps to find the bottleneck. For example, if the embedding step takes much longer than the corruptness check, then the GPU is the bottleneck; otherwise, it is the I/O or CPU.