Hardware Recommendations

The Lightly Worker should run on dedicated hardware to guarantee quick and stable data processing. Most cloud providers offer powerful instances that can be switched on and off depending on the workload. The table below shows hardware requirements and example instances of prominent cloud providers.

| Input Images | System Memory | vCPUs | GPU | EC2 Instance | GCP Instance | Azure Instance |
| --- | --- | --- | --- | --- | --- | --- |
| < 1'000'000 | 32 GB | 8 | T4 | g4dn.2xlarge | n1-standard-8 | Standard_NC8as_T4_v3 |
| < 10'000'000 | 64 GB | 16 | T4 | g4dn.4xlarge | n1-standard-16 | Standard_NC16as_T4_v3 |
| > 10'000'000 | 128 GB | 32 | T4 | g4dn.8xlarge | n1-standard-32 | Standard_NC64as_T4_v3 |

For training self-supervised models or improved inference speed we recommend a V100 or A10 GPU, or better.

πŸ“˜

Cloud Resource Quota

Requesting GPU resources on AWS, GCP, or Azure can take up to 72 hours to be granted. We recommend increasing your quota early, even if the resource will only be used later.
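
For example, on AWS the increase can be requested via the AWS CLI. The quota code below is assumed to be the one for "Running On-Demand G and VT instances" (the family that covers g4dn); verify it with the first command before submitting the request:

```bash
# Look up the quota code for On-Demand G and VT instances (verify before using).
aws service-quotas list-service-quotas --service-code ec2 \
  --query "Quotas[?contains(QuotaName, 'G and VT')].[QuotaCode,QuotaName,Value]" \
  --output text

# Request an increase to 32 vCPUs (enough for a g4dn.8xlarge).
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-DB2E81BA \
  --desired-value 32
```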

Operating System

For cloud instances we recommend a deep learning image with GPU support, optimized for PyTorch. For example, on AWS we recommend Deep Learning AMI GPU PyTorch 1.13.1 (Amazon Linux 2) 20230104 (or similar). These images are regularly updated.
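
If you use the AWS CLI, a lookup along the following lines can find the most recent matching image in your region (the name filter is an assumption based on the AMI name above; adjust it for newer releases):

```bash
# Find the newest Deep Learning AMI matching the recommended name.
aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=Deep Learning AMI GPU PyTorch 1.13.1 (Amazon Linux 2) *" \
  --query "sort_by(Images, &CreationDate)[-1].[ImageId,Name]" \
  --output text
```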

Cost Estimates

Streaming data from cloud storage (AWS S3, Google Cloud Storage, Azure) using the datasource feature can lead to high egress costs and slow data loading. To prevent this, we highly recommend that the cloud storage bucket and the compute instance are in the same region. Most cloud providers offer a price calculator to estimate costs based on instance uptime and expected egress.

Runtime Estimates

The expected runtime of the Lightly Worker depends on the exact configuration. The table below shows the times measured on a g4dn.2xlarge instance on AWS EC2 for a few example datasets and configurations.

| Dataset | Crop Objects | Input | Input Images | Selected Images | EC2 Instance | Expected Runtime |
| --- | --- | --- | --- | --- | --- | --- |
| Openimages | No | Images | 100'000 | 10'000 | g4dn.2xlarge | < 60 min |
| Berkeley DeepDrive | No | Videos | 150'160 | 14'966 | g4dn.2xlarge | < 60 min |
| Comma10k | Yes | Images | 10'000 | 1'977 | g4dn.2xlarge | < 20 min |

GPU

Supported GPUs

Lightly Worker supports all Nvidia GPUs with CUDA support.

CPU only

Although not recommended, it's possible to run the Lightly Worker without a GPU. This can be especially useful for dry runs and quick iteration, for example when configuring the Lightly Worker for the first time. Processing will be significantly slower, especially for large datasets. To start the worker on a CPU-only machine, remove the --gpus all option from the docker run command, as shown below.
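
For example, assuming the standard docker run invocation from the installation instructions (your actual command may mount volumes or set additional options):

```bash
# With a GPU:
docker run --shm-size="1024m" --gpus all --rm -it \
  lightly/worker:latest \
  token=MY_LIGHTLY_TOKEN worker.worker_id=MY_WORKER_ID

# CPU only: the same command without --gpus all.
docker run --shm-size="1024m" --rm -it \
  lightly/worker:latest \
  token=MY_LIGHTLY_TOKEN worker.worker_id=MY_WORKER_ID
```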

vCPUs

It's recommended to have at least eight vCPUs to make use of multiprocessing and multithreading. By default, the Lightly Worker selects a reasonable number of processes and threads based on the number of available vCPUs. The number of processes and threads can be configured with the following arguments (see the example below the list):

  • num_processes (defaults to -1)
  • num_threads (defaults to -1)
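
For example, a minimal sketch assuming these options are set at the top level of the lightly_config when scheduling a run with the Python client (consult the configuration reference for the exact placement):

```python
from lightly.api import ApiWorkflowClient

client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")

client.schedule_compute_worker_run(
    worker_config={},  # your usual worker configuration
    lightly_config={
        "num_processes": 16,  # -1 (default) derives the count from the vCPUs
        "num_threads": 32,    # -1 (default) derives the count from the vCPUs
    },
)
```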

πŸ“˜

Maximum Number of Processes/Threads

When the Lightly Worker selects the number of processes or threads based on the number of vCPUs, it will never go above 32 processes or 64 threads. To circumvent this, use the configuration options num_processes and num_threads to set the values explicitly.

Running the Lightly Worker on Local Machines

Although not recommended, it is also possible to run the Lightly Worker on a local (non-cloud) machine. The recommended specs are similar: make sure you have at least 4 cores and 16 GB of system memory. A consumer Nvidia GPU of one of the newer generations (GTX 1000 series or newer) should be sufficient.

We recommend using Linux, but customers have also successfully run the Lightly Worker on Windows machines using WSL2. You might need to modify our Docker image to make things work for the time being.

Find the Compute Speed Bottleneck

The performance of the Lightly Worker can be limited by one of three potential bottlenecks. Different steps of the Lightly Worker use these resources to different extents, so the bottleneck can change throughout a run.

The potential bottlenecks can be:

  • GPU
  • I/O (How fast can data be loaded from the cloud bucket?)
  • CPU

The GPU is used during three steps:

  • training an embedding model (optional step)
  • pretagging your data (optional step)
  • embedding your data

I/O and CPU are used during the three steps above, and also during the following steps, which may take longer:

  • initializing the dataset
  • corruptness check
  • dataset sync with the Lightly Platform

Before changing the hardware configuration of your compute instance, we recommend first determining the bottleneck by monitoring it:

  • You can monitor the network usage using the terminal command ifstat.
  • You can find out your machine's current CPU and RAM usage using the terminal commands top or htop.
  • You can find out the current GPU usage (both compute and VRAM) using the terminal command watch nvidia-smi.
  • Note that you might need to install these commands using your package manager.
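
For example, on a Debian/Ubuntu-based instance the tools can be installed and run as follows:

```bash
# Install the monitoring tools (use your distro's package manager otherwise).
sudo apt-get install -y ifstat htop

ifstat            # network throughput per interface
htop              # CPU and RAM usage per process
watch nvidia-smi  # GPU compute and VRAM usage, refreshed every 2 seconds
```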

In addition to using these tools, you can compare the relative durations of the different steps to identify the bottleneck. For example, if the embedding step takes much longer than the corruptness check, the GPU is the bottleneck; otherwise, it is the I/O or CPU.