Hardware Recommendations

The Lightly Worker usually runs on dedicated hardware or on a cloud compute instance that is spun up specifically to run it. Our hardware recommendations for this compute instance are based on three criteria:

  • speed: The Lightly Worker should process your raw data as quickly as possible.
  • cost-effectiveness: The compute instance should be economical.
  • stability: The Lightly Worker should not crash because it runs out of memory.

Depending on your raw data size, we recommend the following machine:


Supported GPUs

At the moment, the Lightly Worker supports only Nvidia GPUs with CUDA support that can run PyTorch models. Older GPUs we have used successfully include the Nvidia GTX 1080 Ti and the Nvidia P100.

You can compute the number of frames in your videos from their length and frame rate (fps). For example, 100 videos of 600 seconds each at 30 fps contain 100 * 600 * 30 = 1.8 million frames.
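The frame count arithmetic above can be sketched as a small helper. The function name and parameters are illustrative and not part of the Lightly Worker; the sketch assumes all videos have the same length and frame rate.

```python
def total_frames(num_videos: int, seconds_per_video: float, fps: float) -> int:
    """Estimate the total frame count for equally long videos (illustrative helper)."""
    return round(num_videos * seconds_per_video * fps)


# The example from the text: 100 videos, 600 seconds each, 30 fps.
print(total_frames(100, 600, 30))  # 1800000
```

For videos of varying length, sum the per-video products instead of multiplying by the number of videos.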

If you want to train an embedding model for many epochs or want to further increase computing speed, we recommend switching to a V100 or A10 GPU or better.

If you stream the data from cloud storage (AWS S3, Google Cloud Storage, Azure) using the datasource feature, ensure that your cloud bucket and your compute instance are in the same region. Using the same region both improves performance and reduces costs (see section Accessing services within the same AWS Region).

Keep the configuration option lightly.loader.num_workers at the default (-1), which will set it to the number of vCPUs on your compute instance.
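As a rough illustration of what the default means, the sketch below resolves a value of -1 to the number of CPUs visible to the operating system, which on a cloud instance typically equals the vCPU count. The helper resolve_num_workers is hypothetical and is not the Lightly Worker's actual implementation.

```python
import os


def resolve_num_workers(num_workers: int = -1) -> int:
    """Hypothetical helper: map -1 to the number of logical CPUs, else keep the value."""
    if num_workers == -1:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    return num_workers


print(resolve_num_workers())   # number of logical CPUs on this machine
print(resolve_num_workers(4))  # an explicit value is kept unchanged
```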

Find the Compute Speed Bottleneck

The speed of the Lightly Worker can be limited by one of three potential bottlenecks. Different steps of the Lightly Worker use these resources to different extents, so the bottleneck changes throughout the run.

The potential bottlenecks can be:

  • GPU
  • I/O (How fast can data be loaded from the cloud bucket?)
  • CPU

The GPU is used during three steps:

  • training an embedding model (optional step)
  • pretagging your data (optional step)
  • embedding your data

I/O and the CPUs are used during the three steps above and also during the following steps, which may take longer:

  • initializing the dataset
  • corruptness check
  • dataset sync with the Lightly Platform

Before changing the hardware configuration of your compute instance, we recommend first determining the bottleneck by monitoring the instance's resource usage:

  • You can see the current network usage using the terminal command ifstat.
  • You can find out your machine's current CPU and RAM usage using the terminal commands top or htop.
  • You can find out the current GPU usage (both compute and VRAM) using the terminal command watch nvidia-smi.
  • Note that you might need to install these commands using your package manager.

In addition to using these tools, you can compare the relative duration of the different steps to identify the bottleneck. For example, if the embedding step takes much longer than the corruptness check, the GPU is the bottleneck. Otherwise, it is the I/O or the CPU.
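The duration comparison described above can be sketched as a simple rule of thumb. The function name and the threshold that defines "much longer" (a factor of 2 here) are assumptions made for illustration; take the step durations from your own run and pick a threshold that suits your workload.

```python
def likely_bottleneck(
    embedding_seconds: float,
    corruptness_check_seconds: float,
    ratio: float = 2.0,  # assumed threshold for "much longer"; not from the docs
) -> str:
    """Guess the bottleneck from the relative duration of two steps."""
    if embedding_seconds > ratio * corruptness_check_seconds:
        return "GPU"
    return "I/O or CPU"


# Embedding took an hour, the corruptness check five minutes.
print(likely_bottleneck(3600, 300))  # GPU
```

This is only a first indication; confirm it with the monitoring tools listed above before changing the instance type.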

Upgrade your Compute Instance

When switching your compute instance to a different instance type, we recommend upgrading the resource that causes the bottleneck. After the upgrade, the bottleneck may have moved to a different resource.

If there is no single obvious bottleneck, we recommend scaling up I/O, CPUs, and GPUs together.

To prevent the Lightly Worker from running out of system or GPU memory, we recommend 4 GB of RAM and 2 GB of VRAM for each vCPU.
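The per-vCPU memory rule above translates into a quick sizing check. The helper below is an illustrative sketch, not part of the Lightly Worker; it only applies the 4 GB RAM and 2 GB VRAM per vCPU recommendation.

```python
RAM_GB_PER_VCPU = 4   # recommended system memory per vCPU
VRAM_GB_PER_VCPU = 2  # recommended GPU memory per vCPU


def recommended_memory_gb(num_vcpus: int) -> dict:
    """Return the recommended RAM and VRAM (in GB) for a given vCPU count."""
    return {
        "ram_gb": RAM_GB_PER_VCPU * num_vcpus,
        "vram_gb": VRAM_GB_PER_VCPU * num_vcpus,
    }


# An instance with 8 vCPUs should have about 32 GB of RAM and 16 GB of VRAM.
print(recommended_memory_gb(8))  # {'ram_gb': 32, 'vram_gb': 16}
```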