LightlyEdge C++ SDK
03 Similarity Search

We will now build our first real application. We will pass four images to LightlyEdge, two of the Matterhorn and two of London, and use the similarity search feature to identify the images that contain mountains.

Project Setup

Code for this tutorial is provided in the examples/03_similarity_search directory. Before starting this tutorial, copy the model file to examples/lightly_model.tar and verify that your project layout is as follows:

lightly_edge_sdk_cpp
├── ...
└── examples
   ├── ...
   ├── 03_similarity_search
   │   ├── CMakeLists.txt
   │   ├── images
   │   │   ├── london1.jpg
   │   │   ├── london2.jpg
   │   │   ├── matterhorn1.jpg
   │   │   └── matterhorn2.jpg
   │   ├── main.cpp
   │   └── stb_image.h
   └── lightly_model.tar

Build and Run a Complete Example

Below is the content of the main.cpp file. We will first run the example and then explain it.

// main.cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#include "lightly_edge_sdk.h"
using namespace lightly_edge_sdk;

// Loads an image using stb_image and returns it as a lightly_edge_sdk::Frame struct.
// The caller must release the pixel data with stbi_image_free.
Frame load_image(const std::string &image_path) {
    std::cout << "Loading image: " << image_path << std::endl;
    int width, height, channels;
    // Request 3 channels so the data is always RGB with three bytes per pixel.
    unsigned char *data = stbi_load(image_path.c_str(), &width, &height, &channels, 3);
    if (data == nullptr) {
        throw std::runtime_error("Failed to load image: " + image_path);
    }
    // Create a Frame struct.
    return Frame(width, height, data);
}

int main() {
    // Initialize the LightlyEdge SDK.
    std::cout << "Initializing LightlyEdge..." << std::endl << std::endl;
    lightly_edge_sdk::LightlyEdgeConfig config = lightly_edge_sdk::default_config();
    LightlyEdge lightly_edge =
        LightlyEdge::new_from_tar("../lightly_model.tar", config);

    // Register a similarity strategy for the text "mountains".
    float threshold = 0.5;
    std::vector<std::vector<float>> text_embeddings = lightly_edge.embed_texts({"mountains"});
    lightly_edge.register_similarity_strategy(text_embeddings[0], threshold);

    // Iterate over the images.
    std::vector<std::string> image_paths = {
        "images/matterhorn1.jpg",
        "images/matterhorn2.jpg",
        "images/london1.jpg",
        "images/london2.jpg",
    };
    for (const auto &path : image_paths) {
        // Load the image.
        Frame frame = load_image(path);

        // Embed the image and check if it should be selected.
        // No object detections are passed, hence the empty list.
        std::vector<float> image_embedding = lightly_edge.embed_frame(frame);
        SelectInfo select_info = lightly_edge.should_select(image_embedding, {});

        // Print whether the image is selected.
        SimilaritySelectInfo similarity_select_info = select_info.similarity[0];
        std::cout << " should_select: " << similarity_select_info.should_select << std::endl;
        std::cout << " distance: " << similarity_select_info.distance << std::endl << std::endl;

        // The image is no longer needed, free the memory.
        stbi_image_free(frame.rgbImageData_);
    }

    std::cout << "Program successfully finished." << std::endl;
    return 0;
}

Build and run:

# Enter the project folder.
cd 03_similarity_search
# Configure CMake. This will create a `build` subfolder.
cmake -B build
# Build using configuration from the `build` subfolder.
cmake --build build
# Run (Linux variant)
./build/main
# Or run (Windows variant)
.\build\[build_type]\main.exe
# where [build_type] is either Release or Debug.

The output should be similar to the following. The distances might differ slightly depending on your machine architecture:

Initializing LightlyEdge...

Loading image: images/matterhorn1.jpg
 should_select: 1
 distance: 0.461122

Loading image: images/matterhorn2.jpg
 should_select: 1
 distance: 0.464557

Loading image: images/london1.jpg
 should_select: 0
 distance: 0.533587

Loading image: images/london2.jpg
 should_select: 0
 distance: 0.532412

Program successfully finished.

That's what we wanted to see! The model separates the Matterhorn images from the London images: both Matterhorn images fall below the 0.5 threshold and are selected, while both London images are not.

Feel free to experiment with different text queries by changing the argument to lightly_edge.embed_texts; try, for example, "city" or "bus".

Note
In this example we use the model version lightly_model_14.tar. You might need to adjust the thresholds in this tutorial if your model version differs.

Image Loading

Let's walk through the code. First, we need to load the images. We use the open-source, header-only stb_image library for this.

The LightlyEdge C++ SDK accepts images wrapped in a lightly_edge_sdk::Frame struct. It stores the image width and height, and a void* pointer to the image data encoded as RGB with three bytes per pixel. Conveniently, stbi_load returns interleaved 8-bit pixel data, which matches this layout for RGB images or when three channels are requested. The load_image function wraps the loading logic:

// main.cpp
#include <iostream>
#include <stdexcept>
#include <string>
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#include "lightly_edge_sdk.h"
using namespace lightly_edge_sdk;

// Loads an image using stb_image and returns it as a lightly_edge_sdk::Frame struct.
// The caller must release the pixel data with stbi_image_free.
Frame load_image(const std::string &image_path) {
    std::cout << "Loading image: " << image_path << std::endl;
    int width, height, channels;
    // Request 3 channels so the data is always RGB with three bytes per pixel.
    unsigned char *data = stbi_load(image_path.c_str(), &width, &height, &channels, 3);
    if (data == nullptr) {
        throw std::runtime_error("Failed to load image: " + image_path);
    }
    // Create a Frame struct.
    return Frame(width, height, data);
}
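When the last argument of stbi_load (the desired channel count) is 0, stb_image keeps the file's native channel count and reports it in `channels`, so a grayscale or RGBA file would not match the three-bytes-per-pixel layout that Frame expects. Passing 3 as the last argument lets stb_image do the conversion for you. If you need to keep the native buffer for other purposes, the conversion can also be done by hand. The following is a minimal sketch; `to_rgb` is a hypothetical helper and is not part of LightlyEdge or stb_image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of LightlyEdge or stb_image): converts an
// interleaved 8-bit buffer with 1 (gray), 2 (gray+alpha), 3 (RGB) or
// 4 (RGBA) channels into a tightly packed RGB buffer, 3 bytes per pixel.
std::vector<unsigned char> to_rgb(const unsigned char* data, int width, int height, int channels) {
    assert(data != nullptr && width > 0 && height > 0);
    assert(channels >= 1 && channels <= 4);
    const std::size_t pixel_count =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    std::vector<unsigned char> rgb(pixel_count * 3);
    for (std::size_t i = 0; i < pixel_count; ++i) {
        const unsigned char* src = data + i * static_cast<std::size_t>(channels);
        unsigned char* dst = rgb.data() + i * 3;
        if (channels >= 3) {
            // RGB or RGBA: copy the color channels, drop alpha if present.
            dst[0] = src[0];
            dst[1] = src[1];
            dst[2] = src[2];
        } else {
            // Gray or gray+alpha: replicate the gray value, drop alpha.
            dst[0] = dst[1] = dst[2] = src[0];
        }
    }
    return rgb;
}
```

If you go this route, free the original stb buffer with stbi_image_free right after the conversion and build the Frame from rgb.data() instead of data (assuming the constructor accepts it the same way). The vector must stay alive while the frame is in use, and stbi_image_free must not be called on that frame's pointer.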

In the main function, the images are processed in a for loop. Importantly, once an image is no longer needed, we free its memory with stbi_image_free.

int main() {
    // PLACEHOLDER: Initialization logic.

    // Iterate over the images.
    std::vector<std::string> image_paths = {
        "images/matterhorn1.jpg",
        "images/matterhorn2.jpg",
        "images/london1.jpg",
        "images/london2.jpg",
    };
    for (const auto &path : image_paths) {
        // Load the image.
        Frame frame = load_image(path);

        // PLACEHOLDER: Processing logic.

        // The image is no longer needed, free the memory.
        stbi_image_free(frame.rgbImageData_);
    }

    std::cout << "Program successfully finished." << std::endl;
    return 0;
}
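The explicit stbi_image_free call at the end of the loop body is skipped if anything between loading and freeing throws, for example an exception from the SDK, and the image then leaks. A common C++ remedy is to tie the buffer's lifetime to a scope with std::unique_ptr and a custom deleter. The sketch below shows one way to do this; PixelBuffer and adopt_pixels are hypothetical names, and the approach relies on stbi_image_free having the signature void(void*), which matches PixelFreeFn.

```cpp
#include <cassert>
#include <memory>

// Signature shared by stbi_image_free and similar C-style free functions.
using PixelFreeFn = void (*)(void*);

// Hypothetical owning handle: the buffer is released exactly once when the
// handle goes out of scope, including during stack unwinding after a throw.
// A null buffer is never passed to the free function.
using PixelBuffer = std::unique_ptr<unsigned char, PixelFreeFn>;

// Wraps a raw buffer returned by a loader together with its matching free function.
inline PixelBuffer adopt_pixels(unsigned char* data, PixelFreeFn free_fn) {
    assert(free_fn != nullptr);
    return PixelBuffer(data, free_fn);
}
```

In the loop you could write `PixelBuffer pixels = adopt_pixels(static_cast<unsigned char *>(frame.rgbImageData_), stbi_image_free);` right after load_image and drop the manual stbi_image_free call. The buffer is then released when pixels goes out of scope at the end of each iteration, whether or not an exception was thrown.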

Similarity Search

Setting up similarity search takes a few steps. Note that in a real application, exceptions thrown by the SDK should be handled in a try-catch block. For clarity, we write out the return types explicitly.

This is the initialization code at the beginning of the main function:

// Initialize the LightlyEdge SDK.
std::cout << "Initializing LightlyEdge..." << std::endl << std::endl;
lightly_edge_sdk::LightlyEdgeConfig config = lightly_edge_sdk::default_config();
LightlyEdge lightly_edge = LightlyEdge::new_from_tar("../lightly_model.tar", config);

// Register a similarity strategy for the text "mountains".
float threshold = 0.5;
std::vector<std::vector<float>> text_embeddings = lightly_edge.embed_texts({"mountains"});
lightly_edge.register_similarity_strategy(text_embeddings[0], threshold);

LightlyEdge is first initialized from a TAR archive. Then we get an embedding for our text query "mountains" by calling lightly_edge_sdk::LightlyEdge::embed_texts. The function accepts a list of strings and returns a list of embeddings, one per string. Since we pass a single query, we use the only embedding returned, at index 0.

Multiple selection strategies can be registered on lightly_edge_sdk::LightlyEdge. Each strategy independently decides whether a frame should be selected.

We call lightly_edge_sdk::LightlyEdge::register_similarity_strategy to register a single similarity strategy. It has two arguments: query_embedding and max_distance. A frame is selected if it is closer than max_distance to the query in the embedding space. The distances are based on cosine similarity and range from 0 (closest) to 1 (furthest).
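The tutorial does not spell out the exact distance formula. To build intuition, the sketch below uses one mapping with the stated properties: (1 - cos) / 2, which is 0 for embeddings pointing in the same direction and 1 for opposite ones. This normalization is an assumption for illustration and not necessarily what LightlyEdge computes internally; in practice, read the distance from the SelectInfo returned by the SDK, as the example does. `cosine_distance` and `would_select` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative assumption: maps cosine similarity in [-1, 1] to a distance in
// [0, 1] via (1 - cos) / 2. 0 = same direction, 1 = opposite direction.
// This is NOT necessarily the formula used inside LightlyEdge.
double cosine_distance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size() && !a.empty());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    assert(norm_a > 0.0 && norm_b > 0.0);
    const double cos_sim = dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
    return (1.0 - cos_sim) / 2.0;
}

// Selection rule as described in the text: the frame is selected if it is
// closer to the query than max_distance.
bool would_select(double distance, double max_distance) {
    return distance < max_distance;
}
```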

Next, images are processed in a for loop in the main function:

for (const auto &path : image_paths) {
    // Load the image.
    Frame frame = load_image(path);

    // Embed the image and check if it should be selected.
    // No object detections are passed, hence the empty list.
    std::vector<float> embedding = lightly_edge.embed_frame(frame);
    SelectInfo select_info = lightly_edge.should_select(embedding, {});

    // Print whether the image is selected.
    SimilaritySelectInfo similarity_select_info = select_info.similarity[0];
    std::cout << " should_select: " << similarity_select_info.should_select << std::endl;
    std::cout << " distance: " << similarity_select_info.distance << std::endl << std::endl;

    // The image is no longer needed, free the memory.
    stbi_image_free(frame.rgbImageData_);
}

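The code embeds the image and calls lightly_edge_sdk::LightlyEdge::should_select, which returns a lightly_edge_sdk::SelectInfo. Its similarity field holds, for each registered similarity strategy in the order of registration, the decision whether the frame should be selected. The second argument of should_select takes object detections; as in the full example above, we pass an empty list {} because this tutorial uses none. We print the decision and the distance to the query.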
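When several strategies are registered, the one-entry-per-strategy layout lets you tell which query matched a frame best. The sketch below picks the selecting strategy with the smallest distance. To stay self-contained, it uses StrategyResult, a hypothetical stand-in holding only the two fields this tutorial reads from SimilaritySelectInfo (should_select and distance); with the SDK you would iterate over select_info.similarity directly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two SimilaritySelectInfo fields used in this
// tutorial. The real SDK struct may carry more fields.
struct StrategyResult {
    bool should_select;
    float distance;
};

// Returns the registration index of the selecting strategy whose query is
// closest to the frame, or -1 if no strategy selected the frame.
int closest_selecting_strategy(const std::vector<StrategyResult>& results) {
    int best = -1;
    for (std::size_t i = 0; i < results.size(); ++i) {
        if (!results[i].should_select) {
            continue;
        }
        if (best < 0 || results[i].distance < results[static_cast<std::size_t>(best)].distance) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```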

How To Choose Similarity Parameters

For best results, the query texts and max distance threshold should be carefully selected based on your data.

The max_distance parameter controls the tradeoff between precision and recall. It ranges from 0 to 1. A higher value selects more images but might include irrelevant ones. A lower value selects fewer images, those that match the query most closely, but might miss some relevant ones.
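To make this tradeoff measurable, you can compute precision and recall for a given max_distance over a small labeled set. A minimal sketch, assuming you have logged SDK distances for images that should be selected (positives) and images that should not (negatives); precision_recall is a hypothetical helper that applies the "closer than max_distance" rule.

```cpp
#include <cassert>
#include <vector>

// Precision: fraction of selected images that are relevant.
// Recall: fraction of relevant images that get selected.
struct PrecisionRecall {
    double precision;
    double recall;
};

// Hypothetical helper: evaluates the rule "select if distance < max_distance".
// positive_distances: distances logged for images that should be selected.
// negative_distances: distances logged for images that should not be selected.
PrecisionRecall precision_recall(const std::vector<double>& positive_distances,
                                 const std::vector<double>& negative_distances,
                                 double max_distance) {
    assert(max_distance >= 0.0 && max_distance <= 1.0);
    int true_positives = 0;
    for (double d : positive_distances) {
        if (d < max_distance) ++true_positives;
    }
    int false_positives = 0;
    for (double d : negative_distances) {
        if (d < max_distance) ++false_positives;
    }
    const int selected = true_positives + false_positives;
    // Conventions for empty denominators: nothing selected means no false
    // positives (precision 1); no positives means nothing can be missed (recall 1).
    const double precision =
        selected == 0 ? 1.0 : static_cast<double>(true_positives) / selected;
    const double recall = positive_distances.empty()
        ? 1.0
        : static_cast<double>(true_positives) / static_cast<double>(positive_distances.size());
    return {precision, recall};
}
```

With the four distances printed in this tutorial, max_distance = 0.5 gives precision 1 and recall 1, while 0.54 also selects both London images and precision drops to 0.5.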

A suitable value for max_distance can be found by collecting a small set of positive and negative examples, logging the distances to the query, and choosing a value that separates the two sets well.

A good starting point is to set max_distance to 0.475 and adjust it in 0.005 increments. Suitable threshold values usually lie in the range 0.45 to 0.55.
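The search for a separating value can be automated with a simple grid sweep. This sketch, built around the hypothetical sweep_max_distance, tries every value from 0.45 to 0.55 in 0.005 steps, keeps the one that classifies the most labeled examples correctly, and breaks ties by preferring the threshold farthest from any logged distance, which leaves some room for unseen images. The tie-breaking rule is a design choice of this sketch, not a recommendation from the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: sweeps max_distance over a grid (by default 0.45 to 0.55
// in 0.005 steps) and returns the value that classifies the most labeled
// examples correctly under the rule "select if distance < max_distance".
// Ties are broken by the largest margin to the nearest logged distance.
double sweep_max_distance(const std::vector<double>& positive_distances,
                          const std::vector<double>& negative_distances,
                          double lo = 0.45, double hi = 0.55, double step = 0.005) {
    assert(step > 0.0 && hi >= lo);
    const long steps = std::lround((hi - lo) / step);
    double best_threshold = lo;
    int best_correct = -1;
    double best_margin = -1.0;
    for (long k = 0; k <= steps; ++k) {
        const double t = lo + step * static_cast<double>(k);
        int correct = 0;
        double margin = std::numeric_limits<double>::infinity();
        for (double d : positive_distances) {
            if (d < t) ++correct;  // relevant image selected
            margin = std::min(margin, std::fabs(d - t));
        }
        for (double d : negative_distances) {
            if (d >= t) ++correct;  // irrelevant image rejected
            margin = std::min(margin, std::fabs(d - t));
        }
        if (correct > best_correct || (correct == best_correct && margin > best_margin)) {
            best_threshold = t;
            best_correct = correct;
            best_margin = margin;
        }
    }
    return best_threshold;
}
```

Fed with the Matterhorn distances as positives and the London distances as negatives from the run above, the sweep returns 0.5.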

The text queries should be chosen based on the use case. Queries can be written in natural language, and their exact wording influences the results. We recommend trying different queries and thresholds to find the best fit for your use case. The model tends to be more accurate for queries that are unambiguous and refer to simple rather than complex concepts.

Image Search

The example above shows how to search images with a text query. Using the same interface, you can also search for images similar to a known image. The only difference is that the similarity strategy is registered with an image embedding instead of a text embedding:

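// Use a known image as the query instead of a text.
Frame frame = load_image("images/matterhorn1.jpg");
std::vector<float> embedding = lightly_edge.embed_frame(frame);
float threshold = 0.35;
lightly_edge.register_similarity_strategy(embedding, threshold);
// The embedding is returned by value and does not reference the pixel data,
// so the image memory can be freed right away.
stbi_image_free(frame.rgbImageData_);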

Next Steps

Next we will set up LightlyEdge to perform Diversity Selection.