Active Learning Using YOLOv7 for Waste Sorting
This guide shows how to use LightlyOne and YOLOv7 to build a complete active learning loop for an object detection task that solves a waste sorting problem.
This tutorial will teach you how to create a complete active learning pipeline. You will work on the ZeroWaste dataset, a collection of images of various types of garbage. By creating a balanced dataset, you can improve the performance of your object detection algorithm in this application.
You will learn how to:
- Extract a random 10% of the dataset to be labeled to do the first supervised fine-tuning.
- Fine-tune a YOLOv7 model on the first split.
- Use the trained model to extract predictions on the rest of the dataset.
- Use prediction confidence to make an active learning iteration with Lightly, selecting 10% of images.
- Train the model again and compare the results to random.
This tutorial can be read as a follow-up to Active Learning Using YOLOv7 and Comma10k.
Prerequisites
To upload predictions to a Lightly datasource, you will need the following:
- LightlyOne installed and set up.
- Access to a cloud bucket to which you can upload your dataset. This tutorial uses an AWS S3 bucket.
- The YOLOv7 model, which you can find in the official GitHub repository.
- The ZeroWaste dataset. The dataset consists of 3003 training images for waste sorting and is available here. You can download the dataset directly from the official website.
- Python 3.7 or newer (recommended).
Downloading ZeroWaste Dataset
You can run the following command in your terminal to download and extract the ZeroWaste dataset:
wget -O zerowaste-f-final.zip "https://zenodo.org/record/6412647/files/zerowaste-f-final.zip?download=1"
unzip zerowaste-f-final.zip
Download YOLOv7 using the following command:
git clone https://github.com/WongKinYiu/yolov7
If you don't have a Python environment with PyTorch and the other dependencies installed, run pip install -r yolov7/requirements.txt and pip install lightly to install all the required dependencies.
You also need a model checkpoint. You can get the one from the official repository using the following command:
cd yolov7 && wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt
Now you should have a folder structure similar to this one:
yolov7/
└── ...
zerowaste/
└── ...
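Before continuing, it can help to confirm that the dataset was extracted where the rest of the tutorial expects it. The following is a minimal sketch, assuming the extracted folder is named zerowaste and contains train, val, and test folders with a data folder and a labels.json file each, as in the bucket listing shown later. The helper name missing_entries is hypothetical.

```python
from pathlib import Path
from typing import List

# Expected sub-paths, taken from the folder listing used in this tutorial.
EXPECTED: List[str] = [
    f"{split}/{item}"
    for split in ("train", "val", "test")
    for item in ("data", "labels.json")
]


def missing_entries(root: Path) -> List[str]:
    """Return the expected sub-paths that do not exist under `root`."""
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# Assumption: the dataset was extracted into a folder named "zerowaste".
missing = missing_entries(Path("zerowaste"))
print("Missing entries:", missing or "none")
```

If the list is not empty, the archive may have been extracted under a different folder name, and the paths used below need to be adjusted accordingly.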
Prepare and label the first split
The downloaded dataset contains both images and labels. For the scope of this tutorial, you will pretend that you do not have the labels and that you want to label a total of just 20% of the dataset. You select the first 10% of the images as a random subset using Lightly's random selection feature.
Prepare Dataset
To work with Lightly, you must upload the data to a cloud provider of your choice. This tutorial uses AWS S3. LightlyOne also supports Google Cloud Storage or Azure.
Set Up Your S3 Bucket
If you haven't done so already, follow the instructions here to set up an input datasource and a Lightly datasource on S3. Then upload your dataset to AWS. Using the AWS CLI is recommended for faster uploads.
Optional: Upload Data to Bucket using AWS CLI
Install AWS CLI
Using the AWS CLI to upload your dataset and predictions is suggested because it is faster when uploading large numbers of images. You can find the tutorial on installing the CLI on your system here. Test whether the AWS CLI was installed successfully with:
which aws
aws --version
After successful installation, you also need to configure the AWS CLI and enter your IAM credentials:
aws configure
Upload Dataset
Now you can copy the content of your dataset to your cloud bucket with the aws s3 cp command.
aws s3 cp zerowaste s3://yourInputBucket/zerowaste_input/ --recursive
Replace the Placeholder
Make sure you replace the placeholder yourInputBucket with the name of your AWS S3 bucket.
The goal of this first part of the tutorial is to randomly select 10% of the images from the train set. The validation and test sets will be used to evaluate the algorithm's performance. Lightly's relevant filenames feature makes it straightforward to restrict the selection to the train set. To use it, place a relevant_filenames_split_0.txt file with the following content in your LightlyOne bucket:
train/data/*
Upload it to the .lightly folder in your S3 bucket with the following command:
aws s3 cp relevant_filenames_split_0.txt s3://yourLightlyBucket/zerowaste_lightly/.lightly/
Now your datasource should contain:
s3://yourInputBucket/zerowaste_input/
├── train/
|   ├── data/
|   ├── labels/
|   └── labels.json
├── val/
|   ├── data/
|   ├── labels/
|   └── labels.json
└── test/
    ├── data/
    ├── labels/
    └── labels.json
s3://yourLightlyBucket/zerowaste_lightly/
└── .lightly/
    └── relevant_filenames_split_0.txt
You successfully set up your input bucket. Later in this guide, you will also upload the predictions used for active learning in the Lightly bucket.
Start the LightlyOne Worker
Now, you can start the LightlyOne Worker. The worker will wait for new jobs to be processed. The cool thing about this setup is that you can start the LightlyOne Worker on any machine. You could, for example, use your cloud instance with a GPU or a local server to run the LightlyOne Worker while using your notebook to schedule the run.
docker run --shm-size="1024m" --gpus all --rm -it \
    -e LIGHTLY_TOKEN={MY_LIGHTLY_TOKEN} \
    lightly/worker:latest
Use your LightlyOne Token and Worker Id
Don't forget to replace the {MY_LIGHTLY_TOKEN} placeholder with your own token. If you forgot your token, you can find it in the preferences menu of the LightlyOne Platform.
Create the Dataset and Datasource
The next four code snippets can be merged into a single Python file like in the example you can find here.
You import the dependencies in this first part of the script and create a new dataset.
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType
from lightly.openapi_generated.swagger_client import DatasourcePurpose
# Create the LightlyOne client to connect to the API.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN")
# Create a new dataset on the LightlyOne Platform.
client.create_dataset(dataset_name="zerowaste", dataset_type=DatasetType.IMAGES)
dataset_id = client.dataset_id
In the second part, you link your dataset with your datasource.
# Configure the Input datasource.
client.set_s3_delegated_access_config(
    resource_path="s3://yourInputBucket/zerowaste_input/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.INPUT,
)
# Configure the LightlyOne datasource.
# It must use the LIGHTLY purpose and point to the location that holds the .lightly folder.
client.set_s3_delegated_access_config(
    resource_path="s3://yourLightlyBucket/zerowaste_lightly/",
    region="eu-central-1",
    role_arn="S3-ROLE-ARN",
    external_id="S3-EXTERNAL-ID",
    purpose=DatasourcePurpose.LIGHTLY,
)
Schedule a Run
Finally, you can schedule a LightlyOne Worker run to randomly select 10% of the images. If you want to reproduce our split exactly, set the random_seed value.
client.schedule_compute_worker_run(
    worker_config={
        "relevant_filenames_file": ".lightly/relevant_filenames_split_0.txt",
    },
    selection_config={
        "proportion_samples": 0.1,
        "strategies": [
            {
                "input": {
                    "type": "RANDOM",
                    "random_seed": 42,  # optional, for reproducibility
                },
                "strategy": {
                    "type": "WEIGHTS",
                },
            }
        ],
    },
)
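To build an intuition for what a seeded random selection of 10% of the train set yields, the idea can be sketched locally. This is only an illustration under stated assumptions: the filenames are hypothetical, the helper random_subset is not part of the Lightly API, and a local seed of 42 will not reproduce the selection made by the LightlyOne Worker.

```python
import random
from typing import List


def random_subset(filenames: List[str], proportion: float = 0.1, seed: int = 42) -> List[str]:
    """Pick a reproducible random subset containing `proportion` of the filenames."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    n_samples = int(len(filenames) * proportion)
    return sorted(rng.sample(filenames, n_samples))


# Hypothetical filenames standing in for the 3003 training images.
all_files = [f"train/data/img_{i:04d}.PNG" for i in range(3003)]
subset = random_subset(all_files)
print(len(subset))  # 300
```

With 3003 training images, a proportion of 0.1 gives 300 images, which matches the split size used in the rest of the tutorial.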
Download split results
There are two ways of exporting data from Lightly: through the UI or (recommended) through the API.
With the Lightly Python client, it works like this:
from lightly.api import ApiWorkflowClient
# Create the LightlyOne client to connect to the API.
# You can also combine this with the script above and reuse the client.
client = ApiWorkflowClient(token="MY_LIGHTLY_TOKEN", dataset_id="MY_DATASET_ID")
filenames = client.export_filenames_by_tag_name(
    tag_name="initial-tag"  # name of the tag in the dataset
)
with open("filenames-of-initial-tag.txt", "w") as f:
    f.write(filenames)
In the UI, you can export the filenames using the Download & Export menu item.
Finetune on the first split
To fine-tune the YOLO model on this new split, you must first copy the images in the tag to a new folder and label them.
You can copy the images by running this script:
from pathlib import Path
import shutil

split_Path = Path("yourPathTo/filenames-of-initial-tag.txt")
data_Path = Path("zerowaste/train/data/")
output_Path = Path("zerowaste/split_0/")

# Read one filename per line and skip empty lines (e.g. a trailing newline).
with split_Path.open("r") as m:
    filenames = [line.strip() for line in m if line.strip()]

output_Path.mkdir(exist_ok=True)
for filename in filenames:
    # Only the basename is used, so any folder prefix in the exported names is dropped.
    shutil.copy(data_Path / Path(filename).name, output_Path / Path(filename).name)
ZeroWaste already provides labels in COCO format, so you'll need to translate them into the YOLOv7 format.
This is a helper script to do so:
from pathlib import Path
import json
from typing import List

annotation_Path = Path("zerowaste/train/labels.json")
output_Path = Path("zerowaste/split_0")
data_Path = Path("zerowaste/split_0")
# Set split_Flag to True to read the filenames from split_Path
# instead of listing the images found in data_Path.
split_Flag = False
split_Path = Path("")


def ann_extractor(annotation_path: Path) -> dict:
    """Load the COCO annotation file, or return an empty dict if it is missing."""
    if annotation_path.exists():
        with annotation_path.open("r") as m:
            ann_dict = json.load(m)
    else:
        ann_dict = {}
    return ann_dict


def data_loader(data_path: Path) -> List[str]:
    """List the names of all PNG images in data_path."""
    return [full_name.name for full_name in data_path.glob("*.PNG")]


def get_filenames_from_split(split_path: Path) -> List[str]:
    """Read image names from a split file with one filename per line."""
    if not split_path.is_file():
        return []
    with split_path.open("r") as m:
        # Assumption: only the basename is kept, to match the names returned by data_loader.
        return [Path(line.strip()).name for line in m if line.strip()]


def get_image_id_dict(ann_dict_):
    """Map each COCO image id to its file name."""
    image_id_to_filename = {}
    for image in ann_dict_["images"]:
        image_id_to_filename[image["id"]] = image["file_name"]
    return image_id_to_filename


def get_image_shapes(ann_dict_):
    """Map each COCO image id to its height and width."""
    heights = {}
    widths = {}
    for image in ann_dict_["images"]:
        heights[image["id"]] = image["height"]
        widths[image["id"]] = image["width"]
    return heights, widths


def get_yolo_bb_from_coco(x: int, y: int, width: int, height: int, image_height: int, image_width: int):
    # COCO stores the top-left corner; YOLO expects the box center.
    x_centre = (x + (x + width)) / 2
    y_centre = (y + (y + height)) / 2
    # Normalize by the image size so that all values lie between 0 and 1.
    x_centre = x_centre / image_width
    y_centre = y_centre / image_height
    width = width / image_width
    height = height / image_height
    # Limit to a fixed number of decimal places.
    x_centre = format(x_centre, '.6f')
    y_centre = format(y_centre, '.6f')
    width = format(width, '.6f')
    height = format(height, '.6f')
    return x_centre, y_centre, width, height


annotation_dict = ann_extractor(annotation_Path)
image_ids = get_image_id_dict(annotation_dict)
if split_Flag:
    filenames = get_filenames_from_split(split_Path)
else:
    filenames = data_loader(data_Path)
image_heights, image_widths = get_image_shapes(annotation_dict)

output_Path.mkdir(exist_ok=True)
# Create an empty label file for every image, so that images without
# annotations still get a label file.
for filename in filenames:
    with (output_Path / Path(filename).with_suffix(".txt")).open("w") as p:
        p.write("")

# Append one line per annotation to the label file of its image.
for annotation in annotation_dict["annotations"]:
    image_id = annotation["image_id"]
    filename = image_ids[image_id]
    if filename in filenames:
        x, y, w, h = annotation["bbox"]
        x_centre, y_centre, width, height = get_yolo_bb_from_coco(
            x, y, w, h, image_heights[image_id], image_widths[image_id]
        )
        # The script shifts the category ids by one so that the class indices start at 0.
        yolo_ann = f"{annotation['category_id']-1} {x_centre} {y_centre} {width} {height}\n"
        with (output_Path / Path(filename).with_suffix(".txt")).open("a") as p:
            p.write(yolo_ann)
YOLO Labels
YOLO labels are in the format:
[class_index, x_center, y_center, width, height]
The labels for each image are saved in a .txt file that has the same name as the image and sits in the same folder, with one line per bounding box. The bounding box values are normalized by the image size, so each is a number from 0 to 1: x_center and y_center are the coordinates of the box center divided by the image width and height, and width and height are the box size divided by the image width and height.
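The conversion performed by the helper script can be checked on a single box. The sketch below assumes a hypothetical 1920x1080 image and a COCO box given as [x, y, width, height] with the top-left corner as origin; the function name coco_to_yolo is illustrative and mirrors get_yolo_bb_from_coco without the string formatting.

```python
from typing import Sequence, Tuple


def coco_to_yolo(bbox: Sequence[float], image_width: int, image_height: int) -> Tuple[float, float, float, float]:
    """Convert a COCO box [x, y, w, h] in pixels to normalized YOLO (xc, yc, w, h)."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / image_width,   # box center, x
        (y + h / 2) / image_height,  # box center, y
        w / image_width,             # box width
        h / image_height,            # box height
    )


# Hypothetical box covering the central quarter of a 1920x1080 image.
xc, yc, w, h = coco_to_yolo([480, 270, 960, 540], image_width=1920, image_height=1080)
# One label line as written by the helper script, here for class index 0.
print(f"0 {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")  # 0 0.500000 0.500000 0.500000 0.500000
```

All four values fall between 0 and 1, which is a quick sanity check you can apply to any generated label file.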
The last thing you have to do is run this script for the val and test folders as well. Without those labels, you will not be able to run the evaluation. Remember to put both labels and images in the same folder!
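Since training and evaluation depend on every image having a label file next to it, a quick check can save a failed run. The following is a minimal sketch, assuming the images use the .PNG suffix (as in the helper script) and the folder names used in this tutorial; images_without_labels is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def images_without_labels(folder: Path, image_suffix: str = ".PNG") -> List[str]:
    """Return the names of images in `folder` that have no .txt label file next to them."""
    return sorted(
        image.name
        for image in folder.glob(f"*{image_suffix}")
        if not image.with_suffix(".txt").exists()
    )


# Folders used in this tutorial; a folder that does not exist simply reports 0 images.
for split_dir in ("zerowaste/split_0", "zerowaste/val/data", "zerowaste/test/data"):
    unlabeled = images_without_labels(Path(split_dir))
    print(f"{split_dir}: {len(unlabeled)} images without a label file")
```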
Finetune YOLO
Now that the split_0 folder contains both the images and the annotations for each image, you can fine-tune your model. To do so, you'll have to create new configuration files for data and cfg.
Store the content of this block in yolov7/cfg/training/yolov7_zerowaste.yaml:
# parameters
nc: 4  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

# anchors
anchors:
  - [12,16, 19,36, 40,28]  # P3/8
  - [36,75, 76,55, 72,146]  # P4/16
  - [142,110, 192,243, 459,401]  # P5/32

# yolov7 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [32, 3, 1]],  # 0
   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4
   [-1, 1, Conv, [64, 1, 1]],
   [-2, 1, Conv, [64, 1, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [[-1, -3, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],  # 11
   [-1, 1, MP, []],
   [-1, 1, Conv, [128, 1, 1]],
   [-3, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 2]],
   [[-1, -3], 1, Concat, [1]],  # 16-P3/8
   [-1, 1, Conv, [128, 1, 1]],
   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -3, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]],  # 24
   [-1, 1, MP, []],
   [-1, 1, Conv, [256, 1, 1]],
   [-3, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, -3], 1, Concat, [1]],  # 29-P4/16
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -3, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [1024, 1, 1]],  # 37
   [-1, 1, MP, []],
   [-1, 1, Conv, [512, 1, 1]],
   [-3, 1, Conv, [512, 1, 1]],
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, -3], 1, Concat, [1]],  # 42-P5/32
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -3, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [1024, 1, 1]],  # 50
  ]

# yolov7 head
head:
  [[-1, 1, SPPCSPC, [512]],  # 51
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [37, 1, Conv, [256, 1, 1]],  # route backbone P4
   [[-1, -2], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],  # 63
   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [24, 1, Conv, [128, 1, 1]],  # route backbone P3
   [[-1, -2], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1]],
   [-2, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [-1, 1, Conv, [64, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [128, 1, 1]],  # 75
   [-1, 1, MP, []],
   [-1, 1, Conv, [128, 1, 1]],
   [-3, 1, Conv, [128, 1, 1]],
   [-1, 1, Conv, [128, 3, 2]],
   [[-1, -3, 63], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],
   [-2, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [-1, 1, Conv, [128, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [256, 1, 1]],  # 88
   [-1, 1, MP, []],
   [-1, 1, Conv, [256, 1, 1]],
   [-3, 1, Conv, [256, 1, 1]],
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, -3, 51], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]],
   [-2, 1, Conv, [512, 1, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [-1, 1, Conv, [256, 3, 1]],
   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],
   [-1, 1, Conv, [512, 1, 1]],  # 101
   [75, 1, RepConv, [256, 3, 1]],
   [88, 1, RepConv, [512, 3, 1]],
   [101, 1, RepConv, [1024, 3, 1]],
   [[102,103,104], 1, IDetect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
And this in yolov7/data/zerowaste_random_0.yaml:
train: zerowaste/split_0
val: zerowaste/val/data
test: zerowaste/test/data
# number of classes
nc: 4
# class names
names: ['rigid_plastic', 'cardboard', 'metal', 'soft_plastic']
You can finally train the model by running:
python train.py --epochs 100 --workers 4 --device 0 --batch-size 16 --data data/zerowaste_random_0.yaml \
--img 600 600 --cfg cfg/training/yolov7_zerowaste.yaml --weights 'yolov7.pt' \
--name yolov7_zerowaste_split_0 --hyp data/hyp.scratch.custom.yaml
Congratulations, you did your first training iteration.
Prepare your Active Learning iteration
Now that you have a trained model, you can use it to compute object detection predictions. Those will be used for balancing your next dataset split. The predictions should be stored in Lightly format and then uploaded to the Lightly bucket. To do so, run this script to run inference on your dataset:
from pathlib import Path
import json

import torch
from tqdm import tqdm

from models.experimental import attempt_load
from utils.torch_utils import select_device
from utils.general import check_img_size, non_max_suppression, scale_coords
from utils.datasets import LoadImages

predictions_root_path = Path("predictions")
task_name = "object_detection_zero_waste"
predictions_path = predictions_root_path / task_name
# Filenames in the prediction files are stored relative to this folder.
dataset_root = Path("zerowaste").absolute()

important_classes = {
    "rigid_plastic": 0,
    "cardboard": 1,
    "metal": 2,
    "soft_plastic": 3,
}
classes = list(important_classes.values())

# create tasks.json
tasks_json_path = predictions_root_path / "tasks.json"
tasks_json_path.parent.mkdir(parents=True, exist_ok=True)
with open(tasks_json_path, "w") as f:
    json.dump([task_name], f)

# create schema.json
schema = {"task_type": "object-detection", "categories": []}
for key, val in important_classes.items():
    cat = {"id": val, "name": key}
    schema["categories"].append(cat)
schema_path = predictions_path / "schema.json"
schema_path.parent.mkdir(parents=True, exist_ok=True)
with open(schema_path, "w") as f:
    json.dump(schema, f, indent=4)

device = select_device()
model = attempt_load(
    "yolov7/runs/train/yolov7_zerowaste_split_0/weights/best.pt", map_location=device
)  # load FP32 model
stride = int(model.stride.max())  # model stride
imgsz = check_img_size(600, s=stride)  # check img_size

dataset = LoadImages(
    "zerowaste/train/data/",  # here we use the folder instead of a single image
    img_size=imgsz,  # use the checked image size and model stride computed above
    stride=stride,
)

for path, img, im0s, vid_cap in tqdm(dataset):
    img = torch.from_numpy(img).to(device).float().unsqueeze(0)
    img /= 255.0

    with torch.no_grad():
        prediction = model(img)[0]

    # apply NMS and only keep the classes we care about
    predictions = non_max_suppression(
        prediction, conf_thres=0.25, iou_thres=0.45, classes=classes
    )[0]

    # filename relative to the dataset root, e.g. train/data/<image name>
    fname = Path(path).absolute().relative_to(dataset_root)
    lightly_prediction = {
        "file_name": str(fname),
        "predictions": [],
    }

    # we need to rescale the bounding boxes as inference was done
    # on resized and padded images
    predictions[:, :4] = scale_coords(
        img.shape[2:], predictions[:, :4], im0s.shape
    ).round()

    for detection in predictions:
        x0, y0, x1, y1, conf, class_id = detection.cpu().numpy()
        # note that we need to convert from x0, y0, x1, y1 to x, y, w, h format
        pred = {
            "category_id": int(class_id),
            "bbox": [int(x0), int(y0), int(x1 - x0), int(y1 - y0)],
            "score": float(conf),
        }
        lightly_prediction["predictions"].append(pred)

    # create the prediction file for the image
    path_to_prediction = predictions_path / fname.with_suffix(".json")
    path_to_prediction.parents[0].mkdir(parents=True, exist_ok=True)
    with open(path_to_prediction, "w") as f:
        json.dump(lightly_prediction, f, indent=4)
Then upload the predictions folder to the Lightly bucket:
aws s3 cp predictions/ s3://yourLightlyBucket/zerowaste_lightly/.lightly/predictions --recursive
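Before or after uploading, you may want to sanity-check the generated prediction files. The sketch below checks only the fields that the inference script writes (file_name, and per prediction category_id, bbox, and score); it is not a complete validator for the Lightly prediction format, and validate_prediction and the example document are hypothetical.

```python
from typing import Dict, List, Set

REQUIRED_KEYS = {"category_id", "bbox", "score"}


def validate_prediction(doc: Dict, valid_category_ids: Set[int]) -> List[str]:
    """Return a list of problems found in one per-image prediction document."""
    errors = []
    if not isinstance(doc.get("file_name"), str):
        errors.append("file_name must be a string")
    for i, pred in enumerate(doc.get("predictions", [])):
        missing = REQUIRED_KEYS - set(pred)
        if missing:
            errors.append(f"prediction {i}: missing keys {sorted(missing)}")
            continue
        if pred["category_id"] not in valid_category_ids:
            errors.append(f"prediction {i}: unknown category_id {pred['category_id']}")
        # Boxes are written as [x, y, w, h]; width and height must not be negative.
        if len(pred["bbox"]) != 4 or min(pred["bbox"][2:]) < 0:
            errors.append(f"prediction {i}: bbox must be [x, y, w, h] with w, h >= 0")
        if not 0.0 <= pred["score"] <= 1.0:
            errors.append(f"prediction {i}: score must be in [0, 1]")
    return errors


# Hypothetical document in the shape produced by the inference script.
example = {
    "file_name": "train/data/example.PNG",
    "predictions": [
        {"category_id": 1, "bbox": [10, 20, 110, 60], "score": 0.83},
    ],
}
print(validate_prediction(example, {0, 1, 2, 3}))  # []
```

To check real files, load each JSON file under predictions/object_detection_zero_waste/ with json.load and pass the result to the function.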
Schedule the active learning run
You can now schedule a LightlyOne Worker run to select a subset of the data based on your criteria. To squeeze out the best performance, you should:
- Use predictions to balance the ratio of the classes.
- Use predictions to create embeddings of the objects within the bounding boxes to find diverse objects.
- Use predictions to focus on objects where the probability of the prediction is low (Active Learning).
- Train an embedding model for 25 epochs to improve the quality of the embeddings.
scheduled_run_id = client.schedule_compute_worker_run(
    worker_config={
        "enable_training": True,
        "use_datapool": True,
        "relevant_filenames_file": ".lightly/relevant_filenames_split_1.txt",
    },
    selection_config={
        "n_samples": 300,
        "strategies": [
            {
                # strategy to find diverse objects
                "input": {
                    "type": "EMBEDDINGS",
                    "task": "object_detection_zero_waste",
                },
                "strategy": {
                    "type": "DIVERSITY",
                },
            },
            {
                # strategy to balance the class ratios
                "input": {
                    "type": "PREDICTIONS",
                    "name": "CLASS_DISTRIBUTION",
                    "task": "object_detection_zero_waste",
                },
                "strategy": {
                    "type": "BALANCE",
                    "target": {
                        "rigid_plastic": 0.25,
                        "cardboard": 0.25,
                        "metal": 0.25,
                        "soft_plastic": 0.25,
                    },
                },
            },
            {
                # strategy to prioritize images with more objects
                "input": {
                    "type": "PREDICTIONS",
                    "task": "object_detection_zero_waste",
                    "name": "CATEGORY_COUNT",
                },
                "strategy": {"type": "WEIGHTS"},
            },
            {
                # strategy to use prediction score (Active Learning)
                "input": {
                    "type": "SCORES",
                    "task": "object_detection_zero_waste",
                    "score": "objectness_least_confidence",
                },
                "strategy": {"type": "WEIGHTS"},
            },
        ],
    },
    lightly_config={
        "trainer": {
            "max_epochs": 25,
        },
        "loader": {"batch_size": 128},
    },
)
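Two of the signals used in this configuration can be illustrated on toy data: a least-confidence score per image and the share of each predicted class. This is a sketch under stated assumptions, not the computation performed by the LightlyOne Worker: least confidence is taken here as one minus the highest detection score, images without detections are given a score of 0, and both helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def least_confidence(scores: List[float]) -> float:
    """One common definition: 1 - highest score. Assumption: 0.0 for images without detections."""
    return 1.0 - max(scores) if scores else 0.0


def class_shares(category_ids: List[int]) -> Dict[int, float]:
    """Fraction of predicted objects per category id."""
    counts = Counter(category_ids)
    total = sum(counts.values())
    return {cid: n / total for cid, n in counts.items()} if total else {}


# Hypothetical detections: two boxes with scores 0.5 and 0.75 on one image.
print(least_confidence([0.5, 0.75]))  # 0.25
# Hypothetical category ids over several images: three cardboard (1), one soft_plastic (3).
print(class_shares([1, 1, 1, 3]))  # {1: 0.75, 3: 0.25}
```

A class distribution as skewed as in the toy example is the situation the BALANCE strategy with equal targets of 0.25 is meant to counteract.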
After the run, you can download the split results again. Download the list containing all 600 images (the 300 from the first split plus the 300 newly selected ones) so that the next training run uses both.
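If you end up with two separate filename lists instead of one combined export, they can be merged locally. A minimal sketch, assuming each list holds one filename per entry; merge_filenames and the example names are hypothetical.

```python
from typing import Iterable, List


def merge_filenames(*splits: Iterable[str]) -> List[str]:
    """Merge several filename lists, dropping empty entries and duplicates while keeping order."""
    seen = set()
    merged = []
    for split in splits:
        for name in split:
            name = name.strip()
            if name and name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Hypothetical filenames; real lists would come from the exported tag files.
first_split = ["train/data/a.PNG", "train/data/b.PNG"]
second_split = ["train/data/b.PNG", "train/data/c.PNG", ""]
merged = merge_filenames(first_split, second_split)
print(merged)  # ['train/data/a.PNG', 'train/data/b.PNG', 'train/data/c.PNG']
```

The merged list can then be written to a text file and used with the copy script from the first split to fill a new training folder.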
Train the final model
With the new split, you can train the base model again, following the steps in Finetune YOLO. Please update the --data argument to point to the configuration file of the new split.
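The data configuration for the new split differs from the first one only in the train entry. The sketch below renders such a file as text; the folder name zerowaste/split_1 is an assumption for illustration, and render_data_yaml is a hypothetical helper rather than part of YOLOv7.

```python
CLASS_NAMES = ["rigid_plastic", "cardboard", "metal", "soft_plastic"]


def render_data_yaml(train_dir: str) -> str:
    """Render a YOLOv7 data config that reuses the val and test folders of the first run."""
    lines = [
        f"train: {train_dir}",
        "val: zerowaste/val/data",
        "test: zerowaste/test/data",
        "# number of classes",
        f"nc: {len(CLASS_NAMES)}",
        "# class names",
        f"names: {CLASS_NAMES}",
    ]
    return "\n".join(lines) + "\n"


# Assumption: the images and labels of the new split were copied to zerowaste/split_1.
print(render_data_yaml("zerowaste/split_1"))
```

Save the output as a new file in yolov7/data/ and pass its path to --data, together with a new --name so that the results of the first run are not overwritten.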
Compare the results with random
The following table shows how active learning compares to random selection. The values are average precisions at 50% recall.
| | Mean | rigid_plastic | cardboard | metal | soft_plastic |
|---|---|---|---|---|---|
| Random | 0.270 | 0.281 | 0.395 | 0.052 | 0.351 |
| Active Learning | 0.275 | 0.266 | 0.418 | 0.037 | 0.381 |
The results were computed using 600 images: 300 randomly selected images shared by both splits, plus 300 further images selected either randomly or with active learning. The model was trained for 100 epochs. Active learning reaches a slightly higher mean score than random selection (0.275 vs. 0.270), driven by gains on cardboard and soft_plastic, while rigid_plastic and metal score slightly lower.
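The mean column and the per-class differences can be recomputed from the per-class values in the table. The short sketch below uses only the numbers reported above; the variable names are illustrative.

```python
classes = ["rigid_plastic", "cardboard", "metal", "soft_plastic"]
random_ap = [0.281, 0.395, 0.052, 0.351]
active_ap = [0.266, 0.418, 0.037, 0.381]

# The mean is the unweighted average over the four classes.
mean_random = sum(random_ap) / len(random_ap)
mean_active = sum(active_ap) / len(active_ap)
print(f"mean random: {mean_random:.5f}")
print(f"mean active: {mean_active:.5f}")

# A positive difference means active learning scored higher on that class.
deltas = {name: a - r for name, r, a in zip(classes, random_ap, active_ap)}
for name, delta in deltas.items():
    print(f"{name:>14}: {delta:+.3f}")
```

The per-class differences show that the overall gain comes from cardboard and soft_plastic, while the other two classes lose a little.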