lightly.active_learning

.agents

class lightly.active_learning.agents.agent.ActiveLearningAgent(api_workflow_client: lightly.api.api_workflow_client.ApiWorkflowClient, query_tag_name: str = 'initial-tag', preselected_tag_name: str = None)

Interface for active learning queries.

Attributes:
api_workflow_client:

The client to connect to the api.

query_set:

Set of filenames corresponding to samples which can possibly be selected. Set to all samples in the query tag or to the whole dataset by default.

labeled_set:

Set of filenames corresponding to samples in the labeled set. Set to all samples in the preselected tag or to an empty list by default.

unlabeled_set:

Set of filenames corresponding to samples which are in the query set but not in the labeled set.

added_set:

Set of filenames corresponding to samples which were added to the labeled set in the last query.
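
Based on the attribute descriptions above, the unlabeled set is simply the query set minus the labeled set (a minimal sketch, assuming an agent constructed as in the example below):

>>> # directly follows from the attribute definitions above
>>> set(agent.unlabeled_set) == set(agent.query_set) - set(agent.labeled_set)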

Examples:
>>> # set the token and dataset id
>>> token = '123'
>>> dataset_id = 'XYZ'
>>>
>>> # create an active learning agent
>>> client = ApiWorkflowClient(token, dataset_id)
>>> agent = ActiveLearningAgent(client)
>>>
>>> # make an initial active learning query
>>> sampler_config = SamplerConfig(n_samples=100, name='initial-set')
>>> agent.query(sampler_config)
>>> initial_set = agent.labeled_set
>>>
>>> # train and evaluate a model on the initial set
>>> # make predictions on the query set:
>>> query_set = agent.query_set
>>> # important:
>>> # be sure to keep the order of the query set when you make predictions
>>>
>>> # create active learning scorer
>>> scorer = ScorerClassification(predictions)
>>>
>>> # make a second active learning query
>>> sampler_config = SamplerConfig(n_samples=200, name='second-set')
>>> agent.query(sampler_config, scorer)
>>> added_set = agent.added_set # access only the samples added by this query
property added_set

List of filenames of newly added samples (in the last query).

Raises:

RuntimeError if executed before a query.

property labeled_set

List of filenames indicating selected samples.

query(sampler_config: lightly.active_learning.config.sampler_config.SamplerConfig, al_scorer: lightly.active_learning.scorers.scorer.Scorer = None) → Tuple[List[str], List[str]]

Performs an active learning query.

After the query, the labeled set is updated to contain all selected samples, the added set is recalculated as (new labeled set - old labeled set), and the query set stays the same.

Args:
sampler_config:

The sampling configuration.

al_scorer:

An instance of a class inheriting from Scorer, e.g. a ClassificationScorer.
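
A minimal sketch of how the sets change around a query, reusing the sampler_config and scorer from the class example above:

>>> # remember the labeled set before the query
>>> old_labeled = set(agent.labeled_set)
>>> agent.query(sampler_config, scorer)
>>> # afterwards, the labeled set contains all selected samples and the
>>> # added set is the difference to the previous labeled set
>>> new_labeled = set(agent.labeled_set)
>>> assert set(agent.added_set) == new_labeled - old_labeled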

property query_set

List of filenames for which to calculate active learning scores.

property unlabeled_set

List of filenames which belong to the query set but are not yet in the labeled set.

.config

class lightly.active_learning.config.sampler_config.SamplerConfig(method: lightly.openapi_generated.swagger_client.models.sampling_method.SamplingMethod = 'CORESET', n_samples: int = 32, min_distance: float = -1, name: str = None)

Configuration class for a sampler.

Attributes:
method:

The method to use for sampling, one of CORESET, RANDOM, CORAL, ACTIVE_LEARNING

n_samples:

The maximum number of samples to be chosen by the sampler including the samples in the preselected tag. One of the stopping conditions.

min_distance:

The minimum distance of samples in the chosen set, one of the stopping conditions.

name:

The name of this sampling, defaults to a name consisting of all other attributes and the datetime. A new tag will be created in the web-app under this name.

Examples:
>>> # sample 100 images with CORESET sampling
>>> config = SamplerConfig(method=SamplingMethod.CORESET, n_samples=100)
>>>
>>> # give your sampling a name
>>> config = SamplerConfig(method=SamplingMethod.CORESET, n_samples=100, name='my-sampling')
>>>
>>> # use minimum distance between samples as stopping criterion
>>> config = SamplerConfig(method=SamplingMethod.CORESET, n_samples=-1, min_distance=0.1)
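
The examples above assume the relevant classes are already imported; based on the module paths in the class signature above, the imports could look like this (paths taken from the signature, not verified against a particular release):

>>> from lightly.active_learning.config.sampler_config import SamplerConfig
>>> from lightly.openapi_generated.swagger_client.models.sampling_method import SamplingMethod
>>>
>>> # random sampling of 100 images with a custom tag name
>>> config = SamplerConfig(method=SamplingMethod.RANDOM, n_samples=100, name='random-100')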

.scorers

class lightly.active_learning.scorers.classification.ScorerClassification(model_output: Union[numpy.ndarray, List[List[float]]])

Class to compute active learning scores from the model_output of a classification task.

Currently supports the following scorers:

The following three uncertainty scores are taken from http://burrsettles.com/pub/settles.activelearning.pdf, Section 3.1, page 12f, and are also explained in https://towardsdatascience.com/uncertainty-sampling-cheatsheet-ec57bc067c0b. They all have in common that the score is highest if all classes have the same confidence and is 0 if the model assigns 100% probability to a single class. They differ in the number of class confidences they take into account. A small numerical sketch of the three scores follows this list.

uncertainty_least_confidence:

This score is 1 - the highest confidence prediction. It is high when the confidence about the most probable class is low.

uncertainty_margin:

This score is 1 - the margin between the highest confidence and the second highest confidence prediction. It is high when the model cannot decide between the two most probable classes.

uncertainty_entropy:

This scorer computes the entropy of the prediction. The confidences for all classes are considered to compute the entropy of a sample.
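
A small numerical sketch of the three scores for a single normalized prediction, following the formulas described above (illustrative only, not the library implementation; in particular, the logarithm base used for the entropy is an assumption):

>>> import numpy as np
>>> p = np.array([0.7, 0.2, 0.1])               # normalized confidences, sum to 1
>>> p_sorted = np.sort(p)[::-1]                 # highest confidence first
>>> least_confidence = 1.0 - p_sorted[0]        # 1 - highest confidence = 0.3
>>> margin = 1.0 - (p_sorted[0] - p_sorted[1])  # 1 - margin of top two = 0.5
>>> entropy = -np.sum(p * np.log(p))            # entropy over all classes (natural log assumed)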

Attributes:
model_output:

Predictions of shape N x C where N is the number of unlabeled samples and C is the number of classes in the classification task. Must be normalized such that the sum over each row is 1. The order of the predictions must be the one specified by ActiveLearningAgent.unlabeled_set.

Examples:
>>> # example with three unlabeled samples
>>> al_agent.unlabeled_set
>>> > ['img0.jpg', 'img1.jpg', 'img2.jpg']
>>> predictions = np.array(
>>>     [
>>>          [0.1, 0.9], # predictions for img0.jpg
>>>          [0.3, 0.7], # predictions for img1.jpg
>>>          [0.8, 0.2], # predictions for img2.jpg
>>>     ] 
>>> )
>>> np.sum(predictions, axis=1)
>>> > array([1., 1., 1.])
>>> scorer = ScorerClassification(predictions)
calculate_scores(normalize_to_0_1: bool = True) → Dict[str, numpy.ndarray]

Calculates and returns the active learning scores.

Args:
normalize_to_0_1:

If True, each score is normalized to have a theoretical minimum of 0 and a theoretical maximum of 1.

Returns:

A dictionary mapping from the score name (as string) to the scores (as a single-dimensional numpy array).

classmethod score_names() → List[str]

Returns the names of the calculated active learning scores.
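
A brief usage sketch combining the two methods above; the exact dictionary keys are assumed to match the scorer names listed earlier:

>>> scorer = ScorerClassification(predictions)
>>> scores = scorer.calculate_scores()
>>> # keys are expected to correspond to ScorerClassification.score_names()
>>> scores['uncertainty_margin']  # one value per row of model_output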

class lightly.active_learning.scorers.detection.ScorerObjectDetection(model_output: List[lightly.active_learning.utils.object_detection_output.ObjectDetectionOutput], config: Dict = None)

Class to compute active learning scores from the model_output of an object detection task.

Currently supports the following scorers:

object_frequency:

This scorer uses model predictions to focus more on images which have many objects in them. Use this scorer if you want scenes with lots of objects, as is typically desired in computer vision tasks such as perception in autonomous driving.

objectness_least_confidence:

This score is 1 - the mean of the highest confidence predictions. Use this scorer to select images where the model is uncertain about both whether it found an object at all and the class of that object.

scores from ScorerClassification:

These scores are computed for each object detection from the class probability prediction of that detection and are then reduced to one score per image by taking the maximum. The scores are named f"classification_{score_name}".

Attributes:
model_output:

List of model outputs in an object detection setting.

config:

A dictionary containing additional parameters for the scorers (see the configuration sketch after the example below).

frequency_penalty (float):

Used by the object-frequency scorer. If multiple objects of the same class occur in the same sample, they are multiplied by this penalty. 1.0 has no effect. 0.5 counts the first object fully and the second object of the same class only 50%. Lowering this value results in a more balanced weighting of the classes. 0.0 is the maximum penalty. (default: 0.25)

min_score (float):

Used by the object-frequency scorer. Specifies the minimum score per sample. All scores are scaled to the [min_score, 1.0] range. Lowering this value makes the sampler focus more on samples with many objects. (default: 0.9)

Examples:
>>> # typical model output
>>> predictions = [{
>>>     'boxes': [[0.1, 0.2, 0.3, 0.4]],
>>>     'object_probabilities': [0.1024],
>>>     'class_probabilities': [[0.5, 0.41, 0.09]]
>>> }]
>>>
>>> # generate detection outputs
>>> model_output = []
>>> for prediction in predictions:
>>>     # convert each box to a BoundingBox object
>>>     boxes = []
>>>     for box in prediction['boxes']:
>>>         x0, x1 = box[0], box[2]
>>>         y0, y1 = box[1], box[3]
>>>         boxes.append(BoundingBox(x0, y0, x1, y1))
>>>     # create detection outputs
>>>     output = ObjectDetectionOutput(
>>>         boxes,
>>>         prediction['object_probabilities'],
>>>         prediction['class_probabilities']
>>>     )
>>>     model_output.append(output)
>>>
>>> # create scorer from output
>>> scorer = ScorerObjectDetection(model_output)
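
A sketch of passing a custom config, using the two keys documented under the config attribute above (key names and defaults are taken from that description):

>>> # override the documented defaults
>>> config = {
>>>     'frequency_penalty': 0.5,  # milder penalty for repeated classes
>>>     'min_score': 0.8,          # allow lower per-sample scores
>>> }
>>> scorer = ScorerObjectDetection(model_output, config=config)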
calculate_scores() → Dict[str, numpy.ndarray]

Calculates and returns the active learning scores.

Returns:

A dictionary mapping from the score name (as string) to the scores (as a single-dimensional numpy array).
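
For example, the returned dictionary is expected to contain one entry per scorer listed above (the exact key strings are an assumption based on those names):

>>> scores = scorer.calculate_scores()
>>> scores['object_frequency']             # high for images with many objects
>>> scores['objectness_least_confidence']  # high where the model is uncertain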

classmethod score_names() → List[str]

Returns the names of the calculated active learning scores.

.utils

Bounding Box Utils

class lightly.active_learning.utils.bounding_box.BoundingBox(x0: float, y0: float, x1: float, y1: float)

Class which unifies different bounding box formats.

Attributes:
x0:

x0 coordinate (normalized to [0, 1])

y0:

y0 coordinate (normalized to [0, 1])

x1:

x1 coordinate (normalized to [0, 1])

y1:

y1 coordinate (normalized to [0, 1])

Examples:
>>> # simple case, format (x0, y0, x1, y1)
>>> bbox = BoundingBox(0.1, 0.2, 0.3, 0.4)
>>>
>>> # same bounding box in x, y, w, h format
>>> bbox = BoundingBox.from_x_y_w_h(0.1, 0.2, 0.2, 0.2)
>>>
>>> # often the coordinates are not yet normalized by image size
>>> # for example, for a 100 x 100 image, the coordinates could be
>>> # (x0, y0, x1, y1) = (10, 20, 30, 40)
>>> W, H = 100, 100 # get image shape
>>> bbox = BoundingBox(10 / W, 20 / H, 30 / W, 40 / H)

property area

Returns the area of the bounding box relative to the area of the image.

classmethod from_x_y_w_h(x: float, y: float, w: float, h: float)

Helper to convert from bounding box format with width and height.

Examples:
>>> bbox = BoundingBox.from_x_y_w_h(0.1, 0.2, 0.2, 0.2)

property height

Returns the height of the bounding box relative to the image size.

property width

Returns the width of the bounding box relative to the image size.
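
A short sketch of the three properties for the bounding box from the example above, assuming width = x1 - x0, height = y1 - y0, and area = width * height in normalized coordinates:

>>> bbox = BoundingBox(0.1, 0.2, 0.3, 0.4)
>>> bbox.width   # x1 - x0 = 0.2
>>> bbox.height  # y1 - y0 = 0.2
>>> bbox.area    # width * height = 0.04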

Object Detection Outputs

class lightly.active_learning.utils.object_detection_output.ObjectDetectionOutput(boxes: List[lightly.active_learning.utils.bounding_box.BoundingBox], object_probabilities: List[float], class_probabilities: List[List[float]])

Class which unifies different object detection output formats.

Attributes:
boxes:

List of BoundingBox objects with coordinates (x0, y0, x1, y1).

object_probabilities:

List of probabilities that the boxes are indeed objects.

class_probabilities:

List of probabilities for the different classes for each box.

scores:

List of confidence scores (i.e. max(class prob) * objectness).

labels:

List of labels (i.e. argmax(class prob)).

Examples:
>>> # typical model output
>>> prediction = {
>>>     'boxes': [[0.1, 0.2, 0.3, 0.4]],
>>>     'object_probabilities': [0.6],
>>>     'class_probabilities': [[0.1, 0.5]],
>>> }
>>>
>>> # convert bbox to objects
>>> boxes = [BoundingBox(0.1, 0.2, 0.3, 0.4)]
>>> object_probabilities = prediction['object_probabilities']
>>> class_probabilities = prediction['class_probabilities']
>>>
>>> # create detection output
>>> detection_output = ObjectDetectionOutput(
>>>     boxes,
>>>     object_probabilities,
>>>     class_probabilities,
>>> )
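
Given the attribute descriptions above, the derived fields for this example are expected to be (a sketch, assuming the probabilities are used exactly as given):

>>> detection_output.labels  # argmax of class probabilities per box, i.e. [1]
>>> detection_output.scores  # max(class prob) * objectness per box, i.e. [0.3]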
classmethod from_scores(boxes: List[lightly.active_learning.utils.bounding_box.BoundingBox], scores: List[float], labels: List[int])

Helper to convert from output format with scores.

We advise against using this method if you want the uncertainty active learning scores to be computed correctly.

Since this output format does not provide class probabilities, they are replaced by an estimated class probability derived from the objectness. The highest class probability matches the label. The objectness is set to the score for each bounding box.

Args:
boxes:

List of BoundingBox objects with coordinates (x0, y0, x1, y1).

scores:

List of confidence scores (i.e. max(class prob) * objectness).

labels:

List of labels.

Examples:
>>> # typical model output
>>> prediction = {
>>>     'boxes': [[0.1, 0.2, 0.3, 0.4]],
>>>     'scores': [0.1234],
>>>     'labels': [1]
>>> }
>>>
>>> # convert bbox to objects
>>> boxes = [BoundingBox(0.1, 0.2, 0.3, 0.4)]
>>> scores = prediction['scores']
>>> labels = prediction['labels']
>>>
>>> # create detection output
>>> detection_output = ObjectDetectionOutput.from_scores(
>>>     boxes, scores, labels)