DatasetQuery

Dataset query utilities for filtering, ordering, and slicing samples.

DatasetQuery

DatasetQuery(dataset: DatasetTable, session: Session)

Class for executing a query on a dataset.

Filtering, ordering, and slicing samples in a dataset

Use this class to filter, order, and slice the samples of a dataset. To obtain an instance, call .query() on a Dataset instance.

dataset: Dataset = ...
query = dataset.query()

The match(), order_by(), and slice() methods can be chained in this order. Each may be called at most once per query; a second call raises ValueError. The methods are also available directly on the Dataset instance:

dataset.match(...)  # shorthand for dataset.query().match(...)

The query object is translated into a SQL query that is evaluated lazily: nothing runs until you iterate over it or convert it to a list.
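The lazy behaviour described above can be pictured with a small stand-in. This is only a sketch and not the library's implementation: `LazyQuery`, its in-memory rows, and the `executions` counter are hypothetical names introduced for illustration. It shows that chained calls merely record state, while the actual work happens on iteration.

```python
from typing import Callable, Iterator, Optional


class LazyQuery:
    """Hypothetical stand-in for a lazily evaluated query (not DatasetQuery)."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows
        self._predicate: Optional[Callable[[dict], bool]] = None
        self.executions = 0  # counts how often the data was actually scanned

    def match(self, predicate: Callable[[dict], bool]) -> "LazyQuery":
        # Only record the filter; nothing is evaluated yet.
        self._predicate = predicate
        return self

    def __iter__(self) -> Iterator[dict]:
        # Evaluation happens here, once iteration actually begins.
        self.executions += 1
        for row in self._rows:
            if self._predicate is None or self._predicate(row):
                yield row

    def to_list(self) -> list[dict]:
        return list(self)


rows = [
    {"file_name": "a.png", "width": 50},
    {"file_name": "b.png", "width": 150},
]
lazy = LazyQuery(rows).match(lambda r: r["width"] > 100)
executions_before = lazy.executions  # still 0: building the query ran nothing
wide = lazy.to_list()                # conversion to a list triggers evaluation
executions_after = lazy.executions   # now 1
```

In this sketch every new iteration scans the rows again, which is one plausible reading of "lazily evaluated"; whether the real query caches results is not stated here.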

match() - Filtering samples

Filtering is done via the match() method.

from lightly_studio.core.dataset_query import SampleField

query_1 = dataset.query().match(SampleField.width > 100)
query_2 = dataset.query().match(SampleField.tags.contains('cat'))

Use the AND and OR operators to combine multiple conditions.

from lightly_studio.core.dataset_query import SampleField, AND, OR

query = dataset.query().match(
    AND(
        SampleField.height < 200,
        OR(
            SampleField.file_name == 'image.png',
            SampleField.file_name == 'image2.png',
        )
    )
)
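As a rough mental model of how the nested condition above evaluates, the following sketch combines plain Python predicates. The helpers `all_of` and `any_of` and the dict-based samples are hypothetical and stand in for AND, OR, and real Sample objects; the library itself builds a SQL filter rather than calling Python functions.

```python
from typing import Callable

Sample = dict  # hypothetical: a sample is a plain dict in this sketch
Condition = Callable[[Sample], bool]


def all_of(*conditions: Condition) -> Condition:
    """True only if every condition holds (mirrors the idea of AND)."""
    return lambda sample: all(c(sample) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """True if at least one condition holds (mirrors the idea of OR)."""
    return lambda sample: any(c(sample) for c in conditions)


# Same shape as the example above: height < 200 AND (one of two file names).
condition = all_of(
    lambda s: s["height"] < 200,
    any_of(
        lambda s: s["file_name"] == "image.png",
        lambda s: s["file_name"] == "image2.png",
    ),
)

samples = [
    {"file_name": "image.png", "height": 100},   # satisfies both parts
    {"file_name": "image2.png", "height": 300},  # name matches, but too tall
    {"file_name": "other.png", "height": 100},   # short enough, wrong name
]
matching = [s["file_name"] for s in samples if condition(s)]
```

Only the first sample passes, because the outer condition requires both the height check and one of the two file names.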

order_by() - Ordering samples

Order the results with order_by(). Pass multiple fields to break ties; the first field has the highest priority. Ordering is ascending by default. For descending order, use OrderByField(...).desc().

from lightly_studio.core.dataset_query import OrderByField, SampleField

# Order by width (ascending); break ties by file name (descending).
query = query.order_by(
    OrderByField(SampleField.width),
    OrderByField(SampleField.file_name).desc(),
)
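The priority rule can be checked on plain Python data. The sketch below uses hypothetical dict records rather than the library and reproduces "width ascending, then file_name descending" with two stable sorts.

```python
records = [
    {"file_name": "a.png", "width": 200},
    {"file_name": "b.png", "width": 100},
    {"file_name": "c.png", "width": 100},
]

# Python's sort is stable, so sort by the lowest-priority key first ...
ordered = sorted(records, key=lambda r: r["file_name"], reverse=True)
# ... then by the highest-priority key; ties keep the order from the first pass.
ordered = sorted(ordered, key=lambda r: r["width"])

names = [r["file_name"] for r in ordered]
# names is ["c.png", "b.png", "a.png"]: the two width-100 records come first,
# and among them the file name decides, in descending order.
```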

slice() - Slicing samples

Slicing can be applied via slice() or bracket notation.

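query = query.slice(offset=10, limit=20)

# Equivalent bracket notation. Apply only one of the two forms to a given
# query: slicing the same query twice raises ValueError.
same_slice = dataset.query()[10:30]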
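The correspondence between bracket notation and offset/limit can be sketched as a small conversion function. `slice_to_offset_limit` is a hypothetical helper, and the choice to reject steps and negative indices is an assumption; the reference only says that unsupported slice features raise ValueError without listing them.

```python
from typing import Optional, Tuple


def slice_to_offset_limit(key: slice) -> Tuple[int, Optional[int]]:
    """Translate a Python slice into (offset, limit).

    Assumption for this sketch: steps and negative indices are rejected.
    """
    if not isinstance(key, slice):
        raise TypeError("key must be a slice")
    if key.step not in (None, 1):
        raise ValueError("slice steps are not supported in this sketch")
    start = 0 if key.start is None else key.start
    stop = key.stop
    if start < 0 or (stop is not None and stop < 0):
        raise ValueError("negative indices are not supported in this sketch")
    # An open-ended slice has no limit; otherwise the limit is stop - start.
    limit = None if stop is None else max(stop - start, 0)
    return start, limit


bounded = slice_to_offset_limit(slice(10, 30))       # (10, 20)
head = slice_to_offset_limit(slice(None, 50))        # (0, 50)
open_ended = slice_to_offset_limit(slice(100, None)) # (100, None)
```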

Usage of the filtered, ordered and sliced query

Iterating and converting to list

Finally, execute the query by iterating over it or converting it to a list.

for sample in query:
    print(sample.file_name)

samples = query.to_list()

The returned samples are instances of the Sample class. They are writable, and changes to them are persisted to the database.

Adding tags to matching samples

The filtered set can also be used to add a tag to all matching samples.

query.add_tag('my_tag')

Selecting a subset of samples using smart selection

A Selection interface can be created from the current query results. It selects only from the samples that match the query at the time selection() is called.

# Choose 100 diverse samples that carry the 'cat' tag
# and save them under the tag name "diverse_cats".
selection = dataset.query().match(
    SampleField.tags.contains('cat')
).selection()
selection.diverse(100, "diverse_cats")

Exporting the query results

An export interface can be created from the current query results.

export = dataset.query().match(...).export()
export.to_coco_object_detections('/path/to/coco.json')

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| dataset | DatasetTable | The dataset to query. | required |
| session | Session | Database session for executing queries. | required |

__getitem__

__getitem__(key: _SliceType) -> DatasetQuery

Enable bracket notation for slicing.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| key | _SliceType | A slice object (e.g., [10:20], [:50], [100:]). | required |

Returns:

| Type | Description |
|------|-------------|
| DatasetQuery | Self with slice applied. |

Raises:

| Type | Description |
|------|-------------|
| TypeError | If key is not a slice object. |
| ValueError | If slice contains unsupported features or conflicts with existing slice. |

__iter__

__iter__() -> Iterator[Sample]

Iterate over the query results.

Returns:

| Type | Description |
|------|-------------|
| Iterator[Sample] | Iterator of Sample objects from the database. |

add_tag

add_tag(tag_name: str) -> None

Add a tag to all samples returned by this query.

Creates the tag if it does not exist, then applies it to all samples that match the current query filters. Samples that already have the tag are left unchanged, because the database prevents duplicates.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| tag_name | str | Name of the tag to add to matching samples. | required |
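The "create if missing, then apply without duplicates" behaviour of add_tag can be illustrated with the standard-library sqlite3 module. The schema below (tables `tag` and `sample_tag`) and the standalone `add_tag` function are assumptions made for this sketch; the library's actual tables and queries are not shown in this reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: tag names are unique, and so is each sample/tag link.
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE sample_tag ("
    " sample_id INTEGER, tag_id INTEGER,"
    " PRIMARY KEY (sample_id, tag_id))"
)


def add_tag(sample_ids: list[int], tag_name: str) -> None:
    # Create the tag only if it is missing.
    conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (tag_name,))
    (tag_id,) = conn.execute(
        "SELECT id FROM tag WHERE name = ?", (tag_name,)
    ).fetchone()
    # Link every sample; links that already exist are skipped by the constraint.
    conn.executemany(
        "INSERT OR IGNORE INTO sample_tag (sample_id, tag_id) VALUES (?, ?)",
        [(sample_id, tag_id) for sample_id in sample_ids],
    )


add_tag([1, 2], "my_tag")
add_tag([2, 3], "my_tag")  # sample 2 is already tagged; no duplicate row
link_count = conn.execute("SELECT COUNT(*) FROM sample_tag").fetchone()[0]
tag_count = conn.execute("SELECT COUNT(*) FROM tag").fetchone()[0]
```

After both calls there is one tag and three links, so calling add_tag repeatedly on overlapping results is safe in this model.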

export

export() -> DatasetExport

Return a DatasetExport instance which can export the dataset in various formats.

match

match(match_expression: MatchExpression) -> DatasetQuery

Store a field condition for filtering.

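Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| match_expression | MatchExpression | Defines the filter. | required |

Returns:

| Type | Description |
|------|-------------|
| DatasetQuery | Self for method chaining. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If match() has already been called on this instance. |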

order_by

order_by(*order_by: OrderByExpression) -> DatasetQuery

Store ordering expressions.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| order_by | OrderByExpression | One or more ordering expressions, applied in the order given. For example, ordering by width and then by file_name sorts by width first and uses file_name only to order samples that have the same width. | () |

Returns:

| Type | Description |
|------|-------------|
| DatasetQuery | Self for method chaining. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If order_by() has already been called on this instance. |

selection

selection() -> Selection

Return a Selection interface for this query.

The returned Selection snapshots the current query results immediately. Mutating the query after calling this method will therefore not affect the samples used by that Selection instance.

Returns:

| Type | Description |
|------|-------------|
| Selection | Selection interface operating on the current query result snapshot. |
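The snapshot semantics can be made concrete with a toy model. `ToyQuery` and `ToySelection` are hypothetical classes over an in-memory list and do not reflect the library's internals; the sketch only shows that results captured when selection() is called are unaffected by later changes to the query.

```python
class ToyQuery:
    """Hypothetical query over an in-memory list (not the library's class)."""

    def __init__(self, rows: list[str]) -> None:
        self.rows = rows
        self.prefix = ""

    def results(self) -> list[str]:
        return [r for r in self.rows if r.startswith(self.prefix)]

    def selection(self) -> "ToySelection":
        # Evaluate now and hand over the result list: this is the snapshot.
        return ToySelection(self.results())


class ToySelection:
    def __init__(self, snapshot: list[str]) -> None:
        self.snapshot = snapshot


toy = ToyQuery(["cat_1.png", "cat_2.png", "dog_1.png"])
toy.prefix = "cat"
sel = toy.selection()  # captures the two cat files
toy.prefix = "dog"     # mutate the query after taking the selection
current = toy.results()
```

Here `sel.snapshot` still holds the two cat files, while `current` reflects the mutated query.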

slice

slice(offset: int = 0, limit: int | None = None) -> DatasetQuery

Apply offset and limit to results.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| offset | int | Number of items to skip from the beginning. | 0 |
| limit | int \| None | Maximum number of items to return (None = no limit). | None |

Returns:

| Type | Description |
|------|-------------|
| DatasetQuery | Self for method chaining. |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If slice() has already been called on this instance. |

to_list

to_list() -> list[Sample]

Execute the query and return the results as a list.

Returns:

| Type | Description |
|------|-------------|
| list[Sample] | List of Sample objects from the database. |

SampleField

Fields for querying sample properties in the dataset query system.

SampleField

Provides access to predefined sample fields for queries.

It is used with the query.match(...) and query.order_by(...) methods of the DatasetQuery class.

from lightly_studio.core.dataset_query import DatasetQuery, SampleField, OrderByField

query = dataset.query()
query.match(SampleField.tags.contains("cat"))
query.order_by(OrderByField(SampleField.file_path_abs))
samples = query.to_list()
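An expression such as `SampleField.width > 100` yields a filter object rather than a boolean. One common way to get that effect in Python is operator overloading, sketched below. `Comparison` and `NumericField` are hypothetical names for this illustration, and nothing here is taken from the library's source; how its field classes are actually implemented is not documented in this reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Hypothetical record of a deferred comparison."""

    column: str
    operator: str
    value: float


class NumericField:
    """Hypothetical field whose comparisons build expressions, not booleans."""

    def __init__(self, column: str) -> None:
        self.column = column

    def __gt__(self, value: float) -> Comparison:
        return Comparison(self.column, ">", value)

    def __lt__(self, value: float) -> Comparison:
        return Comparison(self.column, "<", value)


width = NumericField("width")
expression = width > 100  # a Comparison object that can be rendered later
```

Because the comparison is stored as data, a query object can later translate it into a SQL condition instead of evaluating it in Python.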

created_at class-attribute instance-attribute

created_at = DatetimeField(col(created_at))

file_name class-attribute instance-attribute

file_name = StringField(col(file_name))

file_path_abs class-attribute instance-attribute

file_path_abs = StringField(col(file_path_abs))

height class-attribute instance-attribute

height = NumericalField(col(height))

tags class-attribute instance-attribute

tags = TagsAccessor()

width class-attribute instance-attribute

width = NumericalField(col(width))