DatasetQuery¶
Dataset query utilities for filtering, ordering, and slicing samples.
DatasetQuery ¶
DatasetQuery(dataset: DatasetTable, session: Session)
Class for executing a query on a dataset.
Filtering, ordering, and slicing samples in a dataset¶
Allows filtering, ordering, and slicing of samples in a dataset.
An instance of this class is obtained by calling .query() on a Dataset instance.
dataset : Dataset = ...
query = dataset.query()
The match(), order_by(), and slice() methods can be chained in this order. Each of them can be called at most once per query.
You can also access the methods directly on the Dataset instance:
dataset.match(...) # shorthand for dataset.query().match(...)
The query object is converted to a SQL query that is evaluated lazily: it is only executed when you iterate over the object or convert it to a list.
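The lazy behaviour described above can be illustrated with a small stand-in. This is a minimal sketch, not the library's implementation: `LazyQuery`, its `executions` counter, and the use of plain dictionaries and Python callables instead of SQL are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LazyQuery:
    """Hypothetical stand-in that records a filter and runs it only on demand."""

    def __init__(self, rows: Iterable[dict]) -> None:
        self._rows = rows
        self._predicate: Optional[Callable[[dict], bool]] = None
        self.executions = 0  # how many times the data source was actually read

    def match(self, predicate: Callable[[dict], bool]) -> "LazyQuery":
        # Only store the condition; nothing is evaluated here.
        self._predicate = predicate
        return self

    def __iter__(self) -> Iterator[dict]:
        # Evaluation happens here, once the caller starts iterating.
        self.executions += 1
        for row in self._rows:
            if self._predicate is None or self._predicate(row):
                yield row

    def to_list(self) -> List[dict]:
        return list(self)


rows = [
    {"file_name": "a.png", "width": 50},
    {"file_name": "b.png", "width": 150},
]
query = LazyQuery(rows).match(lambda row: row["width"] > 100)
executions_before = query.executions  # nothing has been read yet
wide = query.to_list()  # the stored filter is applied here
```

Building the query is cheap in this sketch because `match()` only stores the condition; the cost is paid once, at iteration time.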
match() - Filtering samples¶
Filtering is done via the match() method.
from lightly_studio.core.dataset_query import SampleField
query_1 = dataset.query().match(SampleField.width > 100)
query_2 = dataset.query().match(SampleField.tags.contains('cat'))
from lightly_studio.core.dataset_query import SampleField, AND, OR
query = dataset.query().match(
AND(
SampleField.height < 200,
OR(
SampleField.file_name == 'image.png',
SampleField.file_name == 'image2.png',
)
)
)
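To make the logic of the nested expression explicit, the same condition can be written as a plain Python predicate. This is only an illustration of the boolean semantics over hypothetical dictionaries; it does not use the library, and the sample data is invented.

```python
def matches(sample: dict) -> bool:
    # Mirrors AND(height < 200, OR(file_name == 'image.png', file_name == 'image2.png')).
    return sample["height"] < 200 and (
        sample["file_name"] == "image.png" or sample["file_name"] == "image2.png"
    )


# Hypothetical samples, used only to exercise the predicate.
samples = [
    {"file_name": "image.png", "height": 100},   # both conditions hold
    {"file_name": "image2.png", "height": 300},  # name matches, but too tall
    {"file_name": "other.png", "height": 100},   # short enough, wrong name
]
selected = [s["file_name"] for s in samples if matches(s)]
print(selected)  # ['image.png']
```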
order_by() - Ordering samples¶
The results can be ordered by using order_by(). For tie-breaking, multiple fields
can be provided. The first field has the highest priority. The default is
ascending order. To order in descending order, use OrderByField(...).desc().
from lightly_studio.core.dataset_query import OrderByField, SampleField
query = query.order_by(
OrderByField(SampleField.width),
OrderByField(SampleField.file_name).desc()
)
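The tie-breaking rule can be reproduced in plain Python to see which order to expect. This is a sketch over hypothetical dictionaries, assuming the semantics described above (width ascending first, then file_name descending among equal widths); it does not call the library.

```python
# Hypothetical samples: two of them share the same width.
samples = [
    {"file_name": "a.png", "width": 200},
    {"file_name": "b.png", "width": 100},
    {"file_name": "c.png", "width": 100},
]

# Python's sort is stable, so sort by the lowest-priority key first
# and by the highest-priority key last.
by_name_desc = sorted(samples, key=lambda s: s["file_name"], reverse=True)
ordered = sorted(by_name_desc, key=lambda s: s["width"])

names = [s["file_name"] for s in ordered]
print(names)  # ['c.png', 'b.png', 'a.png']
```

The two samples with width 100 come first, and file_name only decides the order between them.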
slice() - Slicing samples¶
Slicing can be applied via slice() or bracket notation.
query = query.slice(offset=10, limit=20)
query = query[10:30] # equivalent to slice(offset=10, limit=20)
Usage of the filtered, ordered and sliced query¶
Iterating and converting to list¶
Finally, the query can be executed by iterating over it or converting to a list.
for sample in query:
print(sample.file_name)
samples = query.to_list()
The returned samples are instances of the Sample class. They are writable, and
changes to them will be persisted to the database.
Adding tags to matching samples¶
The filtered set can also be used to add a tag to all matching samples.
query.add_tag('my_tag')
Selecting a subset of samples using smart selection¶
A Selection interface can be created from the current query results. It will only select the samples matching the current query at the time of calling selection().
# Choosing 100 diverse samples from the 'cat' tag.
# Save them under the tag name "diverse_cats".
selection = dataset.query().match(
SampleField.tags.contains('cat')
).selection()
selection.diverse(100, "diverse_cats")
Exporting the query results¶
An export interface can be created from the current query results.
export = dataset.query().match(...).export()
export.to_coco_object_detections('/path/to/coco.json')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | DatasetTable | The dataset to query. | required |
| session | Session | Database session for executing queries. | required |
__getitem__ ¶
__getitem__(key: _SliceType) -> DatasetQuery
Enable bracket notation for slicing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | _SliceType | A slice object (e.g., [10:20], [:50], [100:]). | required |
Returns:
| Type | Description |
|---|---|
| DatasetQuery | Self with slice applied. |
Raises:
| Type | Description |
|---|---|
| TypeError | If key is not a slice object. |
| ValueError | If slice contains unsupported features or conflicts with existing slice. |
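The mapping from bracket notation to offset and limit can be sketched as follows. This is a hedged illustration, not the library's code: the helper name `slice_to_offset_limit` is hypothetical, and treating a step other than 1 and negative indices as the "unsupported features" is an assumption, since the documentation does not list them. The conflict check against an existing slice is not modelled.

```python
from typing import Optional, Tuple


def slice_to_offset_limit(key: object) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: translate a slice object into (offset, limit)."""
    if not isinstance(key, slice):
        raise TypeError("key must be a slice object")
    # Assumption: steps and negative indices are the unsupported features.
    if key.step not in (None, 1):
        raise ValueError("slices with a step are not supported")
    start = 0 if key.start is None else key.start
    if start < 0 or (key.stop is not None and key.stop < 0):
        raise ValueError("negative indices are not supported")
    # An open-ended slice such as [100:] has no limit.
    limit = None if key.stop is None else max(key.stop - start, 0)
    return start, limit


print(slice_to_offset_limit(slice(10, 30)))     # (10, 20)
print(slice_to_offset_limit(slice(None, 50)))   # (0, 50)
print(slice_to_offset_limit(slice(100, None)))  # (100, None)
```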
__iter__ ¶
__iter__() -> Iterator[Sample]
Iterate over the query results.
Returns:
| Type | Description |
|---|---|
| Iterator[Sample] | Iterator of Sample objects from the database. |
add_tag ¶
add_tag(tag_name: str) -> None
Add a tag to all samples returned by this query.
First, creates the tag if it doesn't exist. Then applies the tag to all samples that match the current query filters. Samples already having that tag are unchanged, as the database prevents duplicates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tag_name | str | Name of the tag to add to matching samples. | required |
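The statement that already-tagged samples are unchanged because the database prevents duplicates can be illustrated with SQLite from the Python standard library. The table name `sample_tag`, its columns, and the `INSERT OR IGNORE` statement are assumptions made for this sketch; the library's actual schema and statements are not shown in this documentation.

```python
import sqlite3
from typing import Iterable

conn = sqlite3.connect(":memory:")
# Hypothetical link table: the primary key forbids duplicate (sample, tag) pairs.
conn.execute(
    "CREATE TABLE sample_tag ("
    "sample_id INTEGER, tag TEXT, PRIMARY KEY (sample_id, tag))"
)


def add_tag(sample_ids: Iterable[int], tag_name: str) -> None:
    # OR IGNORE skips rows that would violate the primary key,
    # so samples that already carry the tag are left untouched.
    conn.executemany(
        "INSERT OR IGNORE INTO sample_tag (sample_id, tag) VALUES (?, ?)",
        [(sample_id, tag_name) for sample_id in sample_ids],
    )


add_tag([1, 2], "my_tag")
add_tag([2, 3], "my_tag")  # sample 2 is already tagged; only sample 3 is added

tagged = conn.execute(
    "SELECT sample_id, tag FROM sample_tag ORDER BY sample_id"
).fetchall()
print(tagged)  # [(1, 'my_tag'), (2, 'my_tag'), (3, 'my_tag')]
```

Calling the function twice with overlapping samples is therefore safe in this sketch: each sample ends up with the tag exactly once.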
export ¶
export() -> DatasetExport
Return a DatasetExport instance which can export the dataset in various formats.
match ¶
match(match_expression: MatchExpression) -> DatasetQuery
Store a field condition for filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| match_expression | MatchExpression | Defines the filter. | required |
Returns:
| Type | Description |
|---|---|
| DatasetQuery | Self for method chaining. |
Raises:
| Type | Description |
|---|---|
| ValueError | If match() has already been called on this instance. |
order_by ¶
order_by(*order_by: OrderByExpression) -> DatasetQuery
Store ordering expressions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| order_by | OrderByExpression | One or more ordering expressions. They are applied in order. E.g. first ordering by sample width and then by sample file_name will only order the samples with the same sample width by file_name. | () |
Returns:
| Type | Description |
|---|---|
| DatasetQuery | Self for method chaining. |
Raises:
| Type | Description |
|---|---|
| ValueError | If order_by() has already been called on this instance. |
selection ¶
selection() -> Selection
Selection interface for this query.
The returned Selection snapshots the current query results immediately. Mutating the query after calling this method will therefore not affect the samples used by that Selection instance.
Returns:
| Type | Description |
|---|---|
| Selection | Selection interface operating on the current query result snapshot. |
slice ¶
slice(offset: int = 0, limit: int | None = None) -> DatasetQuery
Apply offset and limit to results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| offset | int | Number of items to skip from beginning (default: 0). | 0 |
| limit | int \| None | Maximum number of items to return (None = no limit). | None |
Returns:
| Type | Description |
|---|---|
| DatasetQuery | Self for method chaining. |
Raises:
| Type | Description |
|---|---|
| ValueError | If slice() has already been called on this instance. |
SampleField¶
Fields for querying sample properties in the dataset query system.
SampleField ¶
Provides access to predefined sample fields for queries.
It is used with the query.match(...) and query.order_by(...) methods of the
DatasetQuery class.
from lightly_studio.core.dataset_query import OrderByField, SampleField
query = dataset.query()
query.match(SampleField.tags.contains("cat"))
query.order_by(OrderByField(SampleField.file_path_abs))
samples = query.to_list()