Selection

Provides the user-facing Python interface for running selection on a fixed set of sample ids.

Selection

Selection(dataset_id: UUID, session: Session, input_sample_ids: Iterable[UUID])

Selection interface for candidate sample ids.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset_id | UUID | Dataset in which the selection is performed. | required |
| session | Session | Database session to resolve selection dependencies. | required |
| input_sample_ids | Iterable[UUID] | Candidate sample ids considered for selection. The iterable is consumed immediately to capture a stable snapshot. | required |
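
The snapshot behaviour described for `input_sample_ids` can be illustrated with a small stand-in class. `SnapshotExample` is hypothetical and is not the library's `Selection`; it only assumes that the iterable is materialised into a list at construction time, which is one straightforward way to get a stable snapshot.

```python
from collections.abc import Iterable
from uuid import UUID, uuid4


class SnapshotExample:
    """Hypothetical stand-in; not the library's Selection class."""

    def __init__(self, input_sample_ids: Iterable[UUID]) -> None:
        # Consume the iterable right away so that later changes to the
        # source do not affect the stored candidates.
        self.sample_ids: list[UUID] = list(input_sample_ids)


source_ids = [uuid4() for _ in range(3)]
holder = SnapshotExample(uid for uid in source_ids)  # generators work too

source_ids.append(uuid4())  # mutate the source after construction
print(len(holder.sample_ids))  # the snapshot still holds 3 ids
```

A practical consequence is that a generator can be passed safely, but ids added to the source collection afterwards would not be considered.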

diverse

diverse(
    n_samples_to_select: int,
    selection_result_tag_name: str,
    embedding_model_name: str | None = None,
) -> None

Select a diverse subset using embeddings.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n_samples_to_select | int | Number of samples to select. | required |
| selection_result_tag_name | str | Tag name for the selection result. | required |
| embedding_model_name | str \| None | Name of the embedding model to use. If None, the only available model is used; an error is raised if multiple models exist. | None |
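
The `None` handling for `embedding_model_name` can be sketched as follows. `resolve_embedding_model` and `available_models` are hypothetical names used for illustration only. The error type, the check on explicit names, and the treatment of zero available models are assumptions; the source only states that a single available model is used and that multiple models raise.

```python
from __future__ import annotations

from collections.abc import Sequence


def resolve_embedding_model(
    embedding_model_name: str | None, available_models: Sequence[str]
) -> str:
    """Hypothetical helper mirroring the documented None handling."""
    if embedding_model_name is not None:
        # Assumption: an explicit name must match an available model.
        if embedding_model_name not in available_models:
            raise ValueError(f"Unknown embedding model: {embedding_model_name!r}")
        return embedding_model_name
    if len(available_models) == 1:
        # Documented: with no name given, the only available model is used.
        return available_models[0]
    # Documented: multiple models with no name given is an error.
    # Assumption: zero available models is treated as an error as well.
    raise ValueError(
        f"Expected exactly one embedding model, found {len(available_models)}; "
        "pass embedding_model_name explicitly."
    )


print(resolve_embedding_model(None, ["model_a"]))
```

Passing the name explicitly avoids the ambiguity once a dataset has embeddings from more than one model.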

metadata_weighting

metadata_weighting(
    n_samples_to_select: int, selection_result_tag_name: str, metadata_key: str
) -> None

Select a subset based on numeric metadata weights.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n_samples_to_select | int | Number of samples to select. | required |
| selection_result_tag_name | str | Tag name for the selection result. | required |
| metadata_key | str | Metadata key used as weights (float or int values). | required |

multi_strategies

multi_strategies(
    n_samples_to_select: int,
    selection_result_tag_name: str,
    selection_strategies: list[SelectionStrategy],
) -> None

Select a subset based on multiple strategies.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n_samples_to_select | int | Number of samples to select. | required |
| selection_result_tag_name | str | Tag name for the selection result. | required |
| selection_strategies | list[SelectionStrategy] | Strategies to compose for selection. | required |