diff --git a/AGENTS.md b/AGENTS.md index 59629ab1fb..2f4eee0671 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -58,10 +58,9 @@ More specific conventions for subdirectories: ### attrs Only All domain classes use `attrs` `@define`. No dataclasses, no Pydantic. - Immutable value objects (parameters, kernels, priors, transformations, objectives, - targets): `@define(frozen=True, slots=False)`. + targets): `@define(frozen=True)`. - Mutable stateful objects (campaign, surrogates, recommenders): `@define`. -- `slots=False` required with `frozen=True` when `cached_property` is needed. See - `attrs` issue #164 +- `slots=False` is required when `cached_property` is needed. See `attrs` issue #164 - Also use `slots=False` when monkeypatching is needed (e.g., `register_hooks`) ### Inheritance: ABC + SerialMixin + Protocol @@ -71,14 +70,25 @@ All domain classes use `attrs` `@define`. No dataclasses, no Pydantic. 3. Concrete classes: Inherit from ABC. ### Fields and Methods -- Use `field()` with `validator=`, `converter=`, `default=`, `factory=`, `alias=`. +- Use `field()` with arguments in this order: 1) `alias=` (if needed), 2) `init=` + (if needed), 3) `default=` / `factory=`, 4) `converter=`, 5) `validator=`. - Private fields: `_` prefix, typically `init=False`. - Store each piece of information once — no data duplication. - Use `attrs.evolve()` for modified copies of frozen objects. - Use `on_setattr` hooks for cache invalidation on mutable objects. +- Use `kw_only=True` deliberately: only when positional construction would be + ambiguous or error-prone (e.g., multiple fields of the same type, or + optional/secondary fields that should not be passed positionally). Do not + apply `kw_only` to all fields by default. - `ClassVar[bool]` for capability flags (`supports_transfer_learning`, etc.). -- Order class content like this: 1) Attributes, 2) validators and post_init, 3) - properties, 4) methods. Within each group use alphabetical order. 
+- Order class content like this: 1) Attributes, 2) default and validator methods, + 3) `__attrs_post_init__`, 4) properties, 5) methods. + - Attributes are ordered by functionality/importance (primary identity fields + first, optional/secondary fields last), not alphabetically. + - Default and validator methods mirror the attribute order. For a given + attribute, the default method (`_default_`) comes before its validator + (`_validate_`). + - Regular methods are ordered alphabetically. ### Attribute Docstrings String literals immediately below field declarations, blank lines between attributes. @@ -98,6 +108,17 @@ Every module using `@define` must end with: Name descriptively: `from_product`, `from_dataframe`, `from_parameter`, `from_config`, `from_json`, `from_dict`, `from_preset`. +Use `Self` return type and `cls()` construction for proper subclass support: +```python +from typing_extensions import Self + +@classmethod +def from_parameter(cls, parameter: DiscreteParameter) -> Self: + """Create a subspace from a single parameter.""" + return cls(parameters=[parameter], ...) # Use cls(), not ClassName() +``` +This ensures subclasses return their own type, not the base class type. + ### classproperty Custom `@classproperty` from `baybe.utils.basic` for class-level computed properties. @@ -227,11 +248,24 @@ Three tiers: ## 11. Validation Patterns - Inline validators: `field(validator=(instance_of(str), min_len(1)))`, `in_()`, - `deep_iterable()`, custom `finite_float`, `gt()`. + `deep_iterable()`, custom `finite_float`, `gt()`. Order validators from simplest + to most complex: cheap structural checks (e.g., `min_len`, `instance_of`) before + expensive semantic checks (e.g., cross-field consistency, name uniqueness). - Method validators: `@_field.validator` with `# noqa: DOC101, DOC103` for validators needing `self` access. -- Cross-field: `__attrs_post_init__` when validation involves multiple fields. +- Cross-field: `__attrs_post_init__` is a last resort. 
Method validators + (`@field.validator`) already receive `self` and can read other already-set + attributes, so most cross-field checks belong there instead. When one field + must be compatible with another, attach the validator to the later field — + attrs sets fields in declaration order, so earlier fields are always available + via `self` at that point. When one attribute's value must be adjusted after + all fields are set — which is typically a workaround and should itself be + questioned — `__attrs_post_init__` is acceptable. - Converters: `field(converter=to_searchspace)` for automatic type coercion. +- If a converter already guarantees a specific type (e.g., `converter=list` + always produces a `list`, a custom converter always returns a known type), + omit any `instance_of(...)` validator for that same type — the check is + redundant. - Reusable validators in `baybe/utils/validation.py`: `finite_float`, `non_nan_float`, `non_inf_float`, `validate_not_nan`, `validate_target_input`, `validate_parameter_input`, `validate_object_names`. @@ -296,6 +330,7 @@ For a full list of available tox environments and developer commands, see - No hardcoded enum values in comments — link the enum. - No private field names in user-facing messages — use public alias. - No hardcoded class names in repr/errors — use `self.__class__.__name__`. +- No hardcoded class names in classmethods — use `cls()` and return `Self`. - No silent errors. No mutation of caller-provided dicts. - No silent defaults or "best effort" fallbacks — if input is invalid, raise. - No proceeding past failed preconditions into expensive computation. diff --git a/CHANGELOG.md b/CHANGELOG.md index 6e60e254f4..4cd0ea572e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,13 +5,62 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
## [Unreleased] +### Breaking Changes +- `Campaign.measurements` no longer contains `FitNr` or `BatchNr` metadata columns +- `validate_parameter_names`, `validate_cardinality_constraints_are_nonoverlapping` + and `validate_cardinality_constraint_parameter_bounds` are no longer available + as public utilities +- `is_constrained` property removed from `SubspaceDiscrete`, `SubspaceContinuous`, + and `SearchSpace` +- `candidates_exp` argument removed from `SubspaceDiscrete.subset_masks`, + `SubspaceDiscrete.sample_subset_masks`, `SearchSpace.subsets`, and + `SearchSpace.sample_subsets` +- `SubspaceDiscrete.get_candidates` now returns only the experimental representation + instead of a tuple of experimental and computational representations + +### Added +- `narwhals` as a hard dependency +- `CandidatesProtocol` as an interface for candidates generation +- `TableCandidates` and `ProductCandidates` classes implementing `CandidatesProtocol` +- `DiscreteParameter.is_finite` property +- `SubspaceDiscrete.batch_constraints` field for storing batch-level constraints +- `SubspaceDiscrete.from_dataframe` now accepts `batch_constraints` + ### Changed +- Internal `Campaign` state model simplified: recommended and excluded experiments + are now stored as dataframes instead of being tracked as metadata flags +- `SubspaceContinuous` now offers a simpler interface for passing constraints, + no longer requiring users to manually group constraints according to their type +- Parameter and constraint validation has been streamlined, using `validate_parameters` + and `validate_constraints` as the only remaining public entry points +- `_recommend_discrete` and kin now return a `pd.DataFrame` subselection of the + candidates instead of a `pd.Index` +- `SubspaceDiscrete.from_product` and `SubspaceDiscrete.from_simplex` now split + their `constraints` argument into filtering constraints (applied during construction) + and batch constraints (stored in `batch_constraints`) +- Internal search 
space and recommender logic simplified by reducing indirection and + argument passing between methods - `BOTORCH` GP preset now includes `BetaPrior(2.5, 1.5)` for the task covariance kernel in multi-task scenarios, matching BoTorch's `MultiTaskGP` defaults introduced in version `0.18.0` - The `BOTORCH` GP preset now requires BoTorch `>= 0.18.0` and raises an `IncompatibilityError` if an older version is installed + +### Fixed +- Deserialization with constructor selection now correctly respects converter settings + +### Deprecations +- `Campaign.n_fits_done` and `Campaign.n_batches_done` attributes +- `SubspaceDiscrete` ignores any `empty_encoding` when provided +- `SubspaceDiscrete` no longer accepts a `comp_rep` argument +- `SubspaceDiscrete.constraints` attribute (use `batch_constraints` to provide and + access batch-level constraints; filtering constraints are only needed during subspace + construction and are thus no longer stored). +- `SubspaceDiscrete.constraints_batch` property (use `batch_constraints` instead) +- `SubspaceDiscrete.exp_rep` attribute (use `get_candidates()` instead) +- `SubspaceDiscrete.comp_rep` attribute (use `transform(get_candidates())` instead) + ## [0.15.0] - 2026-06-11 ### Breaking Changes - `GaussianProcessSurrogate` no longer automatically adds a task kernel in multi-task @@ -77,12 +126,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed - Broken cache validation for certain `Campaign.recommend` cases - `ContinuousCardinalityConstraint` now works in hybrid search spaces +- Typo in `_FixedNumericalContinuousParameter` where `is_numeric` was used + instead of `is_numerical` - `SHAPInsight` breaking with `numpy>=2.4` due to no longer accepted implicit array to scalar conversion - Using `np.isclose` for assessing equality of `Interval` bounds instead of hard equality check -- Typo in `_FixedNumericalContinuousParameter` where `is_numeric` was used - instead of `is_numerical` ### Removed - 
`parallel_runs` argument from `simulate_scenarios`, since parallelization diff --git a/baybe/acquisition/acqfs.py b/baybe/acquisition/acqfs.py index 5e3b08b1cf..96110fa66b 100644 --- a/baybe/acquisition/acqfs.py +++ b/baybe/acquisition/acqfs.py @@ -104,7 +104,9 @@ def get_integration_points(self, searchspace: SearchSpace) -> pd.DataFrame: # Discrete part if not searchspace.discrete.is_empty: - candidates_discrete = searchspace.discrete.comp_rep + candidates_discrete = searchspace.discrete.transform( + searchspace.discrete.get_candidates() + ) n_candidates = self.sampling_n_points or math.ceil( self.sampling_fraction * len(candidates_discrete) # type: ignore[operator] ) diff --git a/baybe/campaign.py b/baybe/campaign.py index 278d9812d0..57fc3594dc 100644 --- a/baybe/campaign.py +++ b/baybe/campaign.py @@ -7,10 +7,9 @@ import warnings from collections.abc import Collection, Sequence from functools import reduce -from typing import TYPE_CHECKING, Any, TypeVar +from typing import TYPE_CHECKING, Any, NoReturn, TypeVar import cattrs -import numpy as np import pandas as pd from attrs import Attribute, define, evolve, field, fields, setters from attrs.converters import optional @@ -19,6 +18,7 @@ from baybe.constraints.base import DiscreteConstraint from baybe.exceptions import ( + DeprecationError, IncompatibilityError, NoMeasurementsError, NotEnoughPointsLeftError, @@ -31,7 +31,6 @@ from baybe.recommenders.meta.sequential import TwoPhaseMetaRecommender from baybe.recommenders.pure.bayesian.base import BayesianRecommender from baybe.recommenders.pure.nonpredictive.base import NonPredictiveRecommender -from baybe.searchspace._filtered import FilteredSubspaceDiscrete from baybe.searchspace.core import ( SearchSpace, SearchSpaceType, @@ -60,11 +59,10 @@ _T = TypeVar("_T") -# Metadata columns -_RECOMMENDED = "recommended" -_MEASURED = "measured" +# Legacy constants kept for deserialization migration only _EXCLUDED = "excluded" -_METADATA_COLUMNS = [_RECOMMENDED, _MEASURED, 
_EXCLUDED] +_MEASURED = "measured" +_RECOMMENDED = "recommended" def _set_with_cache_cleared(instance: Campaign, attribute: Attribute, value: _T) -> _T: @@ -171,75 +169,60 @@ def _validate_objective( # noqa: DOC101, DOC103 ) """Allow recommending pending experiments.""" - # Metadata - _searchspace_metadata: pd.DataFrame = field(init=False, eq=eq_dataframe) - """Metadata tracking the experimentation status of the search space.""" - - n_batches_done: int = field(default=0, init=False) - """The number of already processed batches.""" + # Private + _excluded_experiments: pd.DataFrame = field(eq=eq_dataframe, init=False) + """The parameter configurations that have been excluded from recommendations.""" - n_fits_done: int = field(default=0, init=False) - """The number of fits already done.""" + _measurements: pd.DataFrame = field(eq=eq_dataframe, init=False) + """The measurements added to the campaign.""" - # Private - _measurements_exp: pd.DataFrame = field( - factory=pd.DataFrame, eq=eq_dataframe, init=False - ) - """The experimental representation of the conducted experiments.""" + _recommended_experiments: pd.DataFrame = field(eq=eq_dataframe, init=False) + """The (deduplicated) parameter configurations that have been recommended.""" _cached_recommendation: pd.DataFrame | None = field( default=None, init=False, eq=False ) """The cached recommendations.""" - @_searchspace_metadata.default - def _default_searchspace_metadata(self) -> pd.DataFrame: - """Create a fresh metadata object.""" - df = pd.DataFrame( - False, - index=self.searchspace.discrete.exp_rep.index, - columns=_METADATA_COLUMNS, - ) + @_measurements.default + def _default_measurements(self) -> pd.DataFrame: + """Create an empty measurements DataFrame with the correct schema.""" + cols = [p.name for p in self.searchspace.parameters] + [ + t.name for t in (self.objective.targets if self.objective else ()) + ] + return pd.DataFrame(columns=cols) - return df + @_excluded_experiments.default + def 
_default_excluded_experiments(self) -> pd.DataFrame: + """Create an empty excluded experiments DataFrame with correct schema.""" + cols = [p.name for p in self.searchspace.parameters] + return pd.DataFrame(columns=cols) + + @_recommended_experiments.default + def _default_recommended_experiments(self) -> pd.DataFrame: + """Create an empty recommended experiments DataFrame with correct schema.""" + cols = [p.name for p in self.searchspace.parameters] + return pd.DataFrame(columns=cols) @override def __str__(self) -> str: - recommended_count = sum(self._searchspace_metadata[_RECOMMENDED]) - measured_count = sum(self._searchspace_metadata[_MEASURED]) - excluded_count = sum(self._searchspace_metadata[_EXCLUDED]) - n_elements = len(self._searchspace_metadata) - searchspace_fields = [ - to_string( - "Recommended:", - f"{recommended_count}/{n_elements}", - single_line=True, - ), - to_string( - "Measured:", - f"{measured_count}/{n_elements}", - single_line=True, - ), - to_string( - "Excluded:", - f"{excluded_count}/{n_elements}", - single_line=True, - ), - ] - metadata_fields = [ - to_string("Batches done", self.n_batches_done, single_line=True), - to_string("Fits done", self.n_fits_done, single_line=True), - to_string("Discrete Subspace Meta Data", *searchspace_fields), - ] - metadata = to_string("Meta Data", *metadata_fields) - fields = [metadata, self.searchspace, self.objective, self.recommender] - + fields = [self.searchspace, self.objective, self.recommender] return to_string(self.__class__.__name__, *fields) @property def measurements(self) -> pd.DataFrame: """The experimental data added to the Campaign.""" - return self._measurements_exp + return self._measurements + + @property + def n_batches_done(self) -> NoReturn: + """Deprecated!""" + raise DeprecationError("'n_batches_done' is no longer available.") + + @property + def n_fits_done(self) -> NoReturn: + """Deprecated!""" + raise DeprecationError("'n_fits_done' is no longer available.") @property def 
parameters(self) -> tuple[Parameter, ...]: @@ -353,21 +336,8 @@ def add_measurements( self.clear_cache() # Read in measurements and add them to the database - self.n_batches_done += 1 - to_insert = data.copy() - to_insert["BatchNr"] = self.n_batches_done - to_insert["FitNr"] = np.nan - - self._measurements_exp = pd.concat( - [self._measurements_exp, to_insert], axis=0, ignore_index=True - ) - - # Update metadata - if self.searchspace.type in (SearchSpaceType.DISCRETE, SearchSpaceType.HYBRID): - idxs_matched = fuzzy_row_match( - self.searchspace.discrete.exp_rep, data, self.parameters - ) - self._searchspace_metadata.loc[idxs_matched, _MEASURED] = True + frames = [f for f in (self._measurements, data) if not f.empty] + self._measurements = pd.concat(frames, axis=0, ignore_index=True) def update_measurements( self, @@ -378,7 +348,7 @@ def update_measurements( This can be useful to correct mistakes or update target measurements. The match to existing data entries is made based on the index. This will reset - the `FitNr` of the corresponding measurement and reset cached recommendations. + cached recommendations. Args: data: The measurement data to be updated (with filled values for targets). @@ -410,7 +380,7 @@ def update_measurements( ) # Allow only existing indices - if nonmatching_idxs := set(data.index).difference(self._measurements_exp.index): + if nonmatching_idxs := set(data.index).difference(self.measurements.index): raise ValueError( f"Updating measurements requires indices matching the " f"existing measurements. 
The following indices were in the input, but " @@ -419,10 +389,7 @@ def update_measurements( # Perform the update cols = [p.name for p in self.parameters] + [t.name for t in self.targets] - self._measurements_exp.loc[data.index, cols] = data[cols] - - # Reset fit number - self._measurements_exp.loc[data.index, "FitNr"] = np.nan + self._measurements.loc[data.index, cols] = data[cols] def toggle_discrete_candidates( # noqa: DOC501 self, @@ -461,7 +428,7 @@ def toggle_discrete_candidates( # noqa: DOC501 # * Additional shortcuts might be possible. self.clear_cache() - df = self.searchspace.discrete.exp_rep + df = self.searchspace.discrete.get_candidates() if isinstance(constraints, pd.DataFrame): # Determine the candidate subset to be toggled @@ -488,7 +455,25 @@ def toggle_discrete_candidates( # noqa: DOC501 ) if not dry_run: - self._searchspace_metadata.loc[points.index, _EXCLUDED] = exclude + if exclude and not points.empty: + # Add the toggled points (avoid duplicates) + frames = [ + f for f in (self._excluded_experiments, points) if not f.empty + ] + self._excluded_experiments = ( + pd.concat(frames, axis=0).drop_duplicates().reset_index(drop=True) + ) + elif not exclude and not self._excluded_experiments.empty: + # Remove the re-included points + merged = pd.merge( + self._excluded_experiments, + points, + indicator=True, + how="left", + ) + self._excluded_experiments = self._excluded_experiments[ + merged["_merge"].eq("left_only").values + ].reset_index(drop=True) return points @@ -540,34 +525,63 @@ def recommend( ): return cache - # Update recommendation meta data - if len(self._measurements_exp) > 0: - self.n_fits_done += 1 - self._measurements_exp.fillna({"FitNr": self.n_fits_done}, inplace=True) - # Prepare the search space according to the current campaign state if self.searchspace.type is SearchSpaceType.DISCRETE: # TODO: This implementation should at some point be hidden behind an # appropriate public interface, like `SubspaceDiscrete.filter()` - 
mask_todrop = self._searchspace_metadata[_EXCLUDED].astype(bool) - if not self.allow_recommending_already_recommended: - mask_todrop |= self._searchspace_metadata[_RECOMMENDED] - if not self.allow_recommending_already_measured: - mask_todrop |= self._searchspace_metadata[_MEASURED] + candidates = self.searchspace.discrete.get_candidates() + mask_todrop = pd.Series(False, index=candidates.index) + if not self._excluded_experiments.empty: + mask_todrop |= ( + pd.merge( + candidates, + self._excluded_experiments, + indicator=True, + how="left", + )["_merge"] + .eq("both") + .to_numpy() + ) + if ( + not self.allow_recommending_already_recommended + and not self._recommended_experiments.empty + ): + mask_todrop |= ( + pd.merge( + candidates, + self._recommended_experiments, + indicator=True, + how="left", + )["_merge"] + .eq("both") + .to_numpy() + ) + if ( + not self.allow_recommending_already_measured + and not self._measurements.empty + ): + measured_idxs = fuzzy_row_match( + candidates, self._measurements, self.parameters + ) + mask_todrop.loc[measured_idxs] = True if ( not self.allow_recommending_pending_experiments and pending_experiments is not None ): - mask_todrop |= pd.merge( - self.searchspace.discrete.exp_rep, - pending_experiments, - indicator=True, - how="left", - )["_merge"].eq("both") + mask_todrop |= ( + pd.merge( + candidates, + pending_experiments, + indicator=True, + how="left", + )["_merge"] + .eq("both") + .to_numpy() + ) searchspace = evolve( self.searchspace, - discrete=FilteredSubspaceDiscrete.from_subspace( - self.searchspace.discrete, ~mask_todrop.to_numpy() + discrete=evolve( + self.searchspace.discrete, exp_rep=candidates.loc[~mask_todrop] ), ) else: @@ -582,7 +596,7 @@ def recommend( batch_size, searchspace, self.objective, - self._measurements_exp, + self.measurements, pending_experiments, ) is_nonpredictive = isinstance(recommender, NonPredictiveRecommender) @@ -596,7 +610,7 @@ def recommend( batch_size, searchspace, self.objective, - 
self._measurements_exp, + self.measurements, None if is_nonpredictive else pending_experiments, ) except NotEnoughPointsLeftError as ex: @@ -635,9 +649,19 @@ def recommend( ): self._cache_recommendation(rec) - # Update metadata + # Track recommended experiments (deduplicated) if self.searchspace.type in (SearchSpaceType.DISCRETE, SearchSpaceType.HYBRID): - self._searchspace_metadata.loc[rec.index, _RECOMMENDED] = True + param_cols = [p.name for p in self.parameters] + frames = [ + f + for f in (self._recommended_experiments, rec[param_cols]) + if not f.empty + ] + self._recommended_experiments = ( + pd.concat(frames, axis=0, ignore_index=True) + .drop_duplicates() + .reset_index(drop=True) + ) return rec @@ -976,6 +1000,53 @@ def _drop_version(dict_: dict) -> dict: return dict_ +# >>>>>>>>>> Deprecation +def _discard_legacy_fields(dict_: dict, /) -> dict: + """Discard legacy fields from a Campaign dictionary during structuring.""" + dict_.pop("n_fits_done", None) + dict_.pop("n_batches_done", None) + + # Migrate legacy "measurements_exp" key to "measurements" + if "measurements_exp" in dict_: + dict_["measurements"] = dict_.pop("measurements_exp") + + # Strip FitNr/BatchNr columns from legacy measurements + if "measurements" in dict_: + meas = converter.structure(dict_["measurements"], pd.DataFrame) + cols_to_drop = [c for c in ("FitNr", "BatchNr") if c in meas.columns] + if cols_to_drop: + dict_["measurements"] = converter.unstructure( + meas.drop(columns=cols_to_drop) + ) + + # Migrate legacy _searchspace_metadata to new fields + if "searchspace_metadata" in dict_: + metadata = converter.structure(dict_.pop("searchspace_metadata"), pd.DataFrame) + if _RECOMMENDED in metadata.columns: + if "recommended_experiments" not in dict_: + recommended_idxs = metadata.index[metadata[_RECOMMENDED]] + if len(recommended_idxs) > 0: + dict_["_legacy_recommended_idxs"] = recommended_idxs + if _EXCLUDED in metadata.columns: + if "excluded_experiments" not in dict_: + 
excluded_idxs = metadata.index[metadata[_EXCLUDED]] + if len(excluded_idxs) > 0: + dict_["_legacy_excluded_idxs"] = excluded_idxs + + return dict_ + + +# <<<<<<<<<< Deprecation + + +def _prepare_for_structuring(dict_: dict, /) -> dict: + """Prepare a Campaign dictionary for structuring.""" + dict_ = dict_.copy() + _drop_version(dict_) + _discard_legacy_fields(dict_) + return dict_ + + # Register (un-)structure hooks unstructure_hook = cattrs.gen.make_dict_unstructure_fn( Campaign, converter, _cattrs_include_init_false=True @@ -986,9 +1057,37 @@ def _drop_version(dict_: dict) -> dict: converter.register_unstructure_hook( Campaign, lambda x: _add_version(unstructure_hook(x)) ) -converter.register_structure_hook( - Campaign, lambda d, cl: structure_hook(_drop_version(d), cl) -) + + +def _structure_campaign(d: dict, cl: type) -> Campaign: + """Structure a Campaign from a dictionary, handling legacy migrations.""" + prepared = _prepare_for_structuring(d) + legacy_recommended_idxs = prepared.pop("_legacy_recommended_idxs", None) + legacy_excluded_idxs = prepared.pop("_legacy_excluded_idxs", None) + campaign = structure_hook(prepared, cl) + + # >>>>>>>>>> Deprecation + # Post-structure reconstruction from legacy metadata indices + if legacy_recommended_idxs is not None or legacy_excluded_idxs is not None: + candidates = campaign.searchspace.discrete.get_candidates() + if legacy_recommended_idxs is not None: + campaign._recommended_experiments = candidates.loc[ + legacy_recommended_idxs + ].reset_index(drop=True) + if legacy_excluded_idxs is not None: + campaign._excluded_experiments = candidates.loc[ + legacy_excluded_idxs + ].reset_index(drop=True) + + # Fix schema of empty DataFrames from legacy serialization + if campaign._measurements.columns.empty: + campaign._measurements = campaign._default_measurements() + # <<<<<<<<<< Deprecation + + return campaign + + +converter.register_structure_hook(Campaign, _structure_campaign) # Converter for config validation diff --git 
a/baybe/constraints/validation.py b/baybe/constraints/validation.py index c0afa3b6fb..92c4428d4a 100644 --- a/baybe/constraints/validation.py +++ b/baybe/constraints/validation.py @@ -43,7 +43,7 @@ def validate_constraints( # noqa: DOC101, DOC103 f"Please specify all dependencies in one single constraint." ) - validate_cardinality_constraints_are_nonoverlapping( + _validate_cardinality_constraints_are_nonoverlapping( [con for con in constraints if isinstance(con, ContinuousCardinalityConstraint)] ) @@ -103,12 +103,12 @@ def validate_constraints( # noqa: DOC101, DOC103 ) if isinstance(constraint, ContinuousCardinalityConstraint): - validate_cardinality_constraint_parameter_bounds( + _validate_cardinality_constraint_parameter_bounds( constraint, params_continuous ) -def validate_cardinality_constraints_are_nonoverlapping( +def _validate_cardinality_constraints_are_nonoverlapping( constraints: Collection[ContinuousCardinalityConstraint], ) -> None: """Validate that cardinality constraints are non-overlapping. 
@@ -129,7 +129,7 @@ def validate_cardinality_constraints_are_nonoverlapping( ) -def validate_cardinality_constraint_parameter_bounds( +def _validate_cardinality_constraint_parameter_bounds( constraint: ContinuousCardinalityConstraint, parameters: Collection[NumericalContinuousParameter], ) -> None: diff --git a/baybe/exceptions.py b/baybe/exceptions.py index 0be2273341..c049d414b8 100644 --- a/baybe/exceptions.py +++ b/baybe/exceptions.py @@ -179,5 +179,9 @@ class UnsupportedEarlyFilteringError(Exception): """A constraint does not support early filtering with the given parameters.""" +class InfiniteSpaceError(Exception): + """An operation requires a finite search space but the space is infinite.""" + + # Collect leftover original slotted classes processed by `attrs.define` gc.collect() diff --git a/baybe/parameters/base.py b/baybe/parameters/base.py index 8fbd489a41..a615a673bf 100644 --- a/baybe/parameters/base.py +++ b/baybe/parameters/base.py @@ -124,6 +124,12 @@ class DiscreteParameter(Parameter, ABC): def values(self) -> tuple: """The values the parameter can take.""" + @property + def is_finite(self) -> bool: + """Indicates whether the parameter has a finite number of values.""" + len(self.values) # <-- raises an error if the parameter is infinite + return True + @property def active_values(self) -> tuple: """The values that are considered for recommendation.""" diff --git a/baybe/recommenders/naive.py b/baybe/recommenders/naive.py index 5b602d881b..4b89ad13da 100644 --- a/baybe/recommenders/naive.py +++ b/baybe/recommenders/naive.py @@ -92,9 +92,6 @@ def recommend( cont_part = searchspace.continuous.sample_uniform(1) cont_part_tensor = to_tensor(cont_part).unsqueeze(-2) - # Get discrete candidates - candidates_exp, _ = searchspace.discrete.get_candidates() - # We now check whether the discrete recommender is bayesian. 
if isinstance(self.disc_recommender, BayesianRecommender): # Get access to the recommenders acquisition function @@ -112,16 +109,15 @@ def recommend( self.disc_recommender._botorch_acqf = disc_acqf_part - # Call the private function of the discrete recommender and get the indices - disc_rec_idx = self.disc_recommender._recommend_discrete( + # Call the private function of the discrete recommender and get the candidates + disc_rec = self.disc_recommender._recommend_discrete( subspace_discrete=searchspace.discrete, - candidates_exp=candidates_exp, batch_size=batch_size, ) # Get one random discrete point that will be attached when evaluating the # acquisition function in the discrete space. - disc_part = searchspace.discrete.comp_rep.loc[disc_rec_idx].sample(1) + disc_part = searchspace.discrete.transform(disc_rec).sample(1) disc_part_tensor = to_tensor(disc_part).unsqueeze(-2) # Setup a fresh acquisition function for the continuous recommender @@ -143,9 +139,8 @@ def recommend( ) # Glue the solutions together and return them - rec_disc_exp = searchspace.discrete.exp_rep.loc[disc_rec_idx] - rec_cont.index = rec_disc_exp.index - rec_exp = pd.concat([rec_disc_exp, rec_cont], axis=1) + rec_cont.index = disc_rec.index + rec_exp = pd.concat([disc_rec, rec_cont], axis=1) return rec_exp diff --git a/baybe/recommenders/pure/base.py b/baybe/recommenders/pure/base.py index 4c61a4ea4e..58d62c1868 100644 --- a/baybe/recommenders/pure/base.py +++ b/baybe/recommenders/pure/base.py @@ -133,42 +133,34 @@ def recommend( subspace_continuous=searchspace.continuous, batch_size=batch_size ) else: - return self._recommend_with_discrete_parts( - searchspace, - batch_size, - pending_experiments=pending_experiments, - ) + return self._recommend_with_discrete_parts(searchspace, batch_size) def _recommend_discrete( self, subspace_discrete: SubspaceDiscrete, - candidates_exp: pd.DataFrame, batch_size: int, - ) -> pd.Index: + ) -> pd.DataFrame: """Generate recommendations from a discrete search 
space. Args: subspace_discrete: The discrete subspace from which to generate recommendations. - candidates_exp: The experimental representation of all discrete candidate - points to be considered. batch_size: The size of the recommendation batch. Raises: NotImplementedError: If the function is not implemented by the child class. Returns: - The dataframe indices of the recommended points in the provided - experimental representation. + A dataframe containing the recommendations as a subset of rows from the + provided experimental representation. """ # If this method is not implemented by a child class, try to resort to hybrid # recommendation (with an empty subspace) instead. try: return self._recommend_hybrid( searchspace=SearchSpace(discrete=subspace_discrete), - candidates_exp=candidates_exp, batch_size=batch_size, - ).index + ) except NotImplementedError as exc: raise NotImplementedError( """Hybrid recommendation could not be used as fallback when trying to @@ -202,7 +194,6 @@ def _recommend_continuous( try: return self._recommend_hybrid( searchspace=SearchSpace(continuous=subspace_continuous), - candidates_exp=pd.DataFrame(), batch_size=batch_size, ) except NotImplementedError as exc: @@ -218,7 +209,6 @@ def _recommend_continuous( def _recommend_hybrid( self, searchspace: SearchSpace, - candidates_exp: pd.DataFrame, batch_size: int, ) -> pd.DataFrame: """Generate recommendations from a hybrid search space. @@ -230,8 +220,6 @@ def _recommend_hybrid( Args: searchspace: The hybrid search space from which to generate recommendations. - candidates_exp: The experimental representation of all discrete candidate - points to be considered. batch_size: The size of the recommendation batch. Raises: @@ -246,7 +234,6 @@ def _recommend_with_discrete_parts( self, searchspace: SearchSpace, batch_size: int, - pending_experiments: pd.DataFrame | None, ) -> pd.DataFrame: """Obtain recommendations in search spaces with a discrete part. 
@@ -256,7 +243,6 @@ def _recommend_with_discrete_parts( Args: searchspace: The search space from which to generate recommendations. batch_size: The size of the recommendation batch. - pending_experiments: Pending experiments in experimental representation. Returns: A dataframe containing the recommendations as individual rows. @@ -273,7 +259,7 @@ def _recommend_with_discrete_parts( and not self.supports_discrete_subset_generating_constraints ): constraint_types = { - type(c).__name__ for c in searchspace.discrete.constraints_batch + type(c).__name__ for c in searchspace.discrete.batch_constraints } raise IncompatibilityError( f"'{self.__class__.__name__}' does not support discrete " @@ -281,14 +267,13 @@ def _recommend_with_discrete_parts( f"{constraint_types}." ) - # Get discrete candidates - candidates_exp, _ = searchspace.discrete.get_candidates() - # TODO: Introduce new flag to recommend batches larger than the search space # Check if enough candidates are left # TODO [15917]: This check is not perfectly correct. - if (not is_hybrid_space) and (len(candidates_exp) < batch_size): + if (not is_hybrid_space) and ( + len(searchspace.discrete.get_candidates()) < batch_size + ): raise NotEnoughPointsLeftError( f"Using the current settings, there are fewer than {batch_size} " f"possible data points left to recommend." 
@@ -296,12 +281,9 @@ def _recommend_with_discrete_parts( # Get recommendations if is_hybrid_space: - rec = self._recommend_hybrid(searchspace, candidates_exp, batch_size) + rec = self._recommend_hybrid(searchspace, batch_size) else: - idxs = self._recommend_discrete( - searchspace.discrete, candidates_exp, batch_size - ) - rec = searchspace.discrete.exp_rep.loc[idxs, :] + rec = self._recommend_discrete(searchspace.discrete, batch_size) # Return recommendations return rec diff --git a/baybe/recommenders/pure/bayesian/botorch/core.py b/baybe/recommenders/pure/bayesian/botorch/core.py index 7953d5ca74..b420e3b88b 100644 --- a/baybe/recommenders/pure/bayesian/botorch/core.py +++ b/baybe/recommenders/pure/bayesian/botorch/core.py @@ -156,9 +156,8 @@ def __str__(self) -> str: def _recommend_discrete( self, subspace_discrete: SubspaceDiscrete, - candidates_exp: pd.DataFrame, batch_size: int, - ) -> pd.Index: + ) -> pd.DataFrame: """Generate recommendations from a discrete search space. Dispatches to the appropriate optimization routine depending on whether @@ -167,21 +166,15 @@ def _recommend_discrete( Args: subspace_discrete: The discrete subspace from which to generate recommendations. - candidates_exp: The experimental representation of all discrete candidate - points to be considered. batch_size: The size of the recommendation batch. Returns: - The dataframe indices of the recommended points in the provided - experimental representation. + A dataframe containing the recommendations as a subset of rows from the + provided experimental representation. 
""" if subspace_discrete.n_subsets > 0: - return recommend_discrete_with_subsets( - self, subspace_discrete, candidates_exp, batch_size - ) - return recommend_discrete_without_subsets( - self, subspace_discrete, candidates_exp, batch_size - ) + return recommend_discrete_with_subsets(self, subspace_discrete, batch_size) + return recommend_discrete_without_subsets(self, subspace_discrete, batch_size) @override def _recommend_continuous( @@ -221,7 +214,6 @@ def _recommend_continuous( def _recommend_hybrid( self, searchspace: SearchSpace, - candidates_exp: pd.DataFrame, batch_size: int, ) -> pd.DataFrame: """Generate recommendations from a hybrid search space. @@ -231,20 +223,14 @@ def _recommend_hybrid( Args: searchspace: The search space in which the recommendations should be made. - candidates_exp: The experimental representation of the candidates - of the discrete subspace. batch_size: The size of the calculated batch. Returns: The recommended points. """ if searchspace.n_subsets > 0: - return recommend_hybrid_with_subsets( - self, searchspace, candidates_exp, batch_size - ) - return recommend_hybrid_without_subsets( - self, searchspace, candidates_exp, batch_size - ) + return recommend_hybrid_with_subsets(self, searchspace, batch_size) + return recommend_hybrid_without_subsets(self, searchspace, batch_size) def _optimize_over_subsets( self, diff --git a/baybe/recommenders/pure/bayesian/botorch/discrete.py b/baybe/recommenders/pure/bayesian/botorch/discrete.py index a5f92d04d0..eabbc2ad6f 100644 --- a/baybe/recommenders/pure/bayesian/botorch/discrete.py +++ b/baybe/recommenders/pure/bayesian/botorch/discrete.py @@ -8,6 +8,7 @@ import numpy as np import numpy.typing as npt import pandas as pd +from attrs import evolve from baybe.searchspace import SubspaceDiscrete from baybe.utils.dataframe import to_tensor @@ -21,9 +22,8 @@ def recommend_discrete_with_subsets( recommender: BotorchRecommender, subspace_discrete: SubspaceDiscrete, - candidates_exp: pd.DataFrame, 
batch_size: int, -) -> pd.Index: +) -> pd.DataFrame: """Recommend from a discrete space with subset-generating constraints. Splits the candidate set into subsets according to subset-generating constraints, @@ -35,60 +35,57 @@ def recommend_discrete_with_subsets( recommender: The recommender instance. subspace_discrete: The discrete subspace from which to generate recommendations. - candidates_exp: The experimental representation of candidates. batch_size: The size of the recommendation batch. Returns: - The dataframe indices of the recommended points. + A dataframe containing the recommendations as a subset of rows from the + provided experimental representation. """ import torch + candidates = subspace_discrete.get_candidates() masks: Iterable[npt.NDArray[np.bool_]] if subspace_discrete.n_subsets <= recommender.max_n_subsets: - masks = subspace_discrete.subset_masks( - candidates_exp, min_candidates=batch_size - ) + masks = subspace_discrete.subset_masks(min_candidates=batch_size) else: masks = subspace_discrete.sample_subset_masks( - candidates_exp, recommender.max_n_subsets, min_candidates=batch_size + recommender.max_n_subsets, + min_candidates=batch_size, ) def make_callable( mask: np.ndarray, - ) -> Callable[[], tuple[pd.Index, Tensor]]: - def optimize() -> tuple[pd.Index, Tensor]: - subset = candidates_exp.loc[mask] + ) -> Callable[[], tuple[pd.DataFrame, Tensor]]: + def optimize() -> tuple[pd.DataFrame, Tensor]: + subset_subspace = evolve(subspace_discrete, exp_rep=candidates.loc[mask]) - idxs = recommend_discrete_without_subsets( - recommender, subspace_discrete, subset, batch_size + rec = recommend_discrete_without_subsets( + recommender, subset_subspace, batch_size ) - comp = subspace_discrete.transform(candidates_exp.loc[idxs]) + comp = subspace_discrete.transform(rec) with torch.no_grad(): acqf_value = recommender._botorch_acqf(to_tensor(comp).unsqueeze(0)) - return idxs, acqf_value + return rec, acqf_value return optimize callables = (make_callable(m) 
for m in masks) - best_idxs, _ = recommender._optimize_over_subsets(callables) - return best_idxs + best_rec, _ = recommender._optimize_over_subsets(callables) + return best_rec def recommend_discrete_without_subsets( recommender: BotorchRecommender, subspace_discrete: SubspaceDiscrete, - candidates_exp: pd.DataFrame, batch_size: int, -) -> pd.Index: +) -> pd.DataFrame: """Generate recommendations from a discrete search space. Args: recommender: The recommender instance. subspace_discrete: The discrete subspace from which to generate recommendations. - candidates_exp: The experimental representation of all discrete candidate - points to be considered. batch_size: The size of the recommendation batch. Raises: @@ -96,8 +93,8 @@ def recommend_discrete_without_subsets( function is used with a batch size > 1. Returns: - The dataframe indices of the recommended points in the provided - experimental representation. + A dataframe containing the recommendations as a subset of rows from the + provided experimental representation. """ from baybe.acquisition.acqfs import qThompsonSampling from baybe.exceptions import ( @@ -119,13 +116,14 @@ def recommend_discrete_without_subsets( from botorch.optim import optimize_acqf_discrete - # determine the next set of points to be tested - candidates_comp = subspace_discrete.transform(candidates_exp) + # Determine the next set of points to be tested + candidates = subspace_discrete.get_candidates() + candidates_comp = subspace_discrete.transform(candidates) points, _ = optimize_acqf_discrete( recommender._botorch_acqf, batch_size, to_tensor(candidates_comp) ) - # retrieve the index of the points from the input dataframe + # Retrieve the rows from the subspace corresponding to the selected points # IMPROVE: The merging procedure is conceptually similar to what # `SearchSpace._match_measurement_with_searchspace_indices` does, though using # a simpler matching logic. 
When refactoring the SearchSpace class to @@ -139,4 +137,4 @@ def recommend_discrete_without_subsets( )["index"] ) - return idxs + return candidates.loc[idxs] diff --git a/baybe/recommenders/pure/bayesian/botorch/hybrid.py b/baybe/recommenders/pure/bayesian/botorch/hybrid.py index 0b017a92dd..1424db9c18 100644 --- a/baybe/recommenders/pure/bayesian/botorch/hybrid.py +++ b/baybe/recommenders/pure/bayesian/botorch/hybrid.py @@ -9,6 +9,7 @@ import numpy as np import pandas as pd +from attrs import evolve from baybe.constraints.utils import is_cardinality_fulfilled from baybe.exceptions import ( @@ -30,7 +31,6 @@ def recommend_hybrid_without_subsets( recommender: BotorchRecommender, searchspace: SearchSpace, - candidates_exp: pd.DataFrame, batch_size: int, ) -> pd.DataFrame: """Recommend points using the ``optimize_acqf_mixed`` function of BoTorch. @@ -49,8 +49,6 @@ def recommend_hybrid_without_subsets( Args: recommender: The recommender instance. searchspace: The search space in which the recommendations should be made. - candidates_exp: The experimental representation of the candidates - of the discrete subspace. batch_size: The size of the calculated batch. 
Raises: @@ -83,7 +81,8 @@ def recommend_hybrid_without_subsets( from botorch.optim import optimize_acqf_mixed # Transform discrete candidates - candidates_comp = searchspace.discrete.transform(candidates_exp) + candidates = searchspace.discrete.get_candidates() + candidates_comp = searchspace.discrete.transform(candidates) # Calculate the number of samples from the given percentage n_candidates = math.ceil( @@ -145,7 +144,7 @@ def recommend_hybrid_without_subsets( ).set_index("index") # Get experimental representation of discrete part - rec_disc_exp = searchspace.discrete.exp_rep.loc[merged.index] + rec_disc_exp = candidates.loc[merged.index] # Combine discrete and continuous parts rec_exp = pd.concat( @@ -164,7 +163,6 @@ def recommend_hybrid_without_subsets( def recommend_hybrid_with_subsets( recommender: BotorchRecommender, searchspace: SearchSpace, - candidates_exp: pd.DataFrame, batch_size: int, ) -> pd.DataFrame: """Recommend from a hybrid space with subset constraints. @@ -177,28 +175,23 @@ def recommend_hybrid_with_subsets( Args: recommender: The recommender instance. searchspace: The search space in which the recommendations should be made. - candidates_exp: The experimental representation of the candidates - of the discrete subspace. batch_size: The size of the calculated batch. Returns: The recommended points. """ - from attrs import evolve - subspace_c = searchspace.continuous # Get combined configurations, capped at max_n_subsets # NOTE: No min_discrete_candidates filtering in hybrid spaces because # optimize_acqf_mixed can produce multiple recommendations from a single # discrete candidate by varying continuous parameters. 
+ candidates = searchspace.discrete.get_candidates() combined_masks: Iterable[tuple[np.ndarray, frozenset[str]]] if searchspace.n_subsets <= recommender.max_n_subsets: - combined_masks = searchspace.subsets(candidates_exp) + combined_masks = searchspace.subsets() else: - combined_masks = searchspace.sample_subsets( - candidates_exp, recommender.max_n_subsets - ) + combined_masks = searchspace.sample_subsets(recommender.max_n_subsets) def make_callable( d_mask: np.ndarray, @@ -207,18 +200,21 @@ def make_callable( def optimize() -> tuple[pd.DataFrame, Tensor]: import torch - subset = candidates_exp.loc[d_mask] - - if c_inactive_params: - mod_cont = subspace_c._enforce_cardinality_constraints( - c_inactive_params - ) - else: - mod_cont = subspace_c - mod_searchspace = evolve(searchspace, continuous=mod_cont) + mod_disc = evolve( + searchspace.discrete, + exp_rep=candidates.loc[d_mask], + ) + mod_cont = ( + subspace_c._enforce_cardinality_constraints(c_inactive_params) + if c_inactive_params + else subspace_c + ) + mod_searchspace = evolve( + searchspace, discrete=mod_disc, continuous=mod_cont + ) rec = recommend_hybrid_without_subsets( - recommender, mod_searchspace, subset, batch_size + recommender, mod_searchspace, batch_size ) comp = mod_searchspace.transform(rec) diff --git a/baybe/recommenders/pure/nonpredictive/clustering.py b/baybe/recommenders/pure/nonpredictive/clustering.py index 29ef090286..7e5db5c16d 100644 --- a/baybe/recommenders/pure/nonpredictive/clustering.py +++ b/baybe/recommenders/pure/nonpredictive/clustering.py @@ -101,18 +101,18 @@ def _make_selection_custom( def _recommend_discrete( self, subspace_discrete: SubspaceDiscrete, - candidates_exp: pd.DataFrame, batch_size: int, - ) -> pd.Index: + ) -> pd.DataFrame: # Fit scaler on entire search space from sklearn.preprocessing import StandardScaler # TODO [Scaling]: scaling should be handled by search space object + candidates = subspace_discrete.get_candidates() + candidates_comp = 
subspace_discrete.transform(candidates) scaler = StandardScaler() - scaler.fit(subspace_discrete.comp_rep) + scaler.fit(candidates_comp) # Scale candidates - candidates_comp = subspace_discrete.transform(candidates_exp) candidates_scaled = np.ascontiguousarray(scaler.transform(candidates_comp)) # Set model parameters and perform fit @@ -128,8 +128,8 @@ def _recommend_discrete( else: selection = self._make_selection_default(model, candidates_scaled) - # Convert positional indices into DataFrame indices and return result - return candidates_comp.index[selection] + # Select rows by positional indices and return the corresponding subset + return candidates.iloc[selection] @override def __str__(self) -> str: diff --git a/baybe/recommenders/pure/nonpredictive/sampling.py b/baybe/recommenders/pure/nonpredictive/sampling.py index 0770b59d4f..bc17a56a00 100644 --- a/baybe/recommenders/pure/nonpredictive/sampling.py +++ b/baybe/recommenders/pure/nonpredictive/sampling.py @@ -31,7 +31,6 @@ class RandomRecommender(NonPredictiveRecommender): def _recommend_hybrid( self, searchspace: SearchSpace, - candidates_exp: pd.DataFrame, batch_size: int, ) -> pd.DataFrame: is_hybrid = searchspace.type is SearchSpaceType.HYBRID @@ -42,10 +41,11 @@ def _recommend_hybrid( if searchspace.type is SearchSpaceType.CONTINUOUS: return cont_random + candidates_exp = searchspace.discrete.get_candidates() + # Restrict to a random subset if subset-generating constraints are present if searchspace.discrete.n_subsets > 0: masks = searchspace.discrete.sample_subset_masks( - candidates_exp, n=1, min_candidates=None if is_hybrid else batch_size, ) @@ -146,18 +146,18 @@ def _validate_random_tie_break(self, _, value): def _recommend_discrete( self, subspace_discrete: SubspaceDiscrete, - candidates_exp: pd.DataFrame, batch_size: int, - ) -> pd.Index: + ) -> pd.DataFrame: # Fit scaler on entire search space from sklearn.preprocessing import StandardScaler # TODO [Scaling]: scaling should be handled by search 
space object + candidates = subspace_discrete.get_candidates() + candidates_comp = subspace_discrete.transform(candidates) scaler = StandardScaler() - scaler.fit(subspace_discrete.comp_rep) + scaler.fit(candidates_comp) # Scale and sample - candidates_comp = subspace_discrete.transform(candidates_exp) candidates_scaled = np.ascontiguousarray(scaler.transform(candidates_comp)) if active_settings.use_fpsample: @@ -174,7 +174,7 @@ def _recommend_discrete( initialization=self.initialization.value, random_tie_break=self.random_tie_break, ) - return candidates_comp.index[ilocs] + return candidates.iloc[ilocs] @override def __str__(self) -> str: diff --git a/baybe/searchspace/__init__.py b/baybe/searchspace/__init__.py index d78f7fafee..42d39d9493 100644 --- a/baybe/searchspace/__init__.py +++ b/baybe/searchspace/__init__.py @@ -1,5 +1,10 @@ """BayBE search spaces.""" +from baybe.searchspace.candidates import ( + CandidatesProtocol, + ProductCandidates, + TableCandidates, +) from baybe.searchspace.continuous import SubspaceContinuous from baybe.searchspace.core import ( SearchSpace, @@ -9,9 +14,15 @@ from baybe.searchspace.discrete import SubspaceDiscrete __all__ = [ + # Search space "validate_searchspace_from_config", "SearchSpace", "SearchSpaceType", + # Discrete + "CandidatesProtocol", + "ProductCandidates", + "TableCandidates", "SubspaceDiscrete", + # Continuous "SubspaceContinuous", ] diff --git a/baybe/searchspace/_filtered.py b/baybe/searchspace/_filtered.py deleted file mode 100644 index 322837c0c5..0000000000 --- a/baybe/searchspace/_filtered.py +++ /dev/null @@ -1,43 +0,0 @@ -"""Search spaces with metadata.""" - -import numpy as np -import numpy.typing as npt -import pandas as pd -from attrs import asdict, cmp_using, define, field -from attrs.validators import instance_of -from typing_extensions import Self, override - -from baybe.searchspace import SubspaceDiscrete - - -@define -class FilteredSubspaceDiscrete(SubspaceDiscrete): - """A filtered search space 
representing a reduced candidate set.""" - - mask_keep: npt.NDArray[np.bool_] = field( - validator=instance_of(np.ndarray), - kw_only=True, - eq=cmp_using(eq=np.array_equal), - ) - """The filtering mask. ``True`` marks elements to be kept.""" - - @mask_keep.validator - def _validate_mask_keep(self, _, value) -> None: - if not len(value) == len(self.exp_rep): - raise ValueError("Filter mask must match the size of the space.") - if not value.dtype == bool: - raise ValueError("Filter mask must only contain Boolean values.") - - @classmethod - def from_subspace( - cls, subspace: SubspaceDiscrete, mask_keep: npt.NDArray[np.bool_] - ) -> Self: - """Filter an existing subspace.""" - return cls( - **asdict(subspace, filter=lambda attr, _: attr.init, recurse=False), - mask_keep=mask_keep, - ) - - @override - def get_candidates(self) -> tuple[pd.DataFrame, pd.DataFrame]: - return self.exp_rep.loc[self.mask_keep], self.comp_rep.loc[self.mask_keep] diff --git a/baybe/searchspace/candidates.py b/baybe/searchspace/candidates.py new file mode 100644 index 0000000000..2fd02eb148 --- /dev/null +++ b/baybe/searchspace/candidates.py @@ -0,0 +1,127 @@ +"""Candidates module for managing lazy candidate generation.""" + +import gc +from typing import Protocol + +import narwhals.stable.v2 as nw +from attr.validators import deep_iterable, instance_of, min_len +from attrs import Attribute, define, field +from typing_extensions import override + +from baybe.constraints import DISCRETE_CONSTRAINTS_FILTERING_ORDER, validate_constraints +from baybe.constraints.base import DiscreteConstraint +from baybe.exceptions import InfiniteSpaceError +from baybe.parameters.base import DiscreteParameter +from baybe.parameters.utils import sort_parameters +from baybe.searchspace.utils import build_constrained_product +from baybe.searchspace.validation import validate_parameters +from baybe.utils.basic import to_tuple +from baybe.utils.dataframe import to_lazy +from baybe.utils.validation import 
validate_parameter_input + + +class CandidatesProtocol(Protocol): + """Type protocol specifying the interface candidate generators need to implement.""" + + # Use slots so that derived classes also remain slotted + # See also: https://www.attrs.org/en/stable/glossary.html#term-slotted-classes + __slots__ = () + + @property + def parameters(self) -> tuple[DiscreteParameter, ...]: + """The parameters spanning the space from which candidates are generated.""" + + @property + def is_finite(self) -> bool: + """Indicates whether the candidate set is finite or infinite.""" + + def to_lazy(self) -> nw.LazyFrame: + """Generate all candidates.""" + + +@define(frozen=True) +class ProductCandidates(CandidatesProtocol): + """Class for managing candidates from (filtered) Cartesian product spaces.""" + + parameters: tuple[DiscreteParameter, ...] = field( + converter=sort_parameters, + validator=[ + min_len(1), + deep_iterable(member_validator=instance_of(DiscreteParameter)), + lambda _, __, x: validate_parameters(x), + ], + ) + """See :attr:`CandidatesProtocol.parameters`.""" + + constraints: tuple[DiscreteConstraint, ...] = field( + default=(), + converter=lambda x: to_tuple( + sorted( + x, key=lambda c: DISCRETE_CONSTRAINTS_FILTERING_ORDER.index(c.__class__) + ) + ), + validator=deep_iterable(member_validator=instance_of(DiscreteConstraint)), + ) + """Constraints to filter the Cartesian product of parameter values.""" + + @constraints.validator + def _validate_constraints( + self, _: Attribute, value: tuple[DiscreteConstraint, ...] + ): # noqa: DOC101, DOC103 + validate_constraints(value, self.parameters) + + @override + @property + def is_finite(self) -> bool: + return all(p.is_finite for p in self.parameters) + + @override + def to_lazy(self) -> nw.LazyFrame: + if not self.is_finite: + raise InfiniteSpaceError( + "Cannot generate all candidates from an infinite space." 
+ ) + + candidates_df = build_constrained_product(self.parameters, self.constraints) + + # TODO: Remove the `to_lazy` call once `build_constrained_product` returns a nw.LazyFrame + assert not isinstance(candidates_df, nw.LazyFrame) + return to_lazy(candidates_df) + + +@define(frozen=True) +class TableCandidates(CandidatesProtocol): + """Class for managing candidates provided in a tabular format.""" + + parameters: tuple[DiscreteParameter, ...] = field( + converter=sort_parameters, + validator=[ + min_len(1), + deep_iterable(member_validator=instance_of(DiscreteParameter)), + lambda _, __, x: validate_parameters(x), + ], + ) + """See :attr:`CandidatesProtocol.parameters`.""" + + dataframe: nw.LazyFrame = field(converter=to_lazy) + """The dataframe containing the candidates.""" + + @dataframe.validator + def _validate_dataframe(self, _: Attribute, value: nw.LazyFrame) -> None: # noqa: DOC101, DOC103 + # TODO: Remove collect().to_pandas() once validation on lazy frames is supported + validate_parameter_input( + value.collect().to_pandas(), self.parameters, allow_extra=False + ) + + @override + @property + def is_finite(self) -> bool: + return True + + @override + def to_lazy(self) -> nw.LazyFrame: + return self.dataframe + + +# Collect leftover original slotted classes processed by `attrs.define` +gc.collect() diff --git a/baybe/searchspace/continuous.py b/baybe/searchspace/continuous.py index 43db04b2d6..a4d65f4e53 100644 --- a/baybe/searchspace/continuous.py +++ b/baybe/searchspace/continuous.py @@ -5,25 +5,24 @@ import gc import math import random +import warnings from collections.abc import Collection, Iterator, Sequence from itertools import chain from typing import TYPE_CHECKING, Any, Literal, cast +import cattrs.gen import numpy as np import pandas as pd from attrs import define, evolve, field, fields +from attrs.validators import deep_iterable, instance_of from typing_extensions import override from baybe.constraints import ( ContinuousCardinalityConstraint,
ContinuousLinearConstraint, -) -from baybe.constraints.base import ContinuousConstraint, ContinuousNonlinearConstraint -from baybe.constraints.validation import ( - validate_cardinality_constraint_parameter_bounds, - validate_cardinality_constraints_are_nonoverlapping, validate_constraints, ) +from baybe.constraints.base import ContinuousConstraint, ContinuousNonlinearConstraint from baybe.parameters import NumericalContinuousParameter from baybe.parameters.base import ContinuousParameter from baybe.parameters.numerical import _FixedNumericalContinuousParameter @@ -33,12 +32,10 @@ sort_parameters, ) from baybe.searchspace.utils import select_via_flat_index -from baybe.searchspace.validation import ( - validate_parameter_names, -) +from baybe.searchspace.validation import validate_parameters from baybe.serialization import SerialMixin, converter, select_constructor_hook from baybe.settings import active_settings -from baybe.utils.basic import flatten, to_tuple +from baybe.utils.basic import flatten, is_all_instance, to_tuple from baybe.utils.conversion import to_string from baybe.utils.dataframe import get_transform_objects, pretty_print_df @@ -50,35 +47,64 @@ _MAX_CARDINALITY_SAMPLING_ATTEMPTS = 10_000 -@define +@define(init=False) class SubspaceContinuous(SerialMixin): """Class for managing continuous subspaces. - Builds the subspace from parameter definitions, keeps - track of search metadata, and provides access to candidate sets and different - parameter views. + Builds the subspace from parameter definitions and optional constraints, + and provides access to candidate sets and different parameter views. """ parameters: tuple[NumericalContinuousParameter, ...] 
= field( converter=sort_parameters, - validator=lambda _, __, x: validate_parameter_names(x), + validator=[ + deep_iterable(member_validator=instance_of(ContinuousParameter)), + lambda _, __, x: validate_parameters(x, allow_empty=True), + ], ) - """The parameters of the subspace.""" - - constraints_lin_eq: tuple[ContinuousLinearConstraint, ...] = field( - converter=to_tuple, factory=tuple + """The parameters spanning the subspace.""" + + constraints: tuple[ContinuousConstraint, ...] = field( + default=(), + converter=to_tuple, + validator=[ + deep_iterable(member_validator=instance_of(ContinuousConstraint)), + ], ) - """Linear equality constraints.""" + """Optional constraints filtering the subspace.""" - constraints_lin_ineq: tuple[ContinuousLinearConstraint, ...] = field( - converter=to_tuple, factory=tuple - ) - """Linear inequality constraints.""" + def __init__( + self, + parameters: Sequence[ContinuousParameter], + constraints: Sequence[ContinuousConstraint] = (), + constraints_lin_eq: Sequence[ContinuousLinearConstraint] = (), + constraints_lin_ineq: Sequence[ContinuousLinearConstraint] = (), + constraints_nonlin: Sequence[ContinuousNonlinearConstraint] = (), + ): + constraints = list(constraints) + n_constraints = len(constraints) + if constraints_lin_eq is not None: + constraints.extend(constraints_lin_eq) + if constraints_lin_ineq is not None: + constraints.extend(constraints_lin_ineq) + if constraints_nonlin is not None: + constraints.extend(constraints_nonlin) + + if len(constraints) != n_constraints: + name = fields(SubspaceContinuous).constraints.name + warnings.warn( + f"You are using the deprecated 'constraints_lin_eq', " + f"'constraints_lin_ineq' and/or 'constraints_nonlin' arguments to " + f"specify constraints. For backward compatibility, we have " + f"automatically merged their content into the '{name}' attribute. 
" + f"However, please update your code to directly use the '{name}' " + f"argument instead since the deprecated arguments will be removed in " + f"a future version.", + DeprecationWarning, + stacklevel=2, + ) - constraints_nonlin: tuple[ContinuousNonlinearConstraint, ...] = field( - converter=to_tuple, factory=tuple - ) - """Nonlinear constraints.""" + self.__attrs_init__(parameters, constraints) @override def __str__(self) -> str: @@ -110,41 +136,45 @@ def __str__(self) -> str: return to_string(self.__class__.__name__, *fields) + @constraints.validator + def _validate_constraints(self, _, __) -> None: + """Validate constraints.""" + validate_constraints(self.constraints, self.parameters) + + @property + def constraints_lin_eq(self) -> tuple[ContinuousLinearConstraint, ...]: + """Linear equality constraints.""" + return tuple( + c + for c in self.constraints + if isinstance(c, ContinuousLinearConstraint) and c.is_eq + ) + + @property + def constraints_lin_ineq(self) -> tuple[ContinuousLinearConstraint, ...]: + """Linear inequality constraints.""" + return tuple( + c + for c in self.constraints + if isinstance(c, ContinuousLinearConstraint) and not c.is_eq + ) + + @property + def constraints_nonlin(self) -> tuple[ContinuousNonlinearConstraint, ...]: + """Nonlinear constraints.""" + return tuple( + c for c in self.constraints if isinstance(c, ContinuousNonlinearConstraint) + ) + @property def constraints_cardinality(self) -> tuple[ContinuousCardinalityConstraint, ...]: """The cardinality constraints of the subspace.""" return tuple( c - for c in self.constraints_nonlin + for c in self.constraints if isinstance(c, ContinuousCardinalityConstraint) ) - @constraints_lin_eq.validator - def _validate_constraints_lin_eq( - self, _, lst: list[ContinuousLinearConstraint] - ) -> None: - """Validate linear equality constraints.""" - # TODO Remove once eq and ineq constraints are consolidated into one list - if not all(c.is_eq for c in lst): - raise ValueError( - f"The list 
'{fields(self.__class__).constraints_lin_eq.name}' of " - f"{self.__class__.__name__} only accepts equality constraints, i.e. " - f"the 'operator' for all list items should be '='." - ) - - @constraints_lin_ineq.validator - def _validate_constraints_lin_ineq( - self, _, lst: list[ContinuousLinearConstraint] - ) -> None: - """Validate linear inequality constraints.""" - # TODO Remove once eq and ineq constraints are consolidated into one list - if any(c.is_eq for c in lst): - raise ValueError( - f"The list '{fields(self.__class__).constraints_lin_ineq.name}' of " - f"{self.__class__.__name__} only accepts inequality constraints, i.e. " - f"the 'operator' for all list items should be '>=' or '<='." - ) - @property def n_subsets(self) -> int: """The number of possible subset configurations. @@ -205,17 +235,6 @@ def inactive_parameter_combinations( for flat_idx in order: yield frozenset(chain(*select_via_flat_index(flat_idx, per_constraint))) - @constraints_nonlin.validator - def _validate_constraints_nonlin(self, _, __) -> None: - """Validate nonlinear constraints.""" - # Note: The passed constraints are accessed indirectly through the property - validate_cardinality_constraints_are_nonoverlapping( - self.constraints_cardinality - ) - - for con in self.constraints_cardinality: - validate_cardinality_constraint_parameter_bounds(con, self.parameters) - def to_searchspace(self) -> SearchSpace: """Turn the subspace into a search space with no discrete part.""" from baybe.searchspace.core import SearchSpace @@ -247,26 +266,9 @@ def from_product( ) -> SubspaceContinuous: """See :class:`baybe.searchspace.core.SearchSpace`.""" constraints = constraints or [] - if constraints: validate_constraints(constraints, parameters) - - return SubspaceContinuous( - parameters=[p for p in parameters if p.is_continuous], - constraints_lin_eq=[ - c - for c in constraints - if (isinstance(c, ContinuousLinearConstraint) and c.is_eq) - ], - constraints_lin_ineq=[ - c - for c in constraints - 
if (isinstance(c, ContinuousLinearConstraint) and not c.is_eq) - ], - constraints_nonlin=[ - c for c in constraints if isinstance(c, ContinuousNonlinearConstraint) - ], - ) + return SubspaceContinuous(parameters, constraints) @classmethod def from_bounds(cls, bounds: pd.DataFrame) -> SubspaceContinuous: @@ -381,32 +383,38 @@ def _drop_parameters(self, parameter_names: Collection[str]) -> SubspaceContinuo Args: parameter_names: The names of the parameter to be removed. + Raises: + NotImplementedError: If the subspace contains constraints that are not + linear intrapoint constraints. + Returns: The reduced subspace. """ - return SubspaceContinuous( - parameters=[p for p in self.parameters if p.name not in parameter_names], - constraints_lin_eq=[ - c._drop_parameters(parameter_names) - for c in self.constraints_lin_eq - if set(c.parameters) - set(parameter_names) - ], - constraints_lin_ineq=[ - c._drop_parameters(parameter_names) - for c in self.constraints_lin_ineq - if set(c.parameters) - set(parameter_names) - ], - ) + # Filter constraints that involve the parameters being dropped + affected_constraints = [ + c for c in self.constraints if set(c.parameters) & set(parameter_names) + ] - @property - def is_constrained(self) -> bool: - """Boolean indicating if the subspace is constrained in any way.""" - return any( - ( - self.constraints_lin_eq, - self.constraints_lin_ineq, - self.constraints_nonlin, + # Check if all affected constraints are linear intrapoint constraints + if not is_all_instance(affected_constraints, ContinuousLinearConstraint) or any( + c.is_interpoint for c in affected_constraints + ): + raise NotImplementedError( + "Dropping parameters is only supported for subspaces without " + "constraints or with linear intrapoint constraints." 
) + + unaffected_constraints = [ + c for c in self.constraints if c not in affected_constraints + ] + reduced_constraints = [ + c._drop_parameters(parameter_names) + for c in affected_constraints + if (set(c.parameters) - set(parameter_names)) + ] + return SubspaceContinuous( + parameters=[p for p in self.parameters if p.name not in parameter_names], + constraints=[*unaffected_constraints, *reduced_constraints], ) @property @@ -474,9 +482,9 @@ def _enforce_cardinality_constraints( return evolve( self, parameters=adjusted_parameters, - constraints_nonlin=[ + constraints=[ c - for c in self.constraints_nonlin + for c in self.constraints if not isinstance(c, ContinuousCardinalityConstraint) ], ) @@ -523,7 +531,7 @@ def sample_uniform(self, batch_size: int = 1) -> pd.DataFrame: if not self.parameters: return pd.DataFrame(index=pd.RangeIndex(0, batch_size)) - if not self.is_constrained: + if not self.constraints: return self._sample_from_bounds(batch_size, self.comp_rep_bounds.to_numpy()) if len(self.constraints_cardinality) == 0: @@ -725,8 +733,66 @@ def get_parameters_by_name( return tuple(p for p in self.parameters if p.name in names) -# Register deserialization hook -converter.register_structure_hook(SubspaceContinuous, select_constructor_hook) - # Collect leftover original slotted classes processed by `attrs.define` gc.collect() + +# Uncomment when removing the deprecation: +# converter.register_structure_hook(SubspaceContinuous, select_constructor_hook) + +# >>>>> Deprecation +_hook = cattrs.gen.make_dict_structure_fn(SubspaceContinuous, converter) + + +def _structure_hook(specs: dict, cls: type) -> SubspaceContinuous: + """Structure hook that supports both constructor dispatch and legacy fields.""" + if "constructor" in specs: + return select_constructor_hook(specs, cls) + + specs = specs.copy() + specs.pop("type", None) + + # Check if any deprecated constraint fields are present + deprecated_keys = { + "constraints_lin_eq", + "constraints_lin_ineq", + 
"constraints_nonlin", + } + if deprecated_keys & specs.keys(): + from baybe.constraints.base import ( + ContinuousConstraint, + ContinuousNonlinearConstraint, + ) + + kwargs: dict[str, Any] = {} + if "parameters" in specs: + kwargs["parameters"] = [ + converter.structure(p, NumericalContinuousParameter) + for p in specs["parameters"] + ] + if "constraints" in specs: + kwargs["constraints"] = [ + converter.structure(c, ContinuousConstraint) + for c in specs["constraints"] + ] + if "constraints_lin_eq" in specs: + kwargs["constraints_lin_eq"] = [ + converter.structure(c, ContinuousLinearConstraint) + for c in specs["constraints_lin_eq"] + ] + if "constraints_lin_ineq" in specs: + kwargs["constraints_lin_ineq"] = [ + converter.structure(c, ContinuousLinearConstraint) + for c in specs["constraints_lin_ineq"] + ] + if "constraints_nonlin" in specs: + kwargs["constraints_nonlin"] = [ + converter.structure(c, ContinuousNonlinearConstraint) + for c in specs["constraints_nonlin"] + ] + return SubspaceContinuous(**kwargs) + + return _hook(specs, cls) + + +converter.register_structure_hook(SubspaceContinuous, _structure_hook) +# <<<<< Deprecation diff --git a/baybe/searchspace/core.py b/baybe/searchspace/core.py index d835139fd8..e36959eeea 100644 --- a/baybe/searchspace/core.py +++ b/baybe/searchspace/core.py @@ -6,7 +6,7 @@ from collections.abc import Collection, Iterable, Iterator, Sequence from enum import Enum from itertools import product -from typing import TYPE_CHECKING, ClassVar, cast +from typing import TYPE_CHECKING, ClassVar import numpy as np import numpy.typing as npt @@ -18,7 +18,7 @@ from baybe.constraints.base import Constraint from baybe.exceptions import InfeasibilityError from baybe.parameters import TaskParameter -from baybe.parameters.base import Parameter +from baybe.parameters.base import ContinuousParameter, DiscreteParameter, Parameter from baybe.searchspace.continuous import SubspaceContinuous from baybe.searchspace.discrete import ( MemorySize, @@ 
-107,7 +107,6 @@ def from_product( cls, parameters: Sequence[Parameter], constraints: Sequence[Constraint] | None = None, - empty_encoding: bool = False, ) -> SearchSpace: """Create a search space from a cartesian product. @@ -121,19 +120,11 @@ def from_product( parameters: The parameters spanning the search space. constraints: An optional set of constraints restricting the valid parameter space. - empty_encoding: If ``True``, uses an "empty" encoding for all parameters. - This is useful, for instance, in combination with random search - strategies that do not read the actual parameter values, since it avoids - the (potentially costly) transformation of the parameter values to their - computational representation. Returns: The constructed search space. + """ - # IMPROVE: The arguments get pre-validated here to avoid the potentially costly - # creation of the subspaces. Perhaps there is an elegant way to bypass the - # default validation in the initializer (which is required for other - # ways of object creation) in this particular case. 
validate_parameters(parameters) if constraints: validate_constraints(constraints, parameters) @@ -143,7 +134,6 @@ def from_product( discrete = SubspaceDiscrete.from_product( parameters=[p for p in parameters if p.is_discrete], # type:ignore[misc] constraints=[c for c in constraints if c.is_discrete], # type:ignore[misc] - empty_encoding=empty_encoding, ) continuous = SubspaceContinuous.from_product( parameters=[p for p in parameters if p.is_continuous], # type:ignore[misc] @@ -208,17 +198,10 @@ def parameters(self) -> tuple[Parameter, ...]: def constraints(self) -> tuple[Constraint, ...]: """Return the constraints of the search space.""" return ( - *self.discrete.constraints, - *self.continuous.constraints_lin_eq, - *self.continuous.constraints_lin_ineq, - *self.continuous.constraints_nonlin, + *self.discrete.batch_constraints, + *self.continuous.constraints, ) - @property - def is_constrained(self) -> bool: - """Boolean indicating if the search space has any constraints.""" - return self.discrete.is_constrained or self.continuous.is_constrained - @property def type(self) -> SearchSpaceType: """Return the type of the search space.""" @@ -280,7 +263,7 @@ def task_idx(self) -> int | None: # appear first in the computational dataframe. # 3. It assumes there exists exactly one task parameter # --> Fix this when refactoring the data - return cast(int, self.discrete.comp_rep.columns.get_loc(task_param.name)) + return self.discrete.comp_rep_columns.index(task_param.name) @property def n_tasks(self) -> int: @@ -310,7 +293,6 @@ def n_subsets(self) -> int: def subsets( self, - candidates_exp: pd.DataFrame, min_discrete_candidates: int | None = None, ) -> Iterator[tuple[npt.NDArray[np.bool_], frozenset[str]]]: r"""Get an iterator over all combined subset configurations. @@ -319,7 +301,6 @@ def subsets( configurations. Args: - candidates_exp: The experimental representation of discrete candidates. 
min_discrete_candidates: If provided, discrete Subsets with fewer matching candidates are skipped. @@ -327,15 +308,12 @@ def subsets( A discrete mask and continuous inactive parameters pair. """ yield from product( - self.discrete.subset_masks( - candidates_exp, min_candidates=min_discrete_candidates - ), + self.discrete.subset_masks(min_candidates=min_discrete_candidates), self.continuous.inactive_parameter_combinations(), ) def sample_subsets( self, - candidates_exp: pd.DataFrame, n: int, min_discrete_candidates: int | None = None, *, @@ -348,7 +326,6 @@ def sample_subsets( Duplicate pairs are skipped. Args: - candidates_exp: The experimental representation of discrete candidates. n: Number of unique configurations to sample. min_discrete_candidates: If provided, discrete Subsets with fewer matching candidates are excluded. @@ -363,7 +340,6 @@ def sample_subsets( A list of ``(discrete_mask, continuous_inactive_params)`` tuples. """ d_iter = self.discrete.subset_masks( - candidates_exp, min_candidates=min_discrete_candidates, mode="replace", ) @@ -556,26 +532,21 @@ def _drop_parameters(self, names: Collection[str], /) -> _ReducedSearchSpace: ) remaining = [p for p in self.parameters if p.name not in names_set] - disc_params = [p for p in remaining if p.is_discrete] - cont_params = [p for p in remaining if p.is_continuous] + disc_params = [p for p in remaining if isinstance(p, DiscreteParameter)] + cont_params = [p for p in remaining if isinstance(p, ContinuousParameter)] # Explicit comp_rep needed because transform() drops columns for empty inputs. 
discrete = ( SubspaceDiscrete( parameters=disc_params, exp_rep=pd.DataFrame(columns=[p.name for p in disc_params]), - comp_rep=pd.DataFrame( - columns=[c for p in disc_params for c in p.comp_rep_columns] - ), ) if disc_params else SubspaceDiscrete.empty() ) continuous = ( - SubspaceContinuous( - parameters=cont_params, - ) + SubspaceContinuous(parameters=cont_params) if cont_params else SubspaceContinuous.empty() ) diff --git a/baybe/searchspace/discrete.py b/baybe/searchspace/discrete.py index b8c3e1c0ae..5521e6caf6 100644 --- a/baybe/searchspace/discrete.py +++ b/baybe/searchspace/discrete.py @@ -5,17 +5,19 @@ import gc import random import warnings -from collections.abc import Collection, Iterator, Sequence +from collections.abc import Callable, Collection, Iterator, Sequence from itertools import islice from math import prod -from typing import TYPE_CHECKING, Any, Literal +from typing import TYPE_CHECKING, Annotated, Any, Literal +import cattrs import numpy as np import numpy.typing as npt import pandas as pd -from attrs import define, field +from attrs import define, field, fields +from attrs.validators import deep_iterable, instance_of from cattrs import IterableValidationError -from typing_extensions import override +from typing_extensions import Self, override from baybe.constraints import DISCRETE_CONSTRAINTS_FILTERING_ORDER, validate_constraints from baybe.constraints.base import DiscreteConstraint @@ -29,7 +31,7 @@ from baybe.parameters.base import DiscreteParameter from baybe.parameters.utils import get_parameters_from_dataframe, sort_parameters from baybe.searchspace.utils import build_constrained_product, select_via_flat_index -from baybe.searchspace.validation import validate_parameter_names, validate_parameters +from baybe.searchspace.validation import validate_parameters from baybe.serialization import SerialMixin, converter, select_constructor_hook from baybe.settings import active_settings from baybe.utils.basic import to_tuple @@ -46,6 +48,24 @@ 
from baybe.searchspace.core import SearchSpace +def _deprecate_argument(error: bool, msg: str | Callable[[], str] | None = None): + """Helper for deprecating legacy arguments.""" # noqa: D401 + + def validator(self, attribute, value): + if value is not None: + # Generate message lazily if callable, otherwise use provided string + warning_msg = (msg() if callable(msg) else msg) or ( + f"Providing '{attribute.alias}' to '{self.__class__.__name__}' is no " + f"longer supported. To proceed, simply drop the argument." + ) + if error: + raise DeprecationError(warning_msg) + else: + warnings.warn(warning_msg, DeprecationWarning, stacklevel=3) + + return validator + + @define(kw_only=True) class MemorySize: """Estimated memory size of a :class:`SubspaceDiscrete`.""" @@ -83,39 +103,80 @@ def comp_rep_human_readable(self) -> tuple[float, str]: class SubspaceDiscrete(SerialMixin): """Class for managing discrete subspaces. - Builds the subspace from parameter definitions and optional constraints, keeps - track of search metadata, and provides access to candidate sets and different - parameter views. + Builds the subspace from parameter definitions and optional constraints, + and provides access to candidate sets and different parameter views. """ parameters: tuple[DiscreteParameter, ...] = field( converter=sort_parameters, - validator=lambda _, __, x: validate_parameter_names(x), + validator=[ + deep_iterable(member_validator=instance_of(DiscreteParameter)), + lambda _, __, x: validate_parameters(x, allow_empty=True), + ], ) - """The list of parameters of the subspace.""" + """The parameters spanning the subspace.""" - exp_rep: pd.DataFrame = field(eq=eq_dataframe) + _exp_rep: pd.DataFrame = field( + alias="exp_rep", validator=instance_of(pd.DataFrame), eq=eq_dataframe + ) """The experimental representation of the subspace.""" - empty_encoding: bool = field(default=False) - """Flag encoding whether an empty encoding is used.""" - - constraints: tuple[DiscreteConstraint, ...] 
= field( - converter=lambda x: to_tuple( - sorted( - x, - key=lambda c: DISCRETE_CONSTRAINTS_FILTERING_ORDER.index(c.__class__), - ) + _empty_encoding: Annotated[bool, cattrs.override(omit=True)] = field( + alias="empty_encoding", default=None, validator=_deprecate_argument(error=False) + ) + "Ignore! For backwards compatibility only." + + _constraints: Annotated[ + tuple[DiscreteConstraint, ...], cattrs.override(omit=True) + ] = field( + alias="constraints", + default=None, + validator=_deprecate_argument( + error=False, + msg=lambda: _make_constraints_deprecation_msg(), # noqa: PLW0108 ), - factory=tuple, ) - """A list of constraints for restricting the space.""" + "Ignore! For backwards compatibility only." + + _comp_rep: Annotated[pd.DataFrame, cattrs.override(omit=True)] = field( + alias="comp_rep", default=None, validator=_deprecate_argument(error=True) + ) + "Ignore! For backwards compatibility only." + + batch_constraints: tuple[DiscreteBatchConstraint, ...] = field( + default=(), + converter=to_tuple, + validator=deep_iterable(member_validator=instance_of(DiscreteBatchConstraint)), + ) + """Constraints operating on the recommendation batch level.""" + + def __attrs_post_init__(self) -> None: + """Migrate deprecated ``constraints`` argument to ``batch_constraints``.""" + # >>>>>>>>>> Deprecation + if self._constraints is not None: + batch: tuple[DiscreteBatchConstraint, ...] = tuple( + c for c in self._constraints if isinstance(c, DiscreteBatchConstraint) + ) + + if n_non_batch := len(self._constraints) - len(batch): + warnings.warn( + f"You provided {n_non_batch} filtering constraint(s) via " + f"'constraints' but filtering constraints are (and always have " + f"been) ignored when entered via '__init__'. The latter assumes " + f"that all filtering constraints have already been applied to the " + f"given experimental candidate representation. 
To avoid this " + f"warning, either drop the filtering constraints or use one of the " + f"alternative constructors.", + DeprecationWarning, + stacklevel=2, + ) + + if batch: + self.batch_constraints = self.batch_constraints + batch - comp_rep: pd.DataFrame = field(eq=eq_dataframe) - """The computational representation of the space. Technically not required but added - as an optional initializer argument to allow ingestion from e.g. serialized objects - and thereby speed up construction. If not provided, the default hook will derive it - from ``exp_rep``.""" + # attrs validators have already run at this point, so re-validate. + validate_constraints(self.batch_constraints, self.parameters) + # <<<<<<<<<< Deprecation @override def __str__(self) -> str: @@ -124,40 +185,48 @@ def __str__(self) -> str: # Convert the lists to dataFrames to be able to use pretty_printing param_list = [param.summary() for param in self.parameters] - constraints_list = [constr.summary() for constr in self.constraints] + batch_constraints_list = [constr.summary() for constr in self.batch_constraints] param_df = pd.DataFrame(param_list) - constraints_df = pd.DataFrame(constraints_list) + batch_constraints_df = pd.DataFrame(batch_constraints_list) fields = [ to_string( "Discrete Parameters", pretty_print_df(param_df, max_colwidth=None), ), - to_string("Experimental Representation", pretty_print_df(self.exp_rep)), - to_string("Constraints", pretty_print_df(constraints_df)), - to_string("Computational Representation", pretty_print_df(self.comp_rep)), + to_string("Batch Constraints", pretty_print_df(batch_constraints_df)), ] return to_string(self.__class__.__name__, *fields) - @exp_rep.validator + @_exp_rep.validator def _validate_exp_rep( # noqa: DOC101, DOC103 self, _: Any, exp_rep: pd.DataFrame ) -> None: """Validate the experimental representation. Raises: + ValueError: If the provided dataframe columns do not match the parameter + names of the subspace. 
ValueError: If the index of the provided dataframe contains duplicates. """ + if set(exp_rep.columns) != {p.name for p in self.parameters}: + raise ValueError( + "The columns of the experimental representation must match the " + "parameter names of the subspace." + ) + # TODO: We should ideally also validate that there are no duplicate rows, + # but in the current eager implementation this is a costly operation. + # To be revisited once the lazy implementation is in place. if exp_rep.index.has_duplicates: raise ValueError( "The index of this search space contains duplicates. " "This is not allowed, as it can lead to hard-to-detect bugs." ) - @comp_rep.default - def _default_comp_rep(self) -> pd.DataFrame: - """Create the default computational representation.""" - return self.transform(self.exp_rep) + @batch_constraints.validator + def _validate_batch_constraints(self, _, __) -> None: # noqa: DOC101, DOC103 + """Validate batch constraints.""" + validate_constraints(self.batch_constraints, self.parameters) def to_searchspace(self) -> SearchSpace: """Turn the subspace into a search space with no continuous part.""" @@ -166,12 +235,12 @@ def to_searchspace(self) -> SearchSpace: return SearchSpace(discrete=self) @classmethod - def empty(cls) -> SubspaceDiscrete: + def empty(cls) -> Self: """Create an empty discrete subspace.""" - return SubspaceDiscrete(parameters=[], exp_rep=pd.DataFrame()) + return cls(parameters=[], exp_rep=pd.DataFrame()) @classmethod - def from_parameter(cls, parameter: DiscreteParameter) -> SubspaceDiscrete: + def from_parameter(cls, parameter: DiscreteParameter) -> Self: """Create a subspace from a single parameter.
Args: @@ -187,21 +256,37 @@ def from_product( cls, parameters: Sequence[DiscreteParameter], constraints: Sequence[DiscreteConstraint] | None = None, - empty_encoding: bool = False, - ) -> SubspaceDiscrete: + empty_encoding: bool | None = None, + ) -> Self: """See :class:`baybe.searchspace.core.SearchSpace`.""" - constraints = constraints or [] + validate_parameters(parameters, allow_empty=True) - if constraints: + if constraints is None: + constraints = [] + else: + constraints = sorted( + constraints, + key=lambda x: DISCRETE_CONSTRAINTS_FILTERING_ORDER.index(x.__class__), + ) validate_constraints(constraints, parameters) - df = build_constrained_product(parameters, constraints) + filtering_constraints = [c for c in constraints if c.eval_during_creation] + batch_constraints = [c for c in constraints if c.eval_during_modeling] + assert len(filtering_constraints) + len(batch_constraints) == len( + constraints + ), ( + "The constraints could not be fully partitioned into filtering and batch " + "constraints. The current logic assumes that each constraint belongs " + "exactly to one type." + ) + + df = build_constrained_product(parameters, filtering_constraints) - return SubspaceDiscrete( + return cls( parameters=parameters, - constraints=constraints, + batch_constraints=batch_constraints, exp_rep=df, - empty_encoding=empty_encoding, + empty_encoding=empty_encoding, # type: ignore[arg-type] ) @classmethod @@ -209,8 +294,9 @@ def from_dataframe( cls, df: pd.DataFrame, parameters: Sequence[DiscreteParameter] | None = None, - empty_encoding: bool = False, - ) -> SubspaceDiscrete: + batch_constraints: Collection[DiscreteBatchConstraint] = (), + empty_encoding: bool | None = None, + ) -> Self: """Create a discrete subspace with a specified set of configurations. Args: @@ -226,7 +312,9 @@ def from_dataframe( fallback. For both types, default values are used for their optional arguments. For more details, see :func:`baybe.parameters.utils.get_parameters_from_dataframe`. 
- empty_encoding: See :func:`baybe.searchspace.core.SearchSpace.from_product`. + batch_constraints: Optional batch constraints to be applied at + recommendation time. + empty_encoding: Ignore! For backwards compatibility only. Returns: The created discrete subspace. @@ -262,7 +350,12 @@ def discrete_parameter_factory( # Ensure dtype consistency df = normalize_input_dtypes(df, parameters) - return cls(parameters=parameters, exp_rep=df, empty_encoding=empty_encoding) + return cls( + parameters=parameters, + exp_rep=df, + batch_constraints=batch_constraints, + empty_encoding=empty_encoding, # type: ignore[arg-type] + ) @classmethod def from_simplex( @@ -275,7 +368,7 @@ def from_simplex( max_nonzero: int | None = None, boundary_only: bool = False, tolerance: float = 1e-6, - ) -> SubspaceDiscrete: + ) -> Self: """Efficiently create discrete simplex subspaces. The same result can be achieved using @@ -326,9 +419,6 @@ def from_simplex( if max_nonzero is None: max_nonzero = len(simplex_parameters) - # Validate constraints - validate_constraints(constraints, [*simplex_parameters, *product_parameters]) - # Validate parameter types if not ( all(isinstance(p, NumericalDiscreteParameter) for p in simplex_parameters) @@ -353,6 +443,12 @@ def from_simplex( f"parameters: {overlap}." 
) + # Validate constraints + if constraints: + validate_constraints( + constraints, [*simplex_parameters, *product_parameters] + ) + # Handle degenerate simplex cases if len(simplex_parameters) < 2: warnings.warn( @@ -472,15 +568,25 @@ def drop_invalid( if boundary_only: drop_invalid(exp_rep, max_sum, boundary_only=True) - # Merge product parameters and apply constraints incrementally + filtering_constraints = [c for c in constraints if c.eval_during_creation] + batch_constraints_list = [c for c in constraints if c.eval_during_modeling] + assert len(filtering_constraints) + len(batch_constraints_list) == len( + constraints + ), ( + "The constraints could not be fully partitioned into filtering and batch " + "constraints. The current logic assumes that each constraint belongs " + "exactly to one type." + ) + + # Merge product parameters and apply filtering constraints incrementally exp_rep = build_constrained_product( - product_parameters, constraints, initial_df=exp_rep + product_parameters, filtering_constraints, initial_df=exp_rep ) return cls( parameters=[*simplex_parameters, *product_parameters], exp_rep=exp_rep, - constraints=constraints, + batch_constraints=batch_constraints_list, ) @property @@ -500,28 +606,52 @@ def is_empty(self) -> bool: """Return whether this subspace is empty.""" return len(self.parameters) == 0 - @property - def is_constrained(self) -> bool: - """Boolean indicating if the subspace has any constraints.""" - return len(self.constraints) > 0 - @property def parameter_names(self) -> tuple[str, ...]: """Return tuple of parameter names.""" return tuple(p.name for p in self.parameters) + # >>>>>>>>>> Deprecation + @property + def exp_rep(self) -> pd.DataFrame: + """Deprecated! Use :meth:`get_candidates` instead.""" + get_candidates = type(self).get_candidates.__name__ + warnings.warn( + f"Accessing 'exp_rep' is deprecated and will be removed in a future " + f"version. 
Use '{get_candidates}()' instead.", + DeprecationWarning, + stacklevel=2, + ) + return self._exp_rep + + @property + def comp_rep(self) -> pd.DataFrame: + """Deprecated! Use :meth:`transform` with :meth:`get_candidates` instead.""" + cls = type(self) + transform = cls.transform.__name__ + get_candidates = cls.get_candidates.__name__ + warnings.warn( + f"Accessing 'comp_rep' is deprecated and will be removed in a future " + f"version. Use '{transform}({get_candidates}())' instead.", + DeprecationWarning, + stacklevel=2, + ) + return self.transform(self._exp_rep) + + # <<<<<<<<<< Deprecation + @property def comp_rep_columns(self) -> tuple[str, ...]: """The columns spanning the computational representation.""" - # We go via `comp_rep` here instead of using the columns of the individual - # parameters because the search space potentially uses only a subset of the - # columns due to decorrelation - return tuple(self.comp_rep.columns) + return tuple(col for p in self.parameters for col in p.comp_rep_columns) @property def comp_rep_bounds(self) -> pd.DataFrame: """The minimum and maximum values of the computational representation.""" - return pd.DataFrame({"min": self.comp_rep.min(), "max": self.comp_rep.max()}).T + if not self.parameters: + return pd.DataFrame(index=["min", "max"]) + df = pd.concat([p.comp_df for p in self.parameters], axis=1) + return pd.DataFrame({"min": df.min(), "max": df.max()}).T @property def scaling_bounds(self) -> pd.DataFrame: @@ -577,14 +707,20 @@ def estimate_product_space_size( comp_rep_shape=(n_rows, n_cols_comp), ) + # >>>>>>>>>> Deprecation @property - def constraints_batch( - self, - ) -> tuple[DiscreteBatchConstraint, ...]: - """The batch constraints of the subspace.""" - return tuple( - c for c in self.constraints if isinstance(c, DiscreteBatchConstraint) + def constraints_batch(self) -> tuple[DiscreteBatchConstraint, ...]: + """Deprecated! 
Use :attr:`batch_constraints` instead.""" + replacement = fields(type(self)).batch_constraints.name + warnings.warn( + f"Accessing 'constraints_batch' is deprecated and will be disabled in a " + f"future version. Use '{replacement}' instead.", + DeprecationWarning, + stacklevel=2, ) + return self.batch_constraints + + # <<<<<<<<<< Deprecation @property def n_subsets(self) -> int: @@ -593,16 +729,15 @@ def n_subsets(self) -> int: Returns 0 if no subset-generating constraints exist, indicating that no decomposition is needed. """ - if not self.constraints_batch: + if not self.batch_constraints: return 0 return prod( len(self.get_parameters_by_name([c.parameters[0]])[0].active_values) - for c in self.constraints_batch + for c in self.batch_constraints ) def subset_masks( self, - candidates_exp: pd.DataFrame, min_candidates: int | None = None, mode: Literal["sequential", "shuffled", "replace"] = "shuffled", ) -> Iterator[npt.NDArray[np.bool_]]: @@ -613,7 +748,6 @@ def subset_masks( combined masks. Args: - candidates_exp: The experimental representation of candidate points. min_candidates: If provided, combined masks selecting fewer rows are silently skipped. mode: The iteration strategy. @@ -633,10 +767,12 @@ def subset_masks( raise ValueError(f"Invalid {mode=}. 
Must be one of {allowed}.") per_constraint: list[list[npt.NDArray[np.bool_]]] - if not (constraints := self.constraints_batch): - per_constraint = [[np.ones(len(candidates_exp), dtype=bool)]] + if not self.batch_constraints: + per_constraint = [[np.ones(len(self.get_candidates()), dtype=bool)]] else: - per_constraint = [c.subset_masks(candidates_exp) for c in constraints] + per_constraint = [ + c.subset_masks(self.get_candidates()) for c in self.batch_constraints + ] total = prod(len(masks) for masks in per_constraint) @@ -667,14 +803,12 @@ def subset_masks( def sample_subset_masks( self, - candidates_exp: pd.DataFrame, n: int, min_candidates: int | None = None, ) -> list[npt.NDArray[np.bool_]]: """Sample subset masks (without replacement). Args: - candidates_exp: The experimental representation of candidate points. n: Number of masks to sample. min_candidates: If provided, Subsets with fewer matching candidates are skipped. @@ -684,19 +818,14 @@ def sample_subset_masks( """ return list( islice( - self.subset_masks(candidates_exp, min_candidates), + self.subset_masks(min_candidates), n, ) ) - def get_candidates(self) -> tuple[pd.DataFrame, pd.DataFrame]: - """Return the set of candidate parameter settings that can be tested. - - Returns: - The candidate parameter settings both in experimental and computational - representation. 
- """ - return self.exp_rep, self.comp_rep + def get_candidates(self) -> pd.DataFrame: + """Return all candidate parameter configurations.""" + return self._exp_rep def transform( self, @@ -712,24 +841,12 @@ def transform( df, self.parameters, allow_missing=allow_missing, allow_extra=allow_extra ) - # If the transformed values are not required, return an empty dataframe - if self.empty_encoding or len(df) < 1: - return pd.DataFrame(index=df.index) - # Transform the parameters dfs = [] for param in parameters: comp_df = param.transform(df[param.name]) dfs.append(comp_df) - comp_rep = pd.concat(dfs, axis=1) if dfs else pd.DataFrame() - - # If the computational representation has already been built (with potentially - # removing some columns, e.g. due to decorrelation or dropping constant ones), - # any subsequent transformation should yield the same columns. - try: - return comp_rep[self.comp_rep.columns] - except AttributeError: - return comp_rep + return pd.concat(dfs, axis=1) if dfs else pd.DataFrame() def get_parameters_by_name( self, names: Sequence[str] @@ -750,13 +867,15 @@ def validate_simplex_subspace_from_config(specs: dict, _) -> None: # Validate product inputs without constructing it if specs.get("constructor", None) == "from_product": parameters = converter.structure(specs["parameters"], list[DiscreteParameter]) - validate_parameters(parameters) + validate_parameters(parameters, allow_empty=True) - constraints = specs.get("constraints", []) - if constraints: - constraints = converter.structure( - specs["constraints"], list[DiscreteConstraint] - ) + # Support both the current `constraints` key (deprecated) and + # the new `batch_constraints` key for forward/backward compatibility + constraints_raw = specs.get("constraints", []) or specs.get( + "batch_constraints", [] + ) + if constraints_raw: + constraints = converter.structure(constraints_raw, list[DiscreteConstraint]) validate_constraints(constraints, parameters) # Validate simplex inputs without 
constructing it @@ -780,11 +899,13 @@ def validate_simplex_subspace_from_config(specs: dict, _) -> None: validate_parameters(simplex_parameters + product_parameters) - constraints = specs.get("constraints", []) - if constraints: - constraints = converter.structure( - specs["constraints"], list[DiscreteConstraint] - ) + # Support both the current `constraints` key (deprecated) and + # the new `batch_constraints` key for forward/backward compatibility + constraints_raw = specs.get("constraints", []) or specs.get( + "batch_constraints", [] + ) + if constraints_raw: + constraints = converter.structure(constraints_raw, list[DiscreteConstraint]) validate_constraints(constraints, simplex_parameters + product_parameters) # For all other types, validate by construction @@ -792,8 +913,57 @@ def validate_simplex_subspace_from_config(specs: dict, _) -> None: converter.structure(specs, SubspaceDiscrete) -# Register deserialization hook -converter.register_structure_hook(SubspaceDiscrete, select_constructor_hook) +# >>>>>>>>>> Deprecation +def _make_constraints_deprecation_msg() -> str: + """Generate the constraints deprecation message with programmatic names.""" + # Get field aliases programmatically + constraints_alias = fields(SubspaceDiscrete)._constraints.alias + batch_constraints_alias = fields(SubspaceDiscrete).batch_constraints.alias + + return ( + f"Providing '{constraints_alias}' to '{SubspaceDiscrete.__name__}' is no " + f"longer supported. Please update your code as follows:\n" + f" • Use '{batch_constraints_alias}' for '{DiscreteBatchConstraint.__name__}' " + f"objects. Any batch constraints you have provided have been extracted " + f"automatically for you. This automatic extraction is temporary and will be " + f"removed in a future version.\n" + f" • Filtering constraints can simply be dropped. Instead, make sure you " + f"construct the experimental representation to satisfy them." 
+ ) + + +def _structure_subspace_discrete(specs: dict, cls: type) -> SubspaceDiscrete: + """Structure hook supporting legacy ``constraints`` key migration.""" + specs = specs.copy() + if "constraints" in specs and specs["constraints"] is not None: + warnings.warn( + _make_constraints_deprecation_msg(), + DeprecationWarning, + stacklevel=2, + ) + legacy_constraints = converter.structure( + specs.pop("constraints"), list[DiscreteConstraint] + ) + batch_from_legacy = [ + c for c in legacy_constraints if isinstance(c, DiscreteBatchConstraint) + ] + if batch_from_legacy: + existing = specs.get("batch_constraints", []) + existing_structured = converter.structure( + existing, list[DiscreteBatchConstraint] + ) + specs["batch_constraints"] = [ + c.to_dict() for c in existing_structured + batch_from_legacy + ] + else: + specs.pop("constraints", None) + return select_constructor_hook(specs, cls) + + +# Uncomment when removing the deprecation: +# converter.register_structure_hook(SubspaceDiscrete, select_constructor_hook) +converter.register_structure_hook(SubspaceDiscrete, _structure_subspace_discrete) +# <<<<<<<<<< Deprecation # Collect leftover original slotted classes processed by `attrs.define` gc.collect() diff --git a/baybe/searchspace/validation.py b/baybe/searchspace/validation.py index 6045301878..0419d8edf9 100644 --- a/baybe/searchspace/validation.py +++ b/baybe/searchspace/validation.py @@ -7,8 +7,8 @@ import pandas as pd from baybe.exceptions import EmptySearchSpaceError, IncompatibilityError -from baybe.parameters import TaskParameter from baybe.parameters.base import Parameter, _DiscreteLabelLikeParameter +from baybe.parameters.categorical import TaskParameter from baybe.utils.dataframe import get_transform_objects try: # For python < 3.11, use the exceptiongroup backport @@ -19,38 +19,33 @@ _T = TypeVar("_T", bound=Parameter) -def validate_parameter_names( # noqa: DOC101, DOC103 - parameters: Collection[Parameter], -) -> None: - """Validate the parameter 
names. - - Raises: - ValueError: If the given list contains parameters with the same name. - """ - param_names = [p.name for p in parameters] - if len(set(param_names)) != len(param_names): - raise ValueError("All parameters must have unique names.") - - -def validate_parameters(parameters: Collection[Parameter]) -> None: # noqa: DOC101, DOC103 +def validate_parameters( + parameters: Collection[Parameter], *, allow_empty: bool = False +) -> None: # noqa: DOC101, DOC103 """Validate the parameters. + Args: + parameters: The parameters to validate. + allow_empty: Whether to allow an empty parameter collection. + Raises: - EmptySearchSpaceError: If the parameter list is empty. + ValueError: If the given list contains parameters with the same name. + EmptySearchSpaceError: If no parameter is provided and empty + collections are explicitly disallowed. NotImplementedError: If more than one :class:`baybe.parameters.categorical.TaskParameter` is requested. """ - if not parameters: + if not allow_empty and not parameters: raise EmptySearchSpaceError("At least one parameter must be provided.") - # TODO [16932]: Remove once more task parameters are supported if len([p for p in parameters if isinstance(p, TaskParameter)]) > 1: raise NotImplementedError( "Currently, at most one task parameter can be considered." 
) - # Assert: unique names - validate_parameter_names(parameters) + param_names = [p.name for p in parameters] + if len(set(param_names)) != len(param_names): + raise ValueError("All parameters must have unique names.") def validate_dataframe_active_values( diff --git a/baybe/serialization/core.py b/baybe/serialization/core.py index df873c8759..e7714ecb85 100644 --- a/baybe/serialization/core.py +++ b/baybe/serialization/core.py @@ -18,6 +18,7 @@ import attrs import cattrs import pandas as pd +from cattrs.gen import make_dict_structure_fn from cattrs.strategies import configure_union_passthrough from baybe.utils.basic import find_subclass @@ -229,8 +230,8 @@ def select_constructor_hook(specs: dict, cls: type[_T]) -> _T: # Call the constructor with the deserialized arguments return constructor(**specs) - # Otherwise, use the regular __init__ method - return converter.structure_attrs_fromdict(specs, cls) + # Otherwise, structure using the regular mechanism + return make_dict_structure_fn(cls, converter)(specs, cls) # Register custom (un-)structure hooks diff --git a/baybe/simulation/scenarios.py b/baybe/simulation/scenarios.py index dc5763e2eb..ad34406641 100644 --- a/baybe/simulation/scenarios.py +++ b/baybe/simulation/scenarios.py @@ -283,9 +283,13 @@ def _simulate_groupby( # space constructor, the integer-based indexing provides a second safety net. # Hence, the "reset_index" call. 
if groupby is None: - groups = ((None, campaign.searchspace.discrete.exp_rep.reset_index()),) + groups = ((None, campaign.searchspace.discrete.get_candidates().reset_index()),) else: - groups = campaign.searchspace.discrete.exp_rep.reset_index().groupby(groupby) + groups = ( + campaign.searchspace.discrete.get_candidates() + .reset_index() + .groupby(groupby) + ) # Simulate all subgroups dfs = [] diff --git a/baybe/utils/dataframe.py b/baybe/utils/dataframe.py index de31de5c36..e797be8acf 100644 --- a/baybe/utils/dataframe.py +++ b/baybe/utils/dataframe.py @@ -7,8 +7,10 @@ from collections.abc import Callable, Collection, Iterable, Sequence from typing import TYPE_CHECKING, Any, Literal, TypeVar, overload +import narwhals.stable.v2 as nw import numpy as np import pandas as pd +from narwhals.stable.v2.typing import IntoDataFrame from typing_extensions import assert_never from baybe.exceptions import InputDataTypeWarning, SearchSpaceMatchWarning @@ -791,3 +793,8 @@ def needs_float_dtype(obj) -> bool: for col in cols_to_convert: df[col] = df[col].astype(active_settings.DTypeFloatNumpy) return df + + +def to_lazy(df: IntoDataFrame, /) -> nw.LazyFrame: + """Convert any dataframe to a :class:`~narwhals.LazyFrame`.""" + return nw.from_native(df).lazy() diff --git a/baybe/utils/validation.py b/baybe/utils/validation.py index 93c87ab316..52a86f0f1b 100644 --- a/baybe/utils/validation.py +++ b/baybe/utils/validation.py @@ -3,7 +3,7 @@ from __future__ import annotations import math -from collections.abc import Callable, Iterable +from collections.abc import Callable, Iterable, Sequence from typing import TYPE_CHECKING, Any import numpy as np @@ -101,7 +101,7 @@ def validate_target_input(data: pd.DataFrame, targets: Iterable[Target]) -> None if data.empty: raise ValueError("The provided input dataframe cannot be empty.") - if missing := {t.name for t in targets}.difference(data.columns): + if missing := {t.name for t in targets} - set(data.columns): raise ValueError( f"The 
input dataframe is missing columns for the following targets: " f"{missing}" @@ -147,8 +147,10 @@ def validate_objective_input(data: pd.DataFrame, objective: Objective) -> None: def validate_parameter_input( data: pd.DataFrame, - parameters: Iterable[Parameter], + parameters: Sequence[Parameter], numerical_measurements_must_be_within_tolerance: bool = False, + *, + allow_extra: bool = True, ) -> None: """Validate input dataframe columns corresponding to parameters. @@ -158,22 +160,32 @@ def validate_parameter_input( numerical_measurements_must_be_within_tolerance: If ``True``, numerical parameter values must match to parameter values within the parameter-specific tolerance. + allow_extra: If ``False``, the dataframe is not allowed to contain columns that + do not correspond to any parameter. Raises: ValueError: If the data is empty. ValueError: If the data misses columns for a parameter. + ValueError: If the data contains columns that do not correspond to any parameter + and the corresponding check is enabled. ValueError: If a parameter contains NaN. TypeError: If a parameter contains non-numeric values. 
""" if data.empty: raise ValueError("The provided input dataframe cannot be empty.") - if missing := {p.name for p in parameters}.difference(data.columns): + if missing := {p.name for p in parameters} - set(data.columns): raise ValueError( f"The input dataframe is missing columns for the following parameters: " f"{missing}" ) + if not allow_extra and (extra := set(data.columns) - {p.name for p in parameters}): + raise ValueError( + f"The input dataframe contains columns that do not correspond to any " + f"parameter: {extra}" + ) + for p in parameters: if data[p.name].isna().any(): raise ValueError( diff --git a/docs/conf.py b/docs/conf.py index 6727d8659c..47469038bc 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -194,6 +194,8 @@ ("py:obj", "baybe.serialization.mixin.SupportsRead.read"), ("py:obj", "baybe.serialization.mixin.SupportsWrite.write"), ("py:class", "baybe.surrogates.gaussian_process.components.PlainKernelFactory"), + # Deprecation field alias shadowing cached_property of the same name + (r"py:obj", r"baybe\.searchspace\.discrete\.SubspaceDiscrete\.comp_rep"), # Private classes (r"py:class", r"baybe\..*\._.*"), # Deprecation @@ -320,15 +322,16 @@ intersphinx_mapping = { "botorch": ("https://botorch.readthedocs.io/en/latest", None), "gpytorch": ("https://docs.gpytorch.ai/en/stable/", None), - "python": ("https://docs.python.org/3", None), + "narwhals": ("https://narwhals-dev.github.io/narwhals/", None), + "numpy": ("https://numpy.org/doc/stable/", None), "pandas": ("https://pandas.pydata.org/docs/", None), "polars": ("https://docs.pola.rs/api/python/stable/", None), + "python": ("https://docs.python.org/3", None), + "rdkit": ("https://rdkit.org/docs/", None), + "shap": ("https://shap.readthedocs.io/en/stable/", None), "skfp": ("https://scikit-fingerprints.readthedocs.io/latest/", None), "sklearn": ("https://scikit-learn.org/stable/", None), - "numpy": ("https://numpy.org/doc/stable/", None), "torch": ("https://pytorch.org/docs/main/", None), - "rdkit": 
("https://rdkit.org/docs/", None), - "shap": ("https://shap.readthedocs.io/en/stable/", None), "xyzpy": ("https://xyzpy.readthedocs.io/en/latest/", None), } diff --git a/examples/Constraints_Discrete/custom_constraints.py b/examples/Constraints_Discrete/custom_constraints.py index bde644b596..0ede01683b 100644 --- a/examples/Constraints_Discrete/custom_constraints.py +++ b/examples/Constraints_Discrete/custom_constraints.py @@ -115,43 +115,33 @@ def custom_function(df: pd.DataFrame) -> pd.Series: N_ITERATIONS = 3 for kIter in range(N_ITERATIONS): + candidates = campaign.searchspace.discrete.get_candidates() + print(f"\n\n#### ITERATION {kIter + 1} ####") print("## ASSERTS ##") print( "Number of entries with water, temp > 120 and concentration > 5: ", ( - campaign.searchspace.discrete.exp_rep["Concentration"].apply( - lambda x: x > 5 - ) - & campaign.searchspace.discrete.exp_rep["Temperature"].apply( - lambda x: x > 120 - ) - & campaign.searchspace.discrete.exp_rep["Solvent"].eq("water") + candidates["Concentration"].apply(lambda x: x > 5) + & candidates["Temperature"].apply(lambda x: x > 120) + & candidates["Solvent"].eq("water") ).sum(), ) print( "Number of entries with C2, temp > 180 and concentration > 3: ", ( - campaign.searchspace.discrete.exp_rep["Concentration"].apply( - lambda x: x > 3 - ) - & campaign.searchspace.discrete.exp_rep["Temperature"].apply( - lambda x: x > 180 - ) - & campaign.searchspace.discrete.exp_rep["Solvent"].eq("C2") + candidates["Concentration"].apply(lambda x: x > 3) + & candidates["Temperature"].apply(lambda x: x > 180) + & candidates["Solvent"].eq("C2") ).sum(), ) print( "Number of entries with C3, temp > 150 and concentration > 3: ", ( - campaign.searchspace.discrete.exp_rep["Concentration"].apply( - lambda x: x > 3 - ) - & campaign.searchspace.discrete.exp_rep["Temperature"].apply( - lambda x: x > 150 - ) - & campaign.searchspace.discrete.exp_rep["Solvent"].eq("C3") + candidates["Concentration"].apply(lambda x: x > 3) + & 
candidates["Temperature"].apply(lambda x: x > 150) + & candidates["Solvent"].eq("C3") ).sum(), ) diff --git a/examples/Constraints_Discrete/dependency_constraints.py b/examples/Constraints_Discrete/dependency_constraints.py index 011224e68f..ead9421194 100644 --- a/examples/Constraints_Discrete/dependency_constraints.py +++ b/examples/Constraints_Discrete/dependency_constraints.py @@ -77,39 +77,29 @@ N_ITERATIONS = 2 if SMOKE_TEST else 5 for kIter in range(N_ITERATIONS): + candidates = campaign.searchspace.discrete.get_candidates() + print(f"\n#### ITERATION {kIter + 1} ####") print("## ASSERTS ##") print( f"Number entries with both switches on " f"(expected {RESOLUTION * len(dict_solvent) * 2 * 2}): ", - ( - (campaign.searchspace.discrete.exp_rep["Switch1"] == "on") - & (campaign.searchspace.discrete.exp_rep["Switch2"] == "right") - ).sum(), + ((candidates["Switch1"] == "on") & (candidates["Switch2"] == "right")).sum(), ) print( f"Number entries with Switch1 off (expected {2 * 2}): ", - ( - (campaign.searchspace.discrete.exp_rep["Switch1"] == "off") - & (campaign.searchspace.discrete.exp_rep["Switch2"] == "right") - ).sum(), + ((candidates["Switch1"] == "off") & (candidates["Switch2"] == "right")).sum(), ) print( f"Number entries with Switch2 off " f"(expected {RESOLUTION * len(dict_solvent)}):" f" ", - ( - (campaign.searchspace.discrete.exp_rep["Switch1"] == "on") - & (campaign.searchspace.discrete.exp_rep["Switch2"] == "left") - ).sum(), + ((candidates["Switch1"] == "on") & (candidates["Switch2"] == "left")).sum(), ) print( "Number entries with both switches off (expected 1): ", - ( - (campaign.searchspace.discrete.exp_rep["Switch1"] == "off") - & (campaign.searchspace.discrete.exp_rep["Switch2"] == "left") - ).sum(), + ((candidates["Switch1"] == "off") & (candidates["Switch2"] == "left")).sum(), ) rec = campaign.recommend(batch_size=5) diff --git a/examples/Constraints_Discrete/exclusion_constraints.py b/examples/Constraints_Discrete/exclusion_constraints.py 
index feba15858d..c801b4ddfb 100644 --- a/examples/Constraints_Discrete/exclusion_constraints.py +++ b/examples/Constraints_Discrete/exclusion_constraints.py @@ -114,32 +114,30 @@ N_ITERATIONS = 3 for kIter in range(N_ITERATIONS): + candidates = campaign.searchspace.discrete.get_candidates() + print(f"\n\n#### ITERATION {kIter + 1} ####") print("## ASSERTS ##") print( "Number of entries with either Solvents C2 or C4 and a temperature above 151: ", ( - campaign.searchspace.discrete.exp_rep["Temp"].apply(lambda x: x > 151) - & campaign.searchspace.discrete.exp_rep["Solv"].apply( - lambda x: x in ["C2", "C4"] - ) + candidates["Temp"].apply(lambda x: x > 151) + & candidates["Solv"].apply(lambda x: x in ["C2", "C4"]) ).sum(), ) print( "Number of entries with either Solvents C5 or C6 and a pressure above 5: ", ( - campaign.searchspace.discrete.exp_rep["Pressure"].apply(lambda x: x > 5) - & campaign.searchspace.discrete.exp_rep["Solv"].apply( - lambda x: x in ["C5", "C6"] - ) + candidates["Pressure"].apply(lambda x: x > 5) + & candidates["Solv"].apply(lambda x: x in ["C5", "C6"]) ).sum(), ) print( "Number of entries with pressure below 3 and temperature above 120: ", ( - campaign.searchspace.discrete.exp_rep["Pressure"].apply(lambda x: x < 3) - & campaign.searchspace.discrete.exp_rep["Temp"].apply(lambda x: x > 120) + candidates["Pressure"].apply(lambda x: x < 3) + & candidates["Temp"].apply(lambda x: x > 120) ).sum(), ) diff --git a/examples/Constraints_Discrete/prodsum_constraints.py b/examples/Constraints_Discrete/prodsum_constraints.py index ff482cdc49..f62a70eba0 100644 --- a/examples/Constraints_Discrete/prodsum_constraints.py +++ b/examples/Constraints_Discrete/prodsum_constraints.py @@ -109,30 +109,22 @@ N_ITERATIONS = 2 if SMOKE_TEST else 5 for kIter in range(N_ITERATIONS): + candidates = campaign.searchspace.discrete.get_candidates() + print(f"\n\n#### ITERATION {kIter + 1} ####") print("## ASSERTS ##") print( "Number of entries with 1,2-sum above 150: ", - ( - 
campaign.searchspace.discrete.exp_rep[["NumParam1", "NumParam2"]].sum( - axis=1 - ) - > 150.0 - ).sum(), + (candidates[["NumParam1", "NumParam2"]].sum(axis=1) > 150.0).sum(), ) print( "Number of entries with 3,4-product under 30: ", - ( - campaign.searchspace.discrete.exp_rep[["NumParam3", "NumParam4"]].prod( - axis=1 - ) - < 30 - ).sum(), + (candidates[["NumParam3", "NumParam4"]].prod(axis=1) < 30).sum(), ) print( "Number of entries with 5,6-sum unequal to 100: ", - campaign.searchspace.discrete.exp_rep[["NumParam5", "NumParam6"]] + candidates[["NumParam5", "NumParam6"]] .sum(axis=1) .apply(lambda x: x - 100.0) .abs() diff --git a/examples/Custom_Hooks/campaign_stopping.py b/examples/Custom_Hooks/campaign_stopping.py index 208cbe60f9..1aa35fc046 100644 --- a/examples/Custom_Hooks/campaign_stopping.py +++ b/examples/Custom_Hooks/campaign_stopping.py @@ -138,7 +138,7 @@ def stop_on_PI( f"Currently, only search spaces of type '{SearchSpaceType.DISCRETE}' are " f"accepted." ) - candidates, _ = searchspace.discrete.get_candidates() + candidates = searchspace.discrete.get_candidates() acqf = ProbabilityOfImprovement() pi = self.acquisition_values( candidates, searchspace, objective, measurements, acquisition_function=acqf diff --git a/examples/Custom_Hooks/probability_of_improvement.py b/examples/Custom_Hooks/probability_of_improvement.py index 70511fffeb..4a87aba47f 100644 --- a/examples/Custom_Hooks/probability_of_improvement.py +++ b/examples/Custom_Hooks/probability_of_improvement.py @@ -79,7 +79,7 @@ def extract_pi( f"Currently, only search spaces of type '{SearchSpaceType.DISCRETE}' are " f"accepted." 
) - candidates, _ = searchspace.discrete.get_candidates() + candidates = searchspace.discrete.get_candidates() acqf = ProbabilityOfImprovement() pi = self.acquisition_values( candidates, searchspace, objective, measurements, acquisition_function=acqf diff --git a/examples/Custom_Surrogates/custom_pretrained.py b/examples/Custom_Surrogates/custom_pretrained.py index 5e408fd4b4..d3cff830ed 100644 --- a/examples/Custom_Surrogates/custom_pretrained.py +++ b/examples/Custom_Surrogates/custom_pretrained.py @@ -55,7 +55,7 @@ # Its purpose is to show the workflow for using pre-trained surrogates in BayBE. searchspace = SearchSpace.from_product(parameters=parameters, constraints=None) -train_x = to_tensor(searchspace.discrete.comp_rep) +train_x = to_tensor(searchspace.transform(searchspace.discrete.get_candidates())) train_y = torch.rand(train_x.size(dim=0)) # train with a random y vector # Define model and fit diff --git a/examples/Mixtures/slot_based.py b/examples/Mixtures/slot_based.py index ee1cef9a7c..9b5d0290e9 100644 --- a/examples/Mixtures/slot_based.py +++ b/examples/Mixtures/slot_based.py @@ -185,11 +185,12 @@ space = SubspaceDiscrete.from_product(parameters=parameters, constraints=constraints) +candidates = space.get_candidates() print( pretty_print_df( - space.exp_rep, - max_rows=len(space.exp_rep), - max_columns=len(space.exp_rep.columns), + candidates, + max_rows=len(candidates), + max_columns=len(candidates.columns), ) ) @@ -222,9 +223,9 @@ # Let us programmatically assert that all constraints are satisfied: -amounts = space.exp_rep[["Slot1_Amount", "Slot2_Amount", "Slot3_Amount"]] -labels = space.exp_rep[["Slot1_Label", "Slot2_Label", "Slot3_Label"]] -slots = space.exp_rep.apply( +amounts = candidates[["Slot1_Amount", "Slot2_Amount", "Slot3_Amount"]] +labels = candidates[["Slot1_Label", "Slot2_Label", "Slot3_Label"]] +slots = candidates.apply( lambda row: pd.Series( [(row[f"Slot{k}_Label"], row[f"Slot{k}_Amount"]) for k in range(1, 4)] ), diff --git 
a/pyproject.toml b/pyproject.toml index e636999c91..186016502d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -32,10 +32,11 @@ dynamic = ['version'] dependencies = [ "attrs>=26.1.0", "botorch>=0.13.0,<1", - "cattrs>=25.2.0", + "cattrs>=26.1.0", "exceptiongroup", "gpytorch>=1.9.1,<2", "joblib>1.4.0,<2", + "narwhals>=2,<3", "numpy>=1.24.1,<3", "pandas>=2.1.0,<4", "scikit-learn>=1.1.1,<2", @@ -166,6 +167,7 @@ benchmarking = [ ] test = [ + "baybe[polars]", "hypothesis[pandas]>=6.88.4", "tenacity>=8.5.0", "pytest>=7.2.0", diff --git a/tests/constraints/test_batch_constraint.py b/tests/constraints/test_batch_constraint.py index a3c4bf0c23..0252a8a203 100644 --- a/tests/constraints/test_batch_constraint.py +++ b/tests/constraints/test_batch_constraint.py @@ -150,9 +150,5 @@ def test_subset_masks_min_candidates(min_candidates, expected_count, constraint) if constraint is not None: constraints.append(constraint) searchspace = SearchSpace.from_product(_params, constraints) - masks = list( - searchspace.discrete.subset_masks( - searchspace.discrete.exp_rep, min_candidates=min_candidates - ) - ) + masks = list(searchspace.discrete.subset_masks(min_candidates=min_candidates)) assert len(masks) == expected_count diff --git a/tests/constraints/test_cardinality_constraint_continuous.py b/tests/constraints/test_cardinality_constraint_continuous.py index 04e9e10cf9..20d86731f9 100644 --- a/tests/constraints/test_cardinality_constraint_continuous.py +++ b/tests/constraints/test_cardinality_constraint_continuous.py @@ -104,9 +104,7 @@ def test_sampling_cardinality_constraint(cardinality_bounds: tuple[int, int]): ), ) - subspace_continous = SubspaceContinuous( - parameters=parameters, constraints_nonlin=constraints - ) + subspace_continous = SubspaceContinuous(parameters, constraints) with warnings.catch_warnings(record=True) as w: samples = subspace_continous.sample_uniform(BATCH_SIZE) diff --git a/tests/constraints/test_cardinality_constraint_discrete.py 
b/tests/constraints/test_cardinality_constraint_discrete.py index 1da1a92ea8..bc5d56fdca 100644 --- a/tests/constraints/test_cardinality_constraint_discrete.py +++ b/tests/constraints/test_cardinality_constraint_discrete.py @@ -46,7 +46,7 @@ def test_cardinality_constraint_discrete( # Assert that cardinality constraint is fulfilled assert ( - (searchspace.discrete.exp_rep != 0.0) + (searchspace.discrete.get_candidates() != 0.0) .sum(axis=1) .between(min_cardinality, max_cardinality) .all() diff --git a/tests/constraints/test_constraints_continuous.py b/tests/constraints/test_constraints_continuous.py index 2b5c91e112..0655827ee6 100644 --- a/tests/constraints/test_constraints_continuous.py +++ b/tests/constraints/test_constraints_continuous.py @@ -7,6 +7,7 @@ from baybe.constraints import ContinuousLinearConstraint from baybe.parameters.numerical import NumericalContinuousParameter +from baybe.utils.dataframe import add_fake_measurements from tests.conftest import run_iterations TOLERANCE = 0.01 @@ -116,18 +117,22 @@ def test_interpoint_linear_constraints( check_type, ): """Test interpoint linear constraints with various operators and parameters.""" - run_iterations(campaign_non_sequential, n_iterations, batch_size, add_noise=False) - res = campaign_non_sequential.measurements + batches = [] + for _ in range(n_iterations): + rec = campaign_non_sequential.recommend(batch_size=batch_size) + batches.append(rec) + add_fake_measurements(rec, campaign_non_sequential.targets) + campaign_non_sequential.add_measurements(rec) - res_grouped = res.groupby("BatchNr") - interpoint_result = calculation(res_grouped) + for batch in batches: + interpoint_result = calculation(batch) - if check_type == "eq": - assert np.allclose(interpoint_result, expected_value, atol=TOLERANCE) - elif check_type == "ge": - assert interpoint_result.ge(expected_value - TOLERANCE).all() - elif check_type == "le": - assert interpoint_result.le(expected_value + TOLERANCE).all() + if check_type == "eq": + 
assert np.isclose(interpoint_result, expected_value, atol=TOLERANCE) + elif check_type == "ge": + assert interpoint_result >= expected_value - TOLERANCE + elif check_type == "le": + assert interpoint_result <= expected_value + TOLERANCE @pytest.mark.parametrize("parameter_names", [["Conti_finite1", "Conti_finite2"]]) @@ -140,11 +145,20 @@ def test_interpoint_intrapoint_mix( batch_size, ): """Test mixing interpoint and intrapoint inequality constraints.""" - run_iterations(campaign_non_sequential, n_iterations, batch_size, add_noise=False) + batches = [] + for _ in range(n_iterations): + rec = campaign_non_sequential.recommend(batch_size=batch_size) + batches.append(rec) + add_fake_measurements(rec, campaign_non_sequential.targets) + campaign_non_sequential.add_measurements(rec) + + # Interpoint constraint: sum across each batch + for batch in batches: + interpoint_result = 2 * batch["Conti_finite1"].sum() + assert interpoint_result >= 0.3 - TOLERANCE + + # Intrapoint constraint: each row individually res = campaign_non_sequential.measurements - - interpoint_result = 2 * res.groupby("BatchNr")["Conti_finite1"].sum() - assert interpoint_result.ge(0.3 - TOLERANCE).all() assert ( (1.0 * res["Conti_finite1"] + 3.0 * res["Conti_finite2"]) .ge(0.3 - TOLERANCE) diff --git a/tests/constraints/test_constraints_discrete.py b/tests/constraints/test_constraints_discrete.py index 9273ae13bf..a0add6ed26 100644 --- a/tests/constraints/test_constraints_discrete.py +++ b/tests/constraints/test_constraints_discrete.py @@ -27,10 +27,11 @@ def fixture_n_grid_points(request): @pytest.mark.parametrize("constraint_names", [["Constraint_1"]]) def test_simple_dependency(campaign, n_grid_points, mock_substances, mock_categories): """Test declaring dependencies by declaring them in a single constraints entry.""" + candidates = campaign.searchspace.discrete.get_candidates() + # Number entries with both switches on num_entries = ( - (campaign.searchspace.discrete.exp_rep["Switch_1"] == "on") - 
& (campaign.searchspace.discrete.exp_rep["Switch_2"] == "right") + (candidates["Switch_1"] == "on") & (candidates["Switch_2"] == "right") ).sum() assert num_entries == n_grid_points * len(mock_substances) * len( mock_categories @@ -38,22 +39,19 @@ def test_simple_dependency(campaign, n_grid_points, mock_substances, mock_catego # Number entries with Switch_1 off num_entries = ( - (campaign.searchspace.discrete.exp_rep["Switch_1"] == "off") - & (campaign.searchspace.discrete.exp_rep["Switch_2"] == "right") + (candidates["Switch_1"] == "off") & (candidates["Switch_2"] == "right") ).sum() assert num_entries == len(mock_categories) * len(mock_categories) # Number entries with both switches on num_entries = ( - (campaign.searchspace.discrete.exp_rep["Switch_1"] == "on") - & (campaign.searchspace.discrete.exp_rep["Switch_2"] == "left") + (candidates["Switch_1"] == "on") & (candidates["Switch_2"] == "left") ).sum() assert num_entries == n_grid_points * len(mock_substances) # Number entries with both switches on num_entries = ( - (campaign.searchspace.discrete.exp_rep["Switch_1"] == "off") - & (campaign.searchspace.discrete.exp_rep["Switch_2"] == "left") + (candidates["Switch_1"] == "off") & (candidates["Switch_2"] == "left") ).sum() assert num_entries == 1 @@ -67,28 +65,26 @@ def test_simple_dependency(campaign, n_grid_points, mock_substances, mock_catego ) def test_exclusion(campaign, mock_substances): """Tests exclusion constraint.""" + candidates = campaign.searchspace.discrete.get_candidates() + # Number of entries with either first/second substance and a temperature above 151 num_entries = ( - campaign.searchspace.discrete.exp_rep["Temperature"].apply(lambda x: x > 151) - & campaign.searchspace.discrete.exp_rep["Solvent_1"].apply( - lambda x: x in list(mock_substances)[:2] - ) + candidates["Temperature"].apply(lambda x: x > 151) + & candidates["Solvent_1"].apply(lambda x: x in list(mock_substances)[:2]) ).sum() assert num_entries == 0 # Number of entries with either 
last / second last substance and a pressure above 5 num_entries = ( - campaign.searchspace.discrete.exp_rep["Pressure"].apply(lambda x: x > 5) - & campaign.searchspace.discrete.exp_rep["Solvent_1"].apply( - lambda x: x in list(mock_substances)[-2:] - ) + candidates["Pressure"].apply(lambda x: x > 5) + & candidates["Solvent_1"].apply(lambda x: x in list(mock_substances)[-2:]) ).sum() assert num_entries == 0 # Number of entries with pressure below 3 and temperature above 120 num_entries = ( - campaign.searchspace.discrete.exp_rep["Pressure"].apply(lambda x: x < 3) - & campaign.searchspace.discrete.exp_rep["Temperature"].apply(lambda x: x > 120) + candidates["Pressure"].apply(lambda x: x < 3) + & candidates["Temperature"].apply(lambda x: x > 120) ).sum() assert num_entries == 0 @@ -97,11 +93,10 @@ def test_exclusion(campaign, mock_substances): @pytest.mark.parametrize("constraint_names", [["Constraint_8"]]) def test_prodsum1(campaign): """Tests sum constraint.""" + candidates = campaign.searchspace.discrete.get_candidates() + # Number of entries with 1,2-sum above 150 - num_entries = ( - campaign.searchspace.discrete.exp_rep[["Fraction_1", "Fraction_2"]].sum(axis=1) - > 150.0 - ).sum() + num_entries = (candidates[["Fraction_1", "Fraction_2"]].sum(axis=1) > 150.0).sum() assert num_entries == 0 @@ -109,11 +104,10 @@ def test_prodsum1(campaign): @pytest.mark.parametrize("constraint_names", [["Constraint_9"]]) def test_prodsum2(campaign): """Tests product constrain.""" + candidates = campaign.searchspace.discrete.get_candidates() + # Number of entries with product under 30 - num_entries = ( - campaign.searchspace.discrete.exp_rep[["Fraction_1", "Fraction_2"]].prod(axis=1) - < 30 - ).sum() + num_entries = (candidates[["Fraction_1", "Fraction_2"]].prod(axis=1) < 30).sum() assert num_entries == 0 @@ -121,9 +115,10 @@ def test_prodsum2(campaign): @pytest.mark.parametrize("constraint_names", [["Constraint_10"]]) def test_prodsum3(campaign): """Tests exact sum constraint.""" + 
candidates = campaign.searchspace.discrete.get_candidates() # Number of entries with sum unequal to 100 num_entries = ( - campaign.searchspace.discrete.exp_rep[["Fraction_1", "Fraction_2"]] + candidates[["Fraction_1", "Fraction_2"]] .sum(axis=1) .apply(lambda x: x - 100.0) .abs() @@ -142,11 +137,11 @@ def test_prodsum3(campaign): ) def test_mixture(campaign, n_grid_points, mock_substances): """Tests various constraints in a mixture use case.""" + candidates = campaign.searchspace.discrete.get_candidates() + # Number of searchspace entries where fractions do not sum to 100.0 num_entries = ( - campaign.searchspace.discrete.exp_rep[ - ["Fraction_1", "Fraction_2", "Fraction_3"] - ] + candidates[["Fraction_1", "Fraction_2", "Fraction_3"]] .sum(axis=1) .apply(lambda x: x - 100.0) .abs() @@ -157,23 +152,16 @@ def test_mixture(campaign, n_grid_points, mock_substances): # Number of searchspace entries that have duplicate solvent labels num_entries = ( - campaign.searchspace.discrete.exp_rep[["Solvent_1", "Solvent_2", "Solvent_3"]] - .nunique(axis=1) - .ne(3) - .sum() + candidates[["Solvent_1", "Solvent_2", "Solvent_3"]].nunique(axis=1).ne(3).sum() ) assert num_entries == 0 # Number of searchspace entries with permutation-invariant combinations num_entries = ( - campaign.searchspace.discrete.exp_rep[["Solvent_1", "Solvent_2", "Solvent_3"]] + candidates[["Solvent_1", "Solvent_2", "Solvent_3"]] .apply(frozenset, axis=1) .to_frame() - .join( - campaign.searchspace.discrete.exp_rep[ - ["Fraction_1", "Fraction_2", "Fraction_3"] - ] - ) + .join(candidates[["Fraction_1", "Fraction_2", "Fraction_3"]]) .duplicated() .sum() ) @@ -181,12 +169,7 @@ def test_mixture(campaign, n_grid_points, mock_substances): # Number of unique 1-solvent entries num_entries = ( - ( - campaign.searchspace.discrete.exp_rep[ - ["Fraction_1", "Fraction_2", "Fraction_3"] - ] - == 0.0 - ) + (candidates[["Fraction_1", "Fraction_2", "Fraction_3"]] == 0.0) .sum(axis=1) .eq(2) .sum() @@ -195,12 +178,7 @@ def 
test_mixture(campaign, n_grid_points, mock_substances): # Number of unique 2-solvent entries num_entries = ( - ( - campaign.searchspace.discrete.exp_rep[ - ["Fraction_1", "Fraction_2", "Fraction_3"] - ] - == 0.0 - ) + (candidates[["Fraction_1", "Fraction_2", "Fraction_3"]] == 0.0) .sum(axis=1) .eq(1) .sum() @@ -209,12 +187,7 @@ def test_mixture(campaign, n_grid_points, mock_substances): # Number of unique 3-solvent entries num_entries = ( - ( - campaign.searchspace.discrete.exp_rep[ - ["Fraction_1", "Fraction_2", "Fraction_3"] - ] - == 0.0 - ) + (candidates[["Fraction_1", "Fraction_2", "Fraction_3"]] == 0.0) .sum(axis=1) .eq(0) .sum() @@ -234,24 +207,26 @@ def test_mixture(campaign, n_grid_points, mock_substances): @pytest.mark.parametrize("constraint_names", [["Constraint_13"]]) def test_custom(campaign): """Tests custom constraint (uses config from exclude test).""" + candidates = campaign.searchspace.discrete.get_candidates() + num_entries = ( - campaign.searchspace.discrete.exp_rep["Pressure"].apply(lambda x: x > 5) - & campaign.searchspace.discrete.exp_rep["Temperature"].apply(lambda x: x > 120) - & campaign.searchspace.discrete.exp_rep["Solvent_1"].eq("water") + candidates["Pressure"].apply(lambda x: x > 5) + & candidates["Temperature"].apply(lambda x: x > 120) + & candidates["Solvent_1"].eq("water") ).sum() assert num_entries == 0 ( - campaign.searchspace.discrete.exp_rep["Pressure"].apply(lambda x: x > 3) - & campaign.searchspace.discrete.exp_rep["Temperature"].apply(lambda x: x > 180) - & campaign.searchspace.discrete.exp_rep["Solvent_1"].eq("C2") + candidates["Pressure"].apply(lambda x: x > 3) + & candidates["Temperature"].apply(lambda x: x > 180) + & candidates["Solvent_1"].eq("C2") ).sum() assert num_entries == 0 ( - campaign.searchspace.discrete.exp_rep["Pressure"].apply(lambda x: x > 3) - & campaign.searchspace.discrete.exp_rep["Temperature"].apply(lambda x: x < 150) - & campaign.searchspace.discrete.exp_rep["Solvent_1"].eq("C3") + 
candidates["Pressure"].apply(lambda x: x > 3) + & candidates["Temperature"].apply(lambda x: x < 150) + & candidates["Solvent_1"].eq("C3") ).sum() assert num_entries == 0 @@ -263,13 +238,12 @@ def test_custom(campaign): @pytest.mark.parametrize("constraint_names", [["Constraint_14"]]) def test_cardinality(campaign): """Test discrete cardinality constraint.""" + candidates = campaign.searchspace.discrete.get_candidates() + # Number of non-zeros - non_zeros = ( - campaign.searchspace.discrete.exp_rep[ - ["Fraction_1", "Fraction_2", "Fraction_3"] - ] - != 0.0 - ).sum(axis=1) + non_zeros = (candidates[["Fraction_1", "Fraction_2", "Fraction_3"]] != 0.0).sum( + axis=1 + ) # number of non-zeros fulfills cardinality min_cardinality = 1 diff --git a/tests/hypothesis_strategies/alternative_creation/test_searchspace.py b/tests/hypothesis_strategies/alternative_creation/test_searchspace.py index 662e898134..506ef19c29 100644 --- a/tests/hypothesis_strategies/alternative_creation/test_searchspace.py +++ b/tests/hypothesis_strategies/alternative_creation/test_searchspace.py @@ -13,7 +13,6 @@ NumericalContinuousParameter, NumericalDiscreteParameter, ) -from baybe.parameters.categorical import TaskParameter from baybe.searchspace import SearchSpace, SubspaceContinuous from baybe.searchspace.discrete import SubspaceDiscrete from tests.hypothesis_strategies.parameters import numerical_discrete_parameters @@ -109,7 +108,7 @@ def test_discrete_searchspace_creation_from_degenerate_dataframe(): """A degenerate dataframe with index but no columns yields an empty space.""" df = pd.DataFrame(index=[0]) subspace = SubspaceDiscrete.from_dataframe(df) - assert_frame_equal(subspace.exp_rep, pd.DataFrame()) + assert_frame_equal(subspace.get_candidates(), pd.DataFrame()) @pytest.mark.parametrize("boundary_only", (False, True)) @@ -140,22 +139,23 @@ def test_discrete_space_creation_from_simplex_inner(parameters, boundary_only): max_sum, parameters, boundary_only=boundary_only, tolerance=tolerance 
) + candidates = subspace.get_candidates() if boundary_only: - assert np.allclose(subspace.exp_rep.sum(axis=1), max_sum, atol=tolerance) + assert np.allclose(candidates.sum(axis=1), max_sum, atol=tolerance) else: - assert (subspace.exp_rep.sum(axis=1) <= max_sum + tolerance).all() + assert (candidates.sum(axis=1) <= max_sum + tolerance).all() p_d1 = NumericalDiscreteParameter(name="d1", values=[0.0, 0.5, 1.0]) p_d2 = NumericalDiscreteParameter(name="d2", values=[0.0, 0.5, 1.0]) -p_t1 = TaskParameter(name="t1", values=["A", "B"]) -p_t2 = TaskParameter(name="t2", values=["A", "B"]) +p_c1 = CategoricalParameter(name="c1", values=["A", "B"]) +p_c2 = CategoricalParameter(name="c2", values=["A", "B"]) @pytest.mark.parametrize( ("simplex_parameters", "product_parameters", "n_elements"), [ - param([p_d1, p_d2], [p_t1, p_t2], 6 * 4, id="both"), + param([p_d1, p_d2], [p_c1, p_c2], 6 * 4, id="both"), param([p_d1, p_d2], [], 6, id="simplex-only"), ], ) @@ -170,10 +170,11 @@ def test_discrete_space_creation_from_simplex_mixed( product_parameters=product_parameters, boundary_only=False, ) - assert len(subspace.exp_rep) == n_elements # <-- (# simplex part) x (# task part) - assert not any(subspace.exp_rep.duplicated()) - assert len(subspace.parameters) == len(subspace.exp_rep.columns) - assert all(p.name in subspace.exp_rep.columns for p in subspace.parameters) + candidates = subspace.get_candidates() + assert len(candidates) == n_elements # <-- (# simplex part) x (# task part) + assert not any(candidates.duplicated()) + assert len(subspace.parameters) == len(candidates.columns) + assert all(p.name in candidates.columns for p in subspace.parameters) @pytest.mark.parametrize("boundary_only", (False, True)) @@ -189,10 +190,11 @@ def test_discrete_space_creation_from_simplex_restricted(boundary_only): max_nonzero=4, boundary_only=True, ) - n_nonzero = (subspace.exp_rep > 0.0).sum(axis=1) + candidates = subspace.get_candidates() + n_nonzero = (candidates > 0.0).sum(axis=1) if 
boundary_only: - assert np.allclose(subspace.exp_rep.sum(axis=1), 1.0) + assert np.allclose(candidates.sum(axis=1), 1.0) assert n_nonzero.min() == 2 assert n_nonzero.max() == 4 - assert len(subspace.parameters) == len(subspace.exp_rep.columns) - assert all(p.name in subspace.exp_rep.columns for p in subspace.parameters) + assert len(subspace.parameters) == len(candidates.columns) + assert all(p.name in candidates.columns for p in subspace.parameters) diff --git a/tests/test_campaign.py b/tests/test_campaign.py index c8bd052c8a..1cb15fc460 100644 --- a/tests/test_campaign.py +++ b/tests/test_campaign.py @@ -11,7 +11,7 @@ from pytest import param from baybe.acquisition import qLogEI, qLogNEHVI, qTS, qUCB -from baybe.campaign import _EXCLUDED, Campaign +from baybe.campaign import Campaign from baybe.constraints.conditions import SubSelectionCondition from baybe.constraints.discrete import DiscreteExcludeConstraint from baybe.exceptions import IncompatibilityError, NotEnoughPointsLeftError @@ -91,7 +91,7 @@ def test_get_surrogate(campaign, n_iterations, batch_size): ids=["dataframe", "constraints"], ) def test_candidate_toggling(constraints, exclude, complement): - """Toggling discrete candidates updates the campaign metadata accordingly.""" + """Toggling discrete candidates updates the exclusion state accordingly.""" subspace = SubspaceDiscrete.from_product( [ NumericalDiscreteParameter("a", [0, 1]), @@ -99,22 +99,33 @@ def test_candidate_toggling(constraints, exclude, complement): ] ) campaign = Campaign(subspace) + all_candidates = campaign.searchspace.discrete.get_candidates() - # Set metadata to opposite of targeted value so that we can verify the effect later - campaign._searchspace_metadata[_EXCLUDED] = not exclude + # Set initial state to the opposite of the targeted value + if not exclude: + # Pre-exclude all candidates so that we can test re-inclusion + campaign.toggle_discrete_candidates(all_candidates, exclude=True) # Toggle the candidates 
campaign.toggle_discrete_candidates(constraints, exclude, complement=complement) - # Extract row indices of candidates whose metadata should have been toggled - matches = campaign.searchspace.discrete.exp_rep["a"] == 0 - idx = matches.index[~matches if complement else matches] + # Determine which candidates should be affected by the toggle + matches = all_candidates["a"] == 0 + toggled_idx = matches.index[~matches if complement else matches] + toggled_rows = all_candidates.loc[toggled_idx] - # Assert that metadata is set correctly - target = campaign._searchspace_metadata.loc[idx, _EXCLUDED] - other = campaign._searchspace_metadata[_EXCLUDED].drop(index=idx) - assert all(target == exclude) # must contain the updated values - assert all(other != exclude) # must contain the original values + if exclude: + # The toggled rows should be excluded, the rest should not + assert len(campaign._excluded_experiments) == len(toggled_rows) + merged = pd.merge(campaign._excluded_experiments, toggled_rows, how="inner") + assert len(merged) == len(toggled_rows) + else: + # The toggled rows should be re-included, the rest should remain excluded + expected_remaining = len(all_candidates) - len(toggled_rows) + assert len(campaign._excluded_experiments) == expected_remaining + + # Assert that recommendation with toggled candidates still works + campaign.recommend(1) @pytest.mark.parametrize( @@ -276,7 +287,6 @@ def test_update_measurements(ongoing_campaign): meas = ongoing_campaign.measurements assert meas.iloc[0, updated.columns.get_loc(p_name)] == 1337 assert meas.iloc[0, updated.columns.get_loc(t_name)] == 1337 - assert meas.iloc[[0], updated.columns.get_loc("FitNr")].isna().all() assert ongoing_campaign._cached_recommendation is None @@ -400,7 +410,7 @@ def test_posterior_stats_invalid_input(ongoing_campaign, stats, error, match): @pytest.mark.parametrize("batch_size", [3], ids=["b3"]) def test_acquisition_value_computation(ongoing_campaign: Campaign): """Acquisition values have 
the expected shape.""" - df = ongoing_campaign.searchspace.discrete.exp_rep + df = ongoing_campaign.searchspace.discrete.get_candidates() assert not df.empty # Using campaign acquisition function diff --git a/tests/test_candidates.py b/tests/test_candidates.py new file mode 100644 index 0000000000..3e6a71e98a --- /dev/null +++ b/tests/test_candidates.py @@ -0,0 +1,125 @@ +"""Tests for candidate generators.""" + +import narwhals as nw +import pandas as pd +import polars as pl +import pytest +from pandas.testing import assert_frame_equal + +from baybe.constraints import DiscreteSumConstraint, ThresholdCondition +from baybe.constraints.conditions import SubSelectionCondition +from baybe.constraints.discrete import DiscreteExcludeConstraint +from baybe.parameters import ( + CategoricalParameter, + NumericalContinuousParameter, + NumericalDiscreteParameter, +) +from baybe.searchspace.candidates import ProductCandidates, TableCandidates +from baybe.utils.dataframe import create_fake_input + +p_disc = NumericalDiscreteParameter("disc", (1, 2)) +p_disc2 = NumericalDiscreteParameter("disc2", (0, 10)) +p_cat = CategoricalParameter("cat", ("a", "b", "c")) +p_cont = NumericalContinuousParameter("cont", (3, 8)) +c_sum = DiscreteSumConstraint(["disc", "disc2"], ThresholdCondition(2, "<=")) +c_sub = DiscreteExcludeConstraint(["disc"], [SubSelectionCondition([1])]) +edf = pd.DataFrame() + + +@pytest.mark.parametrize( + "dataframe_factory", + [ + pytest.param(lambda pd_df: pd_df, id="pandas_eager"), + pytest.param(pl.DataFrame, id="polars_eager"), + pytest.param(pl.LazyFrame, id="polars_lazy"), + pytest.param(lambda x: nw.from_native(x, eager_only=True), id="narwhals_eager"), + pytest.param(lambda x: nw.from_native(x).lazy(), id="narwhals_lazy"), + ], +) +def test_table_candidates_generation(dataframe_factory): + """TableCandidates generates the expected lazy dataframe.""" + parameters = [p_disc, p_cat] + data = create_fake_input(parameters, [], n_rows=4) + df = 
dataframe_factory(data) + candidates = TableCandidates(parameters, df) + candidates_ldf = candidates.to_lazy() + candidates_df = candidates_ldf.collect() + + assert candidates.is_finite + assert isinstance(candidates_ldf, nw.LazyFrame) + assert set(candidates_df.columns) == {p.name for p in parameters} + assert candidates_df.shape == data.shape + assert_frame_equal(candidates_df.to_pandas(), data) + + +@pytest.mark.parametrize( + ("parameters", "dataframe", "error"), + [ + pytest.param([], edf, ValueError(">= 1"), id="empty_param"), + pytest.param(None, edf, TypeError("not iterable"), id="none_param"), + pytest.param([p_cont], edf, TypeError("be = 1"), id="empty_param"), + pytest.param(None, (), TypeError("not iterable"), id="none_param"), + pytest.param([p_cont], (), TypeError("be =", [1], 0) + c_nonlin = ContinuousCardinalityConstraint(["p"], 1) + + with pytest.warns(DeprecationWarning): + if positional: + subspace = SubspaceContinuous( + parameters=(p,), + constraints=(c,), + constraints_lin_eq=(c_lin_eq,), + constraints_lin_ineq=(c_lin_ineq,), + constraints_nonlin=(c_nonlin,), + ) + else: + subspace = SubspaceContinuous( + (p,), + (c, c_lin_eq), + (c_lin_ineq,), + (c_nonlin,), + ) + + assert c in subspace.constraints + assert c_lin_eq in subspace.constraints + assert c_lin_ineq in subspace.constraints + assert c_nonlin in subspace.constraints + + +def test_deprecated_constraints_arguments_deserialization(): + """Deserialization from legacy JSON with deprecated constraint attributes works.""" + p1 = NumericalContinuousParameter("p", (0, 1)) + c_lin_eq = ContinuousLinearConstraint(["p"], "=", [1], 1) + c_lin_ineq = ContinuousLinearConstraint(["p"], ">=", [1], 0) + c_nonlin = ContinuousCardinalityConstraint(["p"], 1) + + # Construct the expected object using the modern interface + expected = SubspaceContinuous( + parameters=(p1,), + constraints=(c_lin_eq, c_lin_ineq, c_nonlin), + ) + + # Build a legacy dict with the deprecated constraint field names + legacy_dict 
= { + "type": "SubspaceContinuous", + "parameters": [p1.to_dict()], + "constraints_lin_eq": [c_lin_eq.to_dict()], + "constraints_lin_ineq": [c_lin_ineq.to_dict()], + "constraints_nonlin": [c_nonlin.to_dict()], + } + + with pytest.warns(DeprecationWarning): + actual = SubspaceContinuous.from_dict(legacy_dict) + + assert actual == expected + + +@pytest.mark.parametrize( + ("arg", "error"), [("empty_encoding", False), ("comp_rep", True)] +) +def test_deprecated_subspace_discrete_arguments(arg, error): + """Providing deprecated arguments to `SubspaceDiscrete` raises an error / a warning.""" # noqa + context = ( + pytest.raises(DeprecationError, match=f"Providing '{arg}'") + if error + else pytest.warns(DeprecationWarning, match=f"Providing '{arg}'") + ) + with context: + SubspaceDiscrete(parameters=[], exp_rep=pd.DataFrame(), **{arg: 0}) + + +def test_deprecated_empty_encoding_from_product(): + """Passing `empty_encoding` to `SubspaceDiscrete.from_product` raises a warning.""" # noqa + with pytest.warns(DeprecationWarning, match="Providing 'empty_encoding'"): + SubspaceDiscrete.from_product( + parameters=[NumericalDiscreteParameter("p", [0, 1])], + empty_encoding=True, + ) + + +def test_deprecated_empty_encoding_from_dataframe(): + """Passing `empty_encoding` to `SubspaceDiscrete.from_dataframe` raises a warning.""" # noqa + with pytest.warns(DeprecationWarning, match="Providing 'empty_encoding'"): + SubspaceDiscrete.from_dataframe( + parameters=[NumericalDiscreteParameter("p", [0, 1])], + df=pd.DataFrame({"p": [0, 1]}), + empty_encoding=True, + ) + + +def test_deprecated_discrete_subspace_deserialization(): + """Deserialization from legacy JSON with `empty_encoding`/`comp_rep` works.""" + p = NumericalDiscreteParameter("p", [0, 1]) + expected = SubspaceDiscrete.from_product(parameters=[p]) + + # Build a legacy dict containing the deprecated fields + legacy_dict = expected.to_dict() + legacy_dict["empty_encoding"] = False + legacy_dict["comp_rep"] = 
legacy_dict["exp_rep"] + + actual = SubspaceDiscrete.from_dict(legacy_dict) + assert actual == expected + + +def test_deprecated_constraints_deserialization(): + """Deserialization of legacy ``constraints`` key migrates batch constraints.""" + p = NumericalDiscreteParameter("p", [0, 1, 2]) + batch_c = DiscreteBatchConstraint(["p"]) + expected = SubspaceDiscrete.from_product(parameters=[p], constraints=[batch_c]) + + # Simulate a legacy dict with `constraints` instead of `batch_constraints` + legacy_dict = expected.to_dict() + legacy_dict["constraints"] = [batch_c.to_dict()] + del legacy_dict["batch_constraints"] + + with pytest.warns(DeprecationWarning, match="Providing 'constraints'"): + actual = SubspaceDiscrete.from_dict(legacy_dict) + + assert actual == expected + + +def test_deprecated_constraints_argument(): + """Passing `constraints` to `SubspaceDiscrete` raises a deprecation warning.""" + p = NumericalDiscreteParameter("p", [0, 1, 2]) + batch_c = DiscreteBatchConstraint(["p"]) + with pytest.warns(DeprecationWarning, match="Providing 'constraints'"): + subspace = SubspaceDiscrete( + parameters=[p], + exp_rep=pd.DataFrame({"p": [0, 1, 2]}), + constraints=[batch_c], + ) + # The batch constraint must be migrated to `batch_constraints` + assert subspace.batch_constraints == (batch_c,) + + +def test_deprecated_constraints_argument_from_product(): + """Passing mixed constraints to ``from_product`` routes batch constraints correctly.""" # noqa: E501 + p = CategoricalParameter("p", ["a", "b"]) + q = CategoricalParameter("q", ["x", "y"]) + batch_c = DiscreteBatchConstraint(["p"]) + no_dup_c = DiscreteExcludeConstraint(["p"], [SubSelectionCondition(["a"])]) + + ss_both = SubspaceDiscrete.from_product( + parameters=[p, q], constraints=[batch_c, no_dup_c] + ) + ss_none = SubspaceDiscrete.from_product(parameters=[p, q], constraints=[]) + ss_with_batch = SubspaceDiscrete.from_product( + parameters=[p, q], constraints=[batch_c] + ) + ss_without_batch = 
SubspaceDiscrete.from_product( + parameters=[p, q], constraints=[no_dup_c] + ) + + ss_both_candidates = ss_both.get_candidates() + ss_none_candidates = ss_none.get_candidates() + ss_with_batch_candidates = ss_with_batch.get_candidates() + ss_without_batch_candidates = ss_without_batch.get_candidates() + assert ss_both.batch_constraints == ss_with_batch.batch_constraints == (batch_c,) + assert ss_without_batch.batch_constraints == ss_none.batch_constraints == () + assert_frame_equal(ss_both_candidates, ss_without_batch_candidates) + assert_frame_equal(ss_with_batch_candidates, ss_none_candidates) + assert len(ss_both_candidates) == 2 + assert len(ss_none_candidates) == 4 + + +def test_deprecated_constraints_batch_property(): + """Accessing ``constraints_batch`` emits a deprecation warning and delegates correctly.""" # noqa: E501 + p = NumericalDiscreteParameter("p", [0, 1, 2]) + batch_c = DiscreteBatchConstraint(["p"]) + subspace = SubspaceDiscrete( + parameters=[p], + exp_rep=pd.DataFrame({"p": [0, 1, 2]}), + batch_constraints=(batch_c,), + ) + + with pytest.warns(DeprecationWarning, match="constraints_batch"): + result = subspace.constraints_batch + + assert result == subspace.batch_constraints == (batch_c,) + + +def test_deprecated_exp_rep_property(): + """Accessing ``exp_rep`` on ``SubspaceDiscrete`` emits a deprecation warning.""" + subspace = CategoricalParameter("p", ["a", "b"]).to_subspace() + with pytest.warns(DeprecationWarning, match="Accessing 'exp_rep'"): + result = subspace.exp_rep + assert_frame_equal(result, subspace.get_candidates()) + + +def test_deprecated_comp_rep_property(): + """Accessing ``comp_rep`` on ``SubspaceDiscrete`` emits a deprecation warning.""" + subspace = CategoricalParameter("p", ["a", "b"]).to_subspace() + with pytest.warns(DeprecationWarning, match="Accessing 'comp_rep'"): + result = subspace.comp_rep + assert_frame_equal(result, subspace.transform(subspace.get_candidates())) diff --git a/tests/test_searchspace.py 
b/tests/test_searchspace.py index 0579a82d44..660a4a0a6f 100644 --- a/tests/test_searchspace.py +++ b/tests/test_searchspace.py @@ -7,10 +7,7 @@ from baybe._optional.info import POLARS_INSTALLED from baybe.constraints import ( - ContinuousCardinalityConstraint, ContinuousLinearConstraint, - DiscreteSumConstraint, - ThresholdCondition, ) from baybe.exceptions import ( EmptySearchSpaceError, @@ -71,9 +68,8 @@ def test_empty_parameter_bounds(): Also checks for the correct shapes. """ - parameters = [] - searchspace_discrete = SubspaceDiscrete.from_product(parameters=parameters) - searchspace_continuous = SubspaceContinuous(parameters=parameters) + searchspace_discrete = SubspaceDiscrete.empty() + searchspace_continuous = SubspaceContinuous.empty() expected = pd.DataFrame(np.empty((2, 0)), index=["min", "max"]) pd.testing.assert_frame_equal(searchspace_discrete.comp_rep_bounds, expected) pd.testing.assert_frame_equal(searchspace_continuous.comp_rep_bounds, expected) @@ -99,7 +95,7 @@ def test_discrete_searchspace_creation_from_dataframe(): assert searchspace.type == SearchSpaceType.DISCRETE assert searchspace.parameters == all_params - assert df.equals(searchspace.discrete.exp_rep) + assert df.equals(searchspace.discrete.get_candidates()) def test_discrete_from_dataframe_dtype_consistency(): @@ -118,7 +114,7 @@ def test_discrete_from_dataframe_dtype_consistency(): next(p for p in subspace.parameters if p.name == "C"), NumericalDiscreteParameter, ) - assert pd.api.types.is_float_dtype(subspace.exp_rep["C"]) + assert pd.api.types.is_float_dtype(subspace.get_candidates()["C"]) def test_invalid_simplex_creating_with_overlapping_parameters(): @@ -161,11 +157,12 @@ def test_from_simplex_with_degenerate_parameter_count(simplex_parameters, expect product_parameters=product_parameters, ) - assert len(subspace.exp_rep) == expected_len + candidates = subspace.get_candidates() + assert len(candidates) == expected_len if simplex_parameters: simplex_cols = [p.name for p in 
simplex_parameters] - assert all(subspace.exp_rep[simplex_cols].sum(axis=1) <= 1.0) + assert all(candidates[simplex_cols].sum(axis=1) <= 1.0) def test_continuous_searchspace_creation_from_bounds(): @@ -203,77 +200,6 @@ def test_hyperrectangle_searchspace_creation(): assert searchspace.parameters == parameters -def test_invalid_constraint_parameter_combos(): - """Testing invalid constraint-parameter combinations.""" - parameters = [ - CategoricalParameter("cat1", values=("c1", "c2")), - NumericalDiscreteParameter("d1", values=[1, 2, 3]), - NumericalDiscreteParameter("d2", values=[0, 1, 2]), - NumericalContinuousParameter("c1", (0, 2)), - NumericalContinuousParameter("c2", (-1, 1)), - ] - - # Attempting continuous constraint over hybrid parameter set - with pytest.raises(ValueError): - SearchSpace.from_product( - parameters=parameters, - constraints=[ContinuousLinearConstraint(["c1", "c2", "d1"], "=")], - ) - - # Attempting continuous constraint over hybrid parameter set - with pytest.raises(ValueError): - SearchSpace.from_product( - parameters=parameters, - constraints=[ContinuousLinearConstraint(["c1", "c2", "d1"], "=")], - ) - - # Attempting discrete constraint over hybrid parameter set - with pytest.raises(ValueError): - SearchSpace.from_product( - parameters=parameters, - constraints=[ - DiscreteSumConstraint( - parameters=["d1", "d2", "c1"], - condition=ThresholdCondition(threshold=1.0, operator=">"), - ) - ], - ) - - # Attempting constraints over parameter set where a parameter does not exist - with pytest.raises(ValueError): - SearchSpace.from_product( - parameters=parameters, - constraints=[ - DiscreteSumConstraint( - parameters=["d1", "e7", "c1"], - condition=ThresholdCondition(threshold=1.0, operator=">"), - ) - ], - ) - - # Attempting constraints over parameter set where a parameter does not exist - with pytest.raises(ValueError): - SearchSpace.from_product( - parameters=parameters, - constraints=[ContinuousLinearConstraint(["c1", "e7", "d1"], "=")], - ) 
- - # Attempting constraints over parameter sets containing non-numerical discrete - # parameters. - with pytest.raises( - ValueError, match="valid only for numerical discrete parameters" - ): - SearchSpace.from_product( - parameters=parameters, - constraints=[ - DiscreteSumConstraint( - parameters=["cat1", "d1", "d2"], - condition=ThresholdCondition(threshold=1.0, operator=">"), - ) - ], - ) - - @pytest.mark.parametrize( "parameter_names", [ @@ -307,10 +233,10 @@ def test_searchspace_memory_estimate(searchspace: SearchSpace): estimate_exp = estimate.exp_rep_bytes estimate_comp = estimate.comp_rep_bytes - actual_exp = searchspace.discrete.exp_rep.memory_usage(deep=True, index=False).sum() - actual_comp = searchspace.discrete.comp_rep.memory_usage( - deep=True, index=False - ).sum() + candidates = searchspace.discrete.get_candidates() + candidates_comp = searchspace.discrete.transform(candidates) + actual_exp = candidates.memory_usage(deep=True, index=False).sum() + actual_comp = candidates_comp.memory_usage(deep=True, index=False).sum() assert 0.95 <= estimate_exp / actual_exp <= 1.05, ( "Exp: ", @@ -324,48 +250,6 @@ def test_searchspace_memory_estimate(searchspace: SearchSpace): ) -def test_cardinality_constraints_with_overlapping_parameters(): - """Creating cardinality constraints with overlapping parameters raises an error.""" - parameters = ( - NumericalContinuousParameter("c1", (0, 1)), - NumericalContinuousParameter("c2", (0, 1)), - NumericalContinuousParameter("c3", (0, 1)), - ) - with pytest.raises(ValueError, match="cannot share the same parameters"): - SubspaceContinuous( - parameters=parameters, - constraints_nonlin=( - ContinuousCardinalityConstraint( - parameters=["c1", "c2"], - max_cardinality=1, - ), - ContinuousCardinalityConstraint( - parameters=["c2", "c3"], - max_cardinality=1, - ), - ), - ) - - -def test_cardinality_constraint_with_invalid_parameter_bounds(): - """Imposing a cardinality constraint on a parameter whose range does not include - 
zero raises an error.""" # noqa - parameters = ( - NumericalContinuousParameter("c1", (0, 1)), - NumericalContinuousParameter("c2", (1, 2)), - ) - with pytest.raises(ValueError, match="must include zero"): - SubspaceContinuous( - parameters=parameters, - constraints_nonlin=( - ContinuousCardinalityConstraint( - parameters=["c1", "c2"], - max_cardinality=1, - ), - ), - ) - - @pytest.mark.skipif( not POLARS_INSTALLED, reason="Optional polars dependency not installed." ) @@ -449,8 +333,9 @@ def test_task_parameter_active_values_validation(): searchspace = SearchSpace.from_dataframe( target_df, parameters=[num_param, task_param, cat_param] ) - assert len(searchspace.discrete.exp_rep) == 1 - assert all(searchspace.discrete.exp_rep["task"] == "target") + candidates = searchspace.discrete.get_candidates() + assert len(candidates) == 1 + assert all(candidates["task"] == "target") @pytest.mark.parametrize("parameter_names", [["Conti_finite1", "Conti_finite2"]]) @@ -516,8 +401,7 @@ def test_sample_from_polytope_mixed_constraints_with_interpoint(): subspace = SubspaceContinuous( parameters=parameters, - constraints_lin_ineq=[regular_constraint], - constraints_lin_eq=[interpoint_constraint], + constraints=[regular_constraint, interpoint_constraint], ) assert subspace.has_interpoint_constraints diff --git a/tests/utils/test_dataframe.py b/tests/utils/test_dataframe.py index 9da4a2c268..916c1895d7 100644 --- a/tests/utils/test_dataframe.py +++ b/tests/utils/test_dataframe.py @@ -110,7 +110,7 @@ def test_degenerate_rows_invalid_input(): ) def test_fuzzy_row_match(searchspace, noise, duplicated): """Fuzzy row matching returns expected indices.""" - left_df = searchspace.discrete.exp_rep.copy() + left_df = searchspace.discrete.get_candidates().copy() selected = np.random.choice(left_df.index, 4, replace=False) right_df = left_df.loc[selected].reset_index(drop=True) @@ -155,7 +155,7 @@ def test_fuzzy_row_match(searchspace, noise, duplicated): @pytest.mark.parametrize("invalid", 
["left", "right"]) def test_invalid_fuzzy_row_match(searchspace, invalid): """Returns expected errors when dataframes don't contain all expected columns.""" - left_df = searchspace.discrete.exp_rep.copy() + left_df = searchspace.discrete.get_candidates().copy() selected = np.random.choice(left_df.index, 4, replace=False) right_df = left_df.loc[selected].copy() diff --git a/tests/utils/test_sampling_algorithms.py b/tests/utils/test_sampling_algorithms.py index ffe015721f..1a34a64257 100644 --- a/tests/utils/test_sampling_algorithms.py +++ b/tests/utils/test_sampling_algorithms.py @@ -222,7 +222,9 @@ def test_fps_utility_expected_errors(points, n_requested, initialization, match) def test_fps_recommender_utility_initialization_indices(searchspace): """FPS utilities return expected indices when initialization indices are used.""" - points = searchspace.discrete.comp_rep.values + candidates = searchspace.discrete.get_candidates() + candidates_comp = searchspace.discrete.transform(candidates) + points = candidates_comp.values inds1 = farthest_point_sampling(points, 3, initialization=[0]) inds2 = farthest_point_sampling(points, 3, initialization=[1, 2]) @@ -269,7 +271,9 @@ def test_fps_recommender_result_consistency(searchspace): """FPS utilities return consistent results.""" from baybe._optional.fpsample import fps_sampling - points = searchspace.discrete.comp_rep.values + candidates = searchspace.discrete.get_candidates() + candidates_comp = searchspace.discrete.transform(candidates) + points = candidates_comp.values inds1 = fps_sampling(points, 3, start_idx=0).tolist() inds2 = farthest_point_sampling( points, 3, initialization=[0], random_tie_break=False diff --git a/tests/validation/test_searchspace_validation.py b/tests/validation/test_searchspace_validation.py index c18e56c24e..3327925f27 100644 --- a/tests/validation/test_searchspace_validation.py +++ b/tests/validation/test_searchspace_validation.py @@ -4,7 +4,19 @@ import pytest from pytest import param -from 
baybe.parameters.numerical import NumericalDiscreteParameter +from baybe.constraints import ( + ContinuousCardinalityConstraint, + ContinuousLinearConstraint, + DiscreteSumConstraint, + ThresholdCondition, +) +from baybe.constraints.discrete import DiscreteLinkedParametersConstraint +from baybe.parameters import ( + CategoricalParameter, + NumericalContinuousParameter, + NumericalDiscreteParameter, +) +from baybe.searchspace import SearchSpace, SubspaceContinuous, SubspaceDiscrete from baybe.utils.dataframe import get_transform_objects parameters = [NumericalDiscreteParameter("d1", [0, 1])] @@ -42,3 +54,174 @@ def test_invalid_transforms(df, match): def test_valid_transforms(df, missing, extra): """When providing the appropriate flags, the columns of the dataframe to be transformed can be flexibly chosen.""" # noqa get_transform_objects(df, parameters, allow_missing=missing, allow_extra=extra) + + +def test_invalid_constraint_parameter_combos(): + """Testing invalid constraint-parameter combinations.""" + parameters = [ + CategoricalParameter("cat1", values=("c1", "c2")), + NumericalDiscreteParameter("d1", values=[1, 2, 3]), + NumericalDiscreteParameter("d2", values=[0, 1, 2]), + NumericalContinuousParameter("c1", (0, 2)), + NumericalContinuousParameter("c2", (-1, 1)), + ] + + # Attempting continuous constraint over hybrid parameter set + with pytest.raises(ValueError): + SearchSpace.from_product( + parameters=parameters, + constraints=[ContinuousLinearConstraint(["c1", "c2", "d1"], "=")], + ) + + # Attempting discrete constraint over hybrid parameter set + with pytest.raises(ValueError): + SearchSpace.from_product( + parameters=parameters, + constraints=[ + DiscreteSumConstraint( + parameters=["d1", "d2", "c1"], + condition=ThresholdCondition(threshold=1.0, operator=">"), + ) + ], + ) + + # Attempting constraints over parameter set where a parameter does not exist + with pytest.raises(ValueError): + SearchSpace.from_product( + parameters=parameters, + 
constraints=[ + DiscreteSumConstraint( + parameters=["d1", "e7", "c1"], + condition=ThresholdCondition(threshold=1.0, operator=">"), + ) + ], + ) + + # Attempting constraints over parameter set where a parameter does not exist + with pytest.raises(ValueError): + SearchSpace.from_product( + parameters=parameters, + constraints=[ContinuousLinearConstraint(["c1", "e7", "d1"], "=")], + ) + + # Attempting constraints over parameter sets containing non-numerical discrete + # parameters. + with pytest.raises( + ValueError, match="valid only for numerical discrete parameters" + ): + SearchSpace.from_product( + parameters=parameters, + constraints=[ + DiscreteSumConstraint( + parameters=["cat1", "d1", "d2"], + condition=ThresholdCondition(threshold=1.0, operator=">"), + ) + ], + ) + + +def test_cardinality_constraints_with_overlapping_parameters(): + """Creating cardinality constraints with overlapping parameters raises an error.""" + parameters = ( + NumericalContinuousParameter("c1", (0, 1)), + NumericalContinuousParameter("c2", (0, 1)), + NumericalContinuousParameter("c3", (0, 1)), + ) + with pytest.raises(ValueError, match="cannot share the same parameters"): + SubspaceContinuous( + parameters=parameters, + constraints=( + ContinuousCardinalityConstraint( + parameters=["c1", "c2"], + max_cardinality=1, + ), + ContinuousCardinalityConstraint( + parameters=["c2", "c3"], + max_cardinality=1, + ), + ), + ) + + +def test_cardinality_constraint_with_invalid_parameter_bounds(): + """Imposing a cardinality constraint on a parameter whose range does not include + zero raises an error.""" # noqa + parameters = ( + NumericalContinuousParameter("c1", (0, 1)), + NumericalContinuousParameter("c2", (1, 2)), + ) + with pytest.raises(ValueError, match="must include zero"): + SubspaceContinuous( + parameters=parameters, + constraints=( + ContinuousCardinalityConstraint( + parameters=["c1", "c2"], + max_cardinality=1, + ), + ), + ) + + +p_cont = NumericalContinuousParameter("p", (0, 1)) 
+p_disc = NumericalDiscreteParameter("p", (0, 1)) + + +@pytest.mark.parametrize( + ("p1", "p2", "space"), + [ + param(p_cont, p_cont, SubspaceContinuous, id="continuous"), + param(p_disc, p_disc, SubspaceDiscrete, id="discrete"), + param(p_cont, p_disc, SearchSpace, id="hybrid"), + ], +) +def test_subspace_with_duplicate_parameter_names(p1, p2, space): + """Creating a search space with duplicate parameter names raises an error.""" + with pytest.raises(ValueError, match="unique names"): + space.from_product(parameters=[p1, p2]) + + +@pytest.mark.parametrize("discrete", [True, False]) +@pytest.mark.parametrize( + "referenced", + [ + param(["nonexistent"], id="all_nonexistent"), + param(["p1", "nonexistent"], id="partially_nonexistent"), + ], +) +def test_continuous_subspace_constraint_with_nonexistent_params(referenced, discrete): + """Using constraints referencing nonexistent parameters raises an error.""" + if discrete: + parameters = [ + NumericalDiscreteParameter("p1", (0, 1)), + NumericalDiscreteParameter("p2", (0, 1)), + ] + space = SubspaceDiscrete + constraint = DiscreteLinkedParametersConstraint(referenced) + else: + parameters = [ + NumericalContinuousParameter("p1", (0, 1)), + NumericalContinuousParameter("p2", (0, 1)), + ] + space = SubspaceContinuous + constraint = ContinuousLinearConstraint(referenced, "=") + + with pytest.raises(ValueError, match="does not exist"): + space.from_product(parameters=parameters, constraints=[constraint]) + + +def test_invalid_simplex_creation_with_overlapping_parameters(): + """Creating a simplex searchspace with overlapping simplex and product parameters + raises an error.""" # noqa + parameters = [NumericalDiscreteParameter(name="x_1", values=(0, 1, 2))] + + with pytest.raises( + ValueError, + match="'simplex_parameters' and 'product_parameters' must be disjoint", + ): + SearchSpace( + SubspaceDiscrete.from_simplex( + max_sum=1.0, + simplex_parameters=parameters, + product_parameters=parameters, + ) + ) diff --git 
a/uv.lock b/uv.lock index 12aabceda0..7edf1aaef4 100644 --- a/uv.lock +++ b/uv.lock @@ -10,7 +10,7 @@ resolution-markers = [ ] [options] -exclude-newer = "2026-05-22T09:27:47.33792Z" +exclude-newer = "2026-06-12T12:57:26.466234Z" exclude-newer-span = "P7D" [[package]] @@ -207,6 +207,7 @@ dependencies = [ { name = "exceptiongroup" }, { name = "gpytorch" }, { name = "joblib" }, + { name = "narwhals" }, { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, { name = "numpy", version = "2.3.5", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, { name = "pandas" }, @@ -389,6 +390,7 @@ simulation = [ ] test = [ { name = "hypothesis", extra = ["pandas"] }, + { name = "polars", extra = ["pyarrow"] }, { name = "pytest" }, { name = "pytest-cov" }, { name = "tenacity" }, @@ -411,12 +413,13 @@ requires-dist = [ { name = "baybe", extras = ["onnx"], marker = "extra == 'benchmarking'" }, { name = "baybe", extras = ["onnx"], marker = "extra == 'extras'" }, { name = "baybe", extras = ["polars"], marker = "extra == 'extras'" }, + { name = "baybe", extras = ["polars"], marker = "extra == 'test'" }, { name = "baybe", extras = ["simulation"], marker = "extra == 'benchmarking'" }, { name = "baybe", extras = ["simulation"], marker = "extra == 'extras'" }, { name = "baybe", extras = ["test"], marker = "extra == 'dev'" }, { name = "boto3", marker = "extra == 'benchmarking'", specifier = ">=1.0.0,<2" }, { name = "botorch", specifier = ">=0.13.0,<1" }, - { name = "cattrs", specifier = ">=25.2.0" }, + { name = "cattrs", specifier = ">=26.1.0" }, { name = "exceptiongroup" }, { name = "flake8", marker = "extra == 'lint'", specifier = "==7.3.0" }, { name = "fpsample", marker = "extra == 'extras'", specifier = ">=1.0.1" }, @@ -430,6 +433,7 @@ requires-dist = [ { name = "matplotlib", marker = "extra == 'examples'", specifier = ">=3.7.3,!=3.9.1" }, { name = "mypy", marker = 
"extra == 'mypy'", specifier = ">=1.19.1" }, { name = "myst-parser", marker = "extra == 'docs'", specifier = ">=4.0.0" }, + { name = "narwhals", specifier = ">=2,<3" }, { name = "ngboost", marker = "extra == 'extras'", specifier = ">=0.3.12,<1" }, { name = "numpy", specifier = ">=1.24.1,<3" }, { name = "onnx", marker = "extra == 'onnx'", specifier = ">=1.16.0" }, @@ -605,16 +609,16 @@ wheels = [ [[package]] name = "cattrs" -version = "25.3.0" +version = "26.1.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "attrs" }, { name = "exceptiongroup", marker = "python_full_version < '3.11'" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/6e/00/2432bb2d445b39b5407f0a90e01b9a271475eea7caf913d7a86bcb956385/cattrs-25.3.0.tar.gz", hash = "sha256:1ac88d9e5eda10436c4517e390a4142d88638fe682c436c93db7ce4a277b884a", size = 509321, upload-time = "2025-10-07T12:26:08.737Z" } +sdist = { url = "https://files.pythonhosted.org/packages/a0/ec/ba18945e7d6e55a58364d9fb2e46049c1c2998b3d805f19b703f14e81057/cattrs-26.1.0.tar.gz", hash = "sha256:fa239e0f0ec0715ba34852ce813986dfed1e12117e209b816ab87401271cdd40", size = 495672, upload-time = "2026-02-18T22:15:19.406Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/d8/2b/a40e1488fdfa02d3f9a653a61a5935ea08b3c2225ee818db6a76c7ba9695/cattrs-25.3.0-py3-none-any.whl", hash = "sha256:9896e84e0a5bf723bc7b4b68f4481785367ce07a8a02e7e9ee6eb2819bc306ff", size = 70738, upload-time = "2025-10-07T12:26:06.603Z" }, + { url = "https://files.pythonhosted.org/packages/80/56/60547f7801b97c67e97491dc3d9ade9fbccbd0325058fd3dfcb2f5d98d90/cattrs-26.1.0-py3-none-any.whl", hash = "sha256:d1e0804c42639494d469d08d4f26d6b9de9b8ab26b446db7b5f8c2e97f7c3096", size = 73054, upload-time = "2026-02-18T22:15:17.958Z" }, ] [[package]] @@ -3085,11 +3089,11 @@ wheels = [ [[package]] name = "narwhals" -version = "2.15.0" +version = "2.20.0" source = { registry = "https://pypi.org/simple" } 
-sdist = { url = "https://files.pythonhosted.org/packages/47/6d/b57c64e5038a8cf071bce391bb11551657a74558877ac961e7fa905ece27/narwhals-2.15.0.tar.gz", hash = "sha256:a9585975b99d95084268445a1fdd881311fa26ef1caa18020d959d5b2ff9a965", size = 603479, upload-time = "2026-01-06T08:10:13.27Z" } +sdist = { url = "https://files.pythonhosted.org/packages/e9/f3/257adc69a71011b4c8cda321b00f02c5bf1980ae38ffd05a58d9632d4de8/narwhals-2.20.0.tar.gz", hash = "sha256:c10994975fa7dc5a68c2cffcddbd5908fc8ebb2d463c5bab085309c0ee1f551e", size = 627848, upload-time = "2026-04-20T12:11:45.427Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/3d/2e/cf2ffeb386ac3763526151163ad7da9f1b586aac96d2b4f7de1eaebf0c61/narwhals-2.15.0-py3-none-any.whl", hash = "sha256:cbfe21ca19d260d9fd67f995ec75c44592d1f106933b03ddd375df7ac841f9d6", size = 432856, upload-time = "2026-01-06T08:10:11.511Z" }, + { url = "https://files.pythonhosted.org/packages/d0/69/f24d3d1c38ad69e256138b4ec2452a8c7cf66be49dc214771ae99dd4f0a0/narwhals-2.20.0-py3-none-any.whl", hash = "sha256:16e750ea5507d4ba6e8d03455b5f93a535e0405976561baea235bca5dc9f475d", size = 449373, upload-time = "2026-04-20T12:11:43.596Z" }, ] [[package]]