Candidates Engine by AdrianSosic · Pull Request #837 · emdgroup/baybe

AdrianSosic · 2026-06-23T09:16:40Z

Fixes #793

PRs merged

Upcoming PRs

Refactor candidates interface #840
wiring CandidatesProtocol
get_candidates lazification
DiscreteParameter lazification

To be decided

Should the is_constrained property of the search space classes be dropped? Reasons:
- SubspaceContinuous.is_constrained used to contain non-trivial logic, but with the new design it's equivalent to a simple if self.constraints check
- SubspaceContinuous.is_constrained was very much meaningless since the constraints attribute itself was flawed (Deprecate SubspaceDiscrete.constraints #835). And with both the legacy and new design, it's not obvious what it actually means!
- Having an is_constrained attribute only on some of the search space classes is asymmetric
SubspaceDiscrete.batch_constraints vs DiscreteBatchConstraint name conflict. Options:
- rename batch_constraints to something else, e.g. recommendation_constraints. However, batch_constraints is already a generic term. Perhaps we should rather ...
- rename DiscreteBatchConstraint to something more specific. The latter is actually rather a name for an abstract base class if we decide to add more batch-level constraints, and does not convey anything about what it does. Options would be in the direction of DiscreteSharedValueConstraint
Can we fully kick n_batches_done and n_fits_done?
How to cleanly separate "filtering" and "batch constraints"? The eval_during_creation and eval_during_modeling are currently not mutually exclusive, so the semantics are not 100% clear, hence the assert statements in searchspace/discrete.py <-- clean up during constraint refactoring?
Scaling approach: shall we use the complete unfiltered space, the filtered space, or a policy-generated subset? The decision depends heavily on what is cheap and possible in a large lazy space, but also on what makes sense conceptually (i.e. should adding a hypothetical candidate that is then removed by a filter, e.g. by policy or active values, change the induced scaling?). This also impacts methods like comp_rep_bounds, which now behave differently compared to main (Refactor candidates interface #840 (comment)). At the same time, the discrete version of comp_rep_bounds may be dropped entirely since it is effectively unused at the moment (it's called in optimize_acqf_mixed, but the discrete bounds should not actually matter since the candidates are explicitly piped in as a separate argument --> needs investigation)
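
To make the scaling question concrete, here is a minimal sketch in plain Python. It is not BayBE code: the helper names `min_max_bounds` and `scale` and the toy values are assumptions for illustration only. It shows that the scaled value of one and the same candidate depends on whether the bounds come from the unfiltered or the filtered space:

```python
def min_max_bounds(values):
    """Return the (lower, upper) bounds of a non-empty collection of numbers."""
    return min(values), max(values)


def scale(value, bounds):
    """Min-max scale a value to [0, 1] using the given bounds."""
    lower, upper = bounds
    if upper == lower:
        return 0.0  # degenerate range: map everything to 0
    return (value - lower) / (upper - lower)


# Hypothetical one-dimensional candidate space
full_space = [0.0, 10.0, 20.0, 100.0]
# The same space after a filter (e.g. a policy or active values) removed 100.0
filtered_space = [v for v in full_space if v <= 20.0]

# The candidate 20.0 is present in both spaces but is scaled differently
scaled_full = scale(20.0, min_max_bounds(full_space))
scaled_filtered = scale(20.0, min_max_bounds(filtered_space))

print(scaled_full)      # 0.2
print(scaled_filtered)  # 1.0
```

Whichever option is chosen, the point of the sketch is only that the choice is observable downstream: a candidate that never gets recommended can still shift the representation of all the others.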

TODOs

Freeze SubspaceDiscrete and turn comp_rep into cached_property
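
A rough sketch of what this TODO could look like, using a stdlib frozen dataclass as a stand-in for the actual class machinery. The class `ToySubspace` and its toy transformation are hypothetical and only illustrate the pattern of an immutable object with a lazily computed, cached derived attribute:

```python
from dataclasses import FrozenInstanceError, dataclass
from functools import cached_property


@dataclass(frozen=True)
class ToySubspace:
    """Hypothetical stand-in for a frozen discrete subspace."""

    # Experimental representation: one tuple of labels per candidate
    exp_rep: tuple[tuple[str, ...], ...]

    @cached_property
    def comp_rep(self) -> tuple[tuple[int, ...], ...]:
        """Toy computational representation: the length of each label.

        Computed on first access and then cached on the instance.
        """
        return tuple(tuple(len(label) for label in row) for row in self.exp_rep)


space = ToySubspace(exp_rep=(("a", "bb"), ("ccc", "d")))
first = space.comp_rep
second = space.comp_rep  # served from the cache, not recomputed

try:
    space.exp_rep = ()  # mutation is rejected on a frozen instance
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing is what makes the cache safe here: since `exp_rep` cannot change after construction, the cached `comp_rep` can never go stale.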

Follow-up TODOs (after dev completion)

Streamline recommendation call chain (i.e. avoid jumping back and forth between classes)


*This is a pure internal refactor with no behavior change.*

### Background

The consistency of the attribute structure of `SubspaceDiscrete` was historically enforced only by convention, not by the type system, meaning that mutating a subspace instance could produce an inconsistent state. Creating filtered copies of a subspace was therefore not safe. On top of that, doing so could have been costly, since it might have involved materializing (parts of) the Cartesian product, which was another reason to avoid it.

As a workaround, the codebase avoided modifying subspace objects for filtered candidate sets altogether and instead computed candidates externally, threading them as a separate `candidates_exp` dataframe argument down through multiple layers of the recommender call stack. The `FilteredSubspaceDiscrete` subclass existed for the same reason: to apply a Boolean mask at call time without touching the subspace itself.

Previous preparatory PRs resolved issue [#794](#794) by making the consistency of the attribute structure of `SubspaceDiscrete` an enforced invariant, which now enables safe creation of filtered subspace copies. The cost problem, to be avoided by lazy materialization of the Cartesian product (see [#796](#796)), has not yet been addressed but will be resolved in a subsequent PR on the same development branch. This PR capitalizes on the invariant enforcement to simplify the recommender internals.

### Changes

**Drop `FilteredSubspaceDiscrete`** (`baybe/searchspace/_filtered.py` deleted)

The subclass existed solely to override `get_candidates()` with a Boolean mask applied at call time. Now that creating a filtered copy of a subspace is safe, the class is unnecessary. `Campaign` is updated to create filtered subspace copies directly.

**Drop dead `pending_experiments` argument**

The `pending_experiments` parameter had been used in `PureRecommender._recommend_with_discrete_parts` to filter pending experiments from the candidate set, but became dead code when that filtering responsibility was moved to `Campaign` in an earlier refactor. It is now removed.

**Drop redundant `candidates_exp` argument**

The filtered set of discrete candidates was fetched once and then passed as a parallel `candidates_exp` dataframe through several layers of the recommender stack (`_recommend` → `_recommend_discrete/hybrid` → BoTorch-specific helpers). With safe subspace mutation now available, subset-constrained paths are updated to create filtered copies of the subspace instead of propagating a separate dataframe. The argument is removed from all affected signatures across:

- `PureRecommender._recommend_discrete` / `_recommend_hybrid`
- `BotorchRecommender._recommend_discrete` / `_recommend_hybrid`
- `recommend_discrete_with_subsets` / `recommend_discrete_without_subsets`
- `recommend_hybrid_with_subsets` / `recommend_hybrid_without_subsets`
- `SKLearnClusteringRecommender._recommend_discrete`
- `FPSRecommender._recommend_discrete`
- `RandomRecommender._recommend_hybrid`
- `NaiveHybridSpaceRecommender`
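
The shift from a threaded `candidates_exp` argument to filtered subspace copies can be illustrated with a small toy model. This is a sketch under stated assumptions, not the BayBE implementation: `ToySubspace`, `filtered`, `recommend_old`, and `recommend_new` are hypothetical names, and candidates are plain integers instead of dataframe rows.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToySubspace:
    """Hypothetical immutable subspace holding its candidates as a tuple."""

    candidates: tuple[int, ...]

    def get_candidates(self) -> tuple[int, ...]:
        """Return the candidates represented by this subspace."""
        return self.candidates

    def filtered(self, keep) -> "ToySubspace":
        """Return a filtered copy; the original instance is left untouched."""
        return replace(self, candidates=tuple(c for c in self.candidates if keep(c)))


# Old pattern: the filtered candidates travel alongside the subspace,
# so the two arguments can drift apart across call layers.
def recommend_old(subspace: ToySubspace, candidates_exp: tuple[int, ...]) -> int:
    return max(candidates_exp)


# New pattern: the subspace itself carries the filtered candidate set.
def recommend_new(subspace: ToySubspace) -> int:
    return max(subspace.get_candidates())


space = ToySubspace(candidates=(1, 2, 3, 4, 5))
pending = {5}

# Old: compute the filtered set externally and pass it in parallel
external = tuple(c for c in space.get_candidates() if c not in pending)
old_result = recommend_old(space, external)

# New: create a filtered copy and pass only that
available = space.filtered(lambda c: c not in pending)
new_result = recommend_new(available)
```

Both variants pick the same candidate here; the difference is that the second signature has a single source of truth for what is recommendable.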


Preparation to fix #795, #796 and #798.

### Background

As part of the broader effort toward lazy, on-demand evaluation of candidate sets (see linked issues), this PR removes the dependency on the eagerly pre-computed, fully materialized `exp_rep` and `comp_rep` public attributes of `SubspaceDiscrete`. The old design forced the full candidate space to be computed and cached upfront, even when only a subset is needed, which blocks future subsampling policies and backend-agnostic mechanisms required for handling large spaces.

The key step is giving `get_candidates` a clean, single-purpose method signature as the sole access point for the experimental representation, so that transformation into the computational representation happens only when explicitly requested and only on the relevant data (the subset selection will be enabled later).

### Out of scope

The return type of `get_candidates` is expected to be elevated to a higher-level object in a follow-up (e.g. `TableCandidates` or a similar abstraction). This PR deliberately keeps the return type as `pd.DataFrame` to stay focused on the interface decoupling.

### Changes

**Make `exp_rep` private** (`baybe/searchspace/discrete.py`)

Renames the field to `_exp_rep` (with `alias="exp_rep"` for serialization compatibility), updates the validator reference, and replaces all internal accesses.

**Simplify `get_candidates` signature** (`baybe/searchspace/discrete.py`)

Returns only the experimental representation (`pd.DataFrame`) instead of a `tuple[pd.DataFrame, pd.DataFrame]`, avoiding wasteful upfront computation of the computational representation.

**Update all internal call sites** (`baybe/campaign.py`, `baybe/recommenders/`, `baybe/simulation/`, `baybe/searchspace/core.py`, `baybe/acquisition/`)

Replaces tuple-unpacking calls to `get_candidates` and direct `exp_rep`/`comp_rep` accesses with the new API throughout. Computational representation is now computed on-demand, at the point of use.

**Update examples and tests** (`examples/`, `tests/`)

Adapts all affected examples and tests to the new `get_candidates` return type.

**Deprecate `exp_rep` and `comp_rep` properties** (`baybe/searchspace/discrete.py`)

Adds `DeprecationWarning` shims for `exp_rep` (pointing to `get_candidates()`) and `comp_rep` (pointing to `transform(get_candidates())`), with corresponding deprecation tests.
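
For readers unfamiliar with the shim pattern, the following is a minimal stdlib-only sketch of how such deprecated properties can forward to the new access path. It is not the code from this PR: `ToySubspace`, its toy `transform`, the list-of-dicts data layout, and the warning texts are all assumptions made for illustration.

```python
import warnings


class ToySubspace:
    """Hypothetical subspace illustrating the deprecation pattern."""

    def __init__(self, exp_rep: list[dict]):
        self._exp_rep = exp_rep  # private storage of the experimental representation

    def get_candidates(self) -> list[dict]:
        """Return the experimental representation only."""
        return list(self._exp_rep)

    def transform(self, candidates: list[dict]) -> list[list[float]]:
        """Toy transformation into a computational representation."""
        return [[float(v) for v in row.values()] for row in candidates]

    @property
    def exp_rep(self) -> list[dict]:
        """Deprecated accessor forwarding to ``get_candidates()``."""
        warnings.warn(
            "'exp_rep' is deprecated; use 'get_candidates()' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_candidates()

    @property
    def comp_rep(self) -> list[list[float]]:
        """Deprecated accessor forwarding to ``transform(get_candidates())``."""
        warnings.warn(
            "'comp_rep' is deprecated; use 'transform(get_candidates())' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.transform(self.get_candidates())


space = ToySubspace([{"x": 1, "y": 2}, {"x": 3, "y": 4}])

# Preferred path: fetch candidates once, transform only when needed
candidates = space.get_candidates()
comp = space.transform(candidates)

# Legacy path: still works, but emits a DeprecationWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = space.comp_rep
```

Fetching the candidates once and reusing the result, as in the preferred path, also matches the later commit note that subsequent `get_candidates` calls may yield different results.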

AdrianSosic and others added 30 commits April 7, 2026 09:10

Drop SubspaceDiscrete.empty_encoding attribute

030a78f

Drop comp_rep parameter from SubspaceDiscrete.__init__

2f41bec

Fix SubspaceDiscrete class and attribute docstrings

985a519

Add missing validator to SubspaceDiscrete.parameters

8cd624f

Clean up SubspaceContinuous attributes

b99c0b1

Expecting a grouped constraint input from the user is unnecessary since we can also take care of the grouping internally.

Fix SubspaceContinuous class and attribute docstrings

6882273

Add narwhal lazyframe converter

eb4545a

Add polars(pyarrow) and narwhals as hard dependencies

507a38b

Using version from beginning of 2026

Add InfiniteSpaceError

f499cbb

Add CandidateProtocol as well as TableCandidates and ProductCandidates

d920ae2

Add tests for ProductCandidates and TableCandidates

2c724e2

Add parameter name check to Candidates

22aa41b

Update CHANGELOG.md

92af0ca

Fix typo

fc19bc4

Remove attribute validation from CandidateProtocol

4a43d8b

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Fix terminology: enumerable -> finite

eacd434

Remove polars as hard dependency

6cb0e34

Adjust narwhals version constraints

8be8647

Use narwhals stable.v2 namespace

2eacc75

Turn protocol (class) attribute into property

053c16a

Refine attrs coding conventions in AGENTS.md

05b3ad3

Add missing garbage collection step

831aafe

Fix attribute definitions

a4abf8f

Drop unnecessary __attrs_post_init__

dda34ff

Rework candidate module docstrings

adabdd4

Turn delayed validation into eager validation

2a7f4b9

Add DiscreteParameter.is_finite

c5ebbba

Drop unused helper function

4605151

Adjust lazyframe conversion utility

0a4b9b8

Rename CandidatesProtocol.to_lazy_candidates to to_lazy

10f8118

AdrianSosic added 4 commits June 23, 2026 22:14

Drop FilteredSubspaceDiscrete class

66a3e7c

Drop dead pending_experiments argument

e3ef5b0

Drop redundant candidates_exp argument

075f186

Update CHANGELOG.md

b2b8ee2

This was referenced Jun 25, 2026

SubspaceDiscrete Refactor — Problem Statement #793

Open

Refactor recommendation mechanics #838

Merged

AdrianSosic and others added 22 commits June 25, 2026 09:25

Create shallow copy to avoid in-place dataframe mutation

2600746

Drop redundant candidates_exp argument from subset methods

e9b1915

Fix mypy error

1399e6b

Drop leftover comp_rep argument from SubspaceDiscrete call

d7e50ec

Turn exp_rep attribute private

fcd05b1

Adjust get_candidates signature

1d04b91

Now returns only the experimental representation to avoid wasteful computation of the computational representation when not needed.

Remove exp_rep/comp_rep from __str__ methods

b225071

Replace exp_rep accesses in Campaign class

0d75b88

Replace exp_rep accesses in recommenders

632cbb3

Replace exp_rep accesses in simulation package

7f4dcfd

Replace exp_rep/comp_rep accesses in tests

ffb02e9

Replace exp_rep/comp_rep accesses in examples

6a109ba

Replace remaining external comp_rep accesses

5a0d341

Replace remaining internal comp_rep accesses

5212fb3

Deprecate exp_rep and comp_rep access

e88491c

Update CHANGELOG.md

559c40c

Fix mypy

e096cb8

Update get_candidates call structure in examples

071ae66

Replace leftover exp_rep accesses

51f03da

Avoid duplicate get_candidates calls

59d8869

With the upcoming changes, subsequent calls may yield different results

Fix bounds computation for empty parameter set

fdbe7ba

AdrianSosic mentioned this pull request Jun 26, 2026

Refactor candidates interface #840

Merged

AdrianSosic wants to merge 144 commits into main from dev/candidates


3 participants
