Candidates Engine#837
Draft
AdrianSosic wants to merge 144 commits into
Draft
Conversation
Expecting a grouped constaint input from the user is unnecessary since we can also take care of the grouping internally.
Using version from beginnign of 2026
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This was referenced Jun 25, 2026
*This is a pure internal refactor — no behavior change.* ### Background The consistency of the attribute structure of `SubspaceDiscrete` was historically enforced only by convention, not by the type system, meaning that mutating a subspace instance could produce an inconsistent state. Creating filtered copies of a subspace was therefore not safe. On top of that, doing so could have been potentially costly, since it might have involved materializing (parts of) the Cartesian product — another reason to avoid it. As a workaround, the codebase avoided modifying subspace objects for filtered candidate sets altogether and instead computed candidates externally, threading them as a separate `candidates_exp` dataframe argument down through multiple layers of the recommender call stack. The `FilteredSubspaceDiscrete` subclass existed for the same reason: to apply a Boolean mask at call time without touching the subspace itself. Previous preparatory PRs resolved issue [#794](#794) by making the consistency of the attribute structure of `SubspaceDiscrete` an enforced invariant, which now enables safe creation of filtered subspace copies. The cost problem — to be avoided by lazy materialization of the Cartesian product (see [#796](#796)) — has not yet been addressed but will be resolved in a subsequent PR on the same development branch. This PR capitalizes on the invariant enforcement to simplify the recommender internals. ### Changes **Drop `FilteredSubspaceDiscrete`** (`baybe/searchspace/_filtered.py` deleted) The subclass existed solely to override `get_candidates()` with a Boolean mask applied at call time. Now that creating a filtered copy of a subspace is safe, the class is unnecessary. `Campaign` is updated to create filtered subspace copies directly. **Drop dead `pending_experiments` argument** The `pending_experiments` parameter had been used in `PureRecommender._recommend_with_discrete_parts` to filter pending experiments from the candidate set, but became dead code when that filtering responsibility was moved to `Campaign` in an earlier refactor. It is now removed. **Drop redundant `candidates_exp` argument** The filtered set of discrete candidates was fetched once and then passed as a parallel `candidates_exp` dataframe through several layers of the recommender stack (`_recommend` → `_recommend_discrete/hybrid` → BoTorch-specific helpers). With safe subspace mutation now available, subset-constrained paths are updated to create filtered copies of the subspace instead of propagating a separate dataframe. The argument is removed from all affected signatures across: - `PureRecommender._recommend_discrete` / `_recommend_hybrid` - `BotorchRecommender._recommend_discrete` / `_recommend_hybrid` - `recommend_discrete_with_subsets` / `recommend_discrete_without_subsets` - `recommend_hybrid_with_subsets` / `recommend_hybrid_without_subsets` - `SKLearnClusteringRecommender._recommend_discrete` - `FPSRecommender._recommend_discrete` - `RandomRecommender._recommend_hybrid` - `NaiveHybridSpaceRecommender`
Now returns only the experimental representation to avoid wasteful computation of the computational representation when not needed.
With the upcoming changes, subsequent calls may yield different results
Preparation to fix #795, #796 and #798. ### Background As part of the broader effort toward lazy, on-demand evaluation of candidate sets (see linked issues), this PR removes the dependency on the eagerly pre-computed, fully materialized `exp_rep` and `comp_rep` public attributes of `SubspaceDiscrete`. The old design forced the full candidate space to be computed and cached upfront, even when only a subset is needed — blocking future subsampling policies and backend-agnostic mechanisms required for handling large spaces. The key step is giving `get_candidates` a clean, single-purpose method signature as the sole access point for the experimental representation, so that transformation into the computational representation happens only when explicitly requested and only on the relevant data (the subset selection will be enabled later). ### Out of scope The return type of `get_candidates` is expected to be elevated to a higher-level object in a follow-up (e.g. `TableCandidates` or a similar abstraction). This PR deliberately keeps the return type as `pd.DataFrame` to stay focused on the interface decoupling. ### Changes **Make `exp_rep` private** (`baybe/searchspace/discrete.py`) Renames the field to `_exp_rep` (with `alias="exp_rep"` for serialization compatibility), updates the validator reference, and replaces all internal accesses. **Simplify `get_candidates` signature** (`baybe/searchspace/discrete.py`) Returns only the experimental representation (`pd.DataFrame`) instead of a `tuple[pd.DataFrame, pd.DataFrame]`, avoiding wasteful upfront computation of the computational representation. **Update all internal call sites** (`baybe/campaign.py`, `baybe/recommenders/`, `baybe/simulation/`, `baybe/searchspace/core.py`, `baybe/acquisition/`) Replaces tuple-unpacking calls to `get_candidates` and direct `exp_rep`/`comp_rep` accesses with the new API throughout. Computational representation is now computed on-demand, at the point of use. **Update examples and tests** (`examples/`, `tests/`) Adapts all affected examples and tests to the new `get_candidates` return type. **Deprecate `exp_rep` and `comp_rep` properties** (`baybe/searchspace/discrete.py`) Adds `DeprecationWarning` shims for `exp_rep` (pointing to `get_candidates()`) and `comp_rep` (pointing to `transform(get_candidates())`), with corresponding deprecation tests.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #793
PRs merged
SubspaceDiscrete.constraints#835Upcoming PRs
CandidatesProtocolget_candidateslazificationDiscreteParameterlazificationTo be decided
is_constrainedproperty of the search space classes be dropped? Reasons:SubspaceContinuous.is_constrainedused to contain non-trivial logic but with the new design it's equivalent to a simpleif self.constraintscallSubspaceContinuous.is_constrainedwas very much meaningless since theconstraintsattribute itself was flawed (DeprecateSubspaceDiscrete.constraints#835). And with both the legacy and new design, it's not obvious what it actually means!is_constrainedattribute only on some of the search space classes is asymmetricSubspaceDiscrete.batch_constraintsvsDiscreteBatchConstraintname conflict. Options:batch_constraintsto something else, e.g.recommendation_constraints. However,batch_constraintsis already a generic term. Perhaps we should rather ...DiscreteBatchConstraintto something more specific. The latter is actually rather a name for an abstract base class if we decide to add more batch-level constraints, and does not convey anything about what it does. Options would be in the direction ofDiscreteSharedValueConstraintn_batches_doneandn_fits_done?eval_during_creationandeval_during_modelingare currently not mutually exclusive, so the semantics are not 100% clear, hence theassertstatements insearchspace/discrete.py<-- clean up during constraint refactoring?complete unfiltered space,filtered spaceorpolicy-generated subset? Decision depends heavily on what is cheap and possible in a large lazy space but also what makes conceptually sense (i.e. should adding a hypothetical candidate that is removed through a filter, e.g. by policy or active values, change the induced scaling?). Also impacts methods likecomp_rep_bounds, which now behave differently compared tomain(Refactor candidates interface #840 (comment)). At the same time, the discrete version ofcomp_rep_boundsmay be dropped entirely since effectively unused at the moment (it's called inoptimize_acqf_mixedbut the discrete bounds should actually not matter since the candidates are explicitly piped in as separate argument --> needs investigation)TODOs
SubspaceDiscreteand turncomp_repintocached_propertyFollow-up TODOs (after dev completion)