Skip to content

Candidates Engine#837

Draft
AdrianSosic wants to merge 144 commits into
mainfrom
dev/candidates
Draft

Candidates Engine#837
AdrianSosic wants to merge 144 commits into
mainfrom
dev/candidates

Conversation

@AdrianSosic

@AdrianSosic AdrianSosic commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Fixes #793

PRs merged

Upcoming PRs

To be decided

  • Should the is_constrained property of the search space classes be dropped? Reasons:
    • SubspaceContinuous.is_constrained used to contain non-trivial logic but with the new design it's equivalent to a simple if self.constraints call
    • SubspaceContinuous.is_constrained was very much meaningless since the constraints attribute itself was flawed (Deprecate SubspaceDiscrete.constraints #835). And with both the legacy and new design, it's not obvious what it actually means!
    • Having an is_constrained attribute only on some of the search space classes is asymmetric
  • SubspaceDiscrete.batch_constraints vs DiscreteBatchConstraint name conflict. Options:
    • rename batch_constraints to something else, e.g. recommendation_constraints. However, batch_constraints is already a generic term. Perhaps we should rather ...
    • rename DiscreteBatchConstraint to something more specific. The latter is actually rather a name for an abstract base class if we decide to add more batch-level constraints, and does not convey anything about what it does. Options would be in the direction of DiscreteSharedValueConstraint
  • Can we fully kick n_batches_done and n_fits_done?
  • How to cleanly separate "filtering" and "batch constraints"? The eval_during_creation and eval_during_modeling are currently not mutually exclusive, so the semantics are not 100% clear, hence the assert statements in searchspace/discrete.py <-- clean up during constraint refactoring?
  • Scaling approach: shall we use complete unfiltered space, filtered space or policy-generated subset? Decision depends heavily on what is cheap and possible in a large lazy space but also what makes conceptually sense (i.e. should adding a hypothetical candidate that is removed through a filter, e.g. by policy or active values, change the induced scaling?). Also impacts methods like comp_rep_bounds, which now behave differently compared to main (Refactor candidates interface #840 (comment)). At the same time, the discrete version of comp_rep_bounds may be dropped entirely since effectively unused at the moment (it's called in optimize_acqf_mixed but the discrete bounds should actually not matter since the candidates are explicitly piped in as separate argument --> needs investigation)

TODOs

  • Freeze SubspaceDiscrete and turn comp_rep into cached_property

Follow-up TODOs (after dev completion)

  • Streamline recommendation call chain (i.e. avoid jumping back and forth between classes)

AdrianSosic and others added 30 commits April 7, 2026 09:10
Expecting a grouped constaint input from the user is unnecessary since
we can also take care of the grouping internally.
Using version from beginnign of 2026
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
AdrianSosic and others added 22 commits June 25, 2026 09:25
*This is a pure internal refactor — no behavior change.*

### Background

The consistency of the attribute structure of `SubspaceDiscrete` was
historically enforced only by convention, not by the type system,
meaning that mutating a subspace instance could produce an inconsistent
state. Creating filtered copies of a subspace was therefore not safe. On
top of that, doing so could have been potentially costly, since it might
have involved materializing (parts of) the Cartesian product — another
reason to avoid it.

As a workaround, the codebase avoided modifying subspace objects for
filtered candidate sets altogether and instead computed candidates
externally, threading them as a separate `candidates_exp` dataframe
argument down through multiple layers of the recommender call stack. The
`FilteredSubspaceDiscrete` subclass existed for the same reason: to
apply a Boolean mask at call time without touching the subspace itself.

Previous preparatory PRs resolved issue
[#794](#794) by making the
consistency of the attribute structure of `SubspaceDiscrete` an enforced
invariant, which now enables safe creation of filtered subspace copies.
The cost problem — to be avoided by lazy materialization of the
Cartesian product (see
[#796](#796)) — has not yet been
addressed but will be resolved in a subsequent PR on the same
development branch. This PR capitalizes on the invariant enforcement to
simplify the recommender internals.

### Changes

**Drop `FilteredSubspaceDiscrete`** (`baybe/searchspace/_filtered.py`
deleted)
The subclass existed solely to override `get_candidates()` with a
Boolean mask applied at call time. Now that creating a filtered copy of
a subspace is safe, the class is unnecessary. `Campaign` is updated to
create filtered subspace copies directly.

**Drop dead `pending_experiments` argument**
The `pending_experiments` parameter had been used in
`PureRecommender._recommend_with_discrete_parts` to filter pending
experiments from the candidate set, but became dead code when that
filtering responsibility was moved to `Campaign` in an earlier refactor.
It is now removed.

**Drop redundant `candidates_exp` argument**
The filtered set of discrete candidates was fetched once and then passed
as a parallel `candidates_exp` dataframe through several layers of the
recommender stack (`_recommend` → `_recommend_discrete/hybrid` →
BoTorch-specific helpers). With safe subspace mutation now available,
subset-constrained paths are updated to create filtered copies of the
subspace instead of propagating a separate dataframe. The argument is
removed from all affected signatures across:

- `PureRecommender._recommend_discrete` / `_recommend_hybrid`
- `BotorchRecommender._recommend_discrete` / `_recommend_hybrid`
- `recommend_discrete_with_subsets` /
`recommend_discrete_without_subsets`
- `recommend_hybrid_with_subsets` / `recommend_hybrid_without_subsets`
- `SKLearnClusteringRecommender._recommend_discrete`
- `FPSRecommender._recommend_discrete`
- `RandomRecommender._recommend_hybrid`
- `NaiveHybridSpaceRecommender`
Now returns only the experimental representation to avoid
wasteful computation of the computational representation when
not needed.
With the upcoming changes, subsequent calls may yield different results
Preparation to fix #795, #796 and #798.

### Background

As part of the broader effort toward lazy, on-demand evaluation of
candidate sets (see linked issues), this PR removes the dependency on
the eagerly pre-computed, fully materialized `exp_rep` and `comp_rep`
public attributes of `SubspaceDiscrete`. The old design forced the full
candidate space to be computed and cached upfront, even when only a
subset is needed — blocking future subsampling policies and
backend-agnostic mechanisms required for handling large spaces.

The key step is giving `get_candidates` a clean, single-purpose method
signature as the sole access point for the experimental representation,
so that transformation into the computational representation happens
only when explicitly requested and only on the relevant data (the subset
selection will be enabled later).

### Out of scope

The return type of `get_candidates` is expected to be elevated to a
higher-level object in a follow-up (e.g. `TableCandidates` or a similar
abstraction). This PR deliberately keeps the return type as
`pd.DataFrame` to stay focused on the interface decoupling.

### Changes

**Make `exp_rep` private** (`baybe/searchspace/discrete.py`)
Renames the field to `_exp_rep` (with `alias="exp_rep"` for
serialization compatibility), updates the validator reference, and
replaces all internal accesses.

**Simplify `get_candidates` signature**
(`baybe/searchspace/discrete.py`)
Returns only the experimental representation (`pd.DataFrame`) instead of
a `tuple[pd.DataFrame, pd.DataFrame]`, avoiding wasteful upfront
computation of the computational representation.

**Update all internal call sites** (`baybe/campaign.py`,
`baybe/recommenders/`, `baybe/simulation/`, `baybe/searchspace/core.py`,
`baybe/acquisition/`)
Replaces tuple-unpacking calls to `get_candidates` and direct
`exp_rep`/`comp_rep` accesses with the new API throughout. Computational
representation is now computed on-demand, at the point of use.

**Update examples and tests** (`examples/`, `tests/`)
Adapts all affected examples and tests to the new `get_candidates`
return type.

**Deprecate `exp_rep` and `comp_rep` properties**
(`baybe/searchspace/discrete.py`)
Adds `DeprecationWarning` shims for `exp_rep` (pointing to
`get_candidates()`) and `comp_rep` (pointing to
`transform(get_candidates())`), with corresponding deprecation tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SubspaceDiscrete Refactor — Problem Statement

3 participants