adapter: Further preparations for handling query plans in catalog implications#35837
ggevay merged 2 commits into MaterializeInc:main
This is preparation for moving those catalog operations that involve query plans, such as `CREATE MATERIALIZED VIEW`, into the catalog implications framework.

This PR solves a long-standing weirdness in our catalog: We used to have query plans and other metainfo in a separate part of the catalog (`CatalogPlans`), instead of inside the catalog items. This happened only due to historical reasons. This PR fixes this.

I recommend reviewing commit by commit. The 1st is just some renaming to avoid confusion between locally optimized plans (with just `optimize_mir_local` without view inlining) and fully (globally) optimized plans (that is, after view inlining, with `optimize_dataflow`). The 2nd commit is the main thing, see commit msg for details.

(Note that even after this PR, we are still adding the plans in the side effect closure of the catalog transaction that adds the catalog items, e.g., in `create_materialized_view_finish`. #35837 will be a follow-up PR, which will instead add the plans to the `catalog::Op`s that comprise the catalog transactions.)

Nightly: https://buildkite.com/materialize/nightly/builds/15973, but there is a staggering amount of unrelated redness, so it's hard to read.

---------

Co-authored-by: Junie <junie@jetbrains.com>
mtabebe
left a comment
Great test coverage. And I think that I have reviewed some of these commits before, so I mainly focused on the last one. It looks good. Just one minor question.
    )
};

// Populate the durable expression cache before the catalog
What happens if the transaction fails? My assumption is that it is OK to cache the expressions of failed transactions... just confirming.
Similarly, how do we handle notices if the transaction doesn't succeed? Will this state just remain around forever?
Good questions!
Expression cache on failed transaction: Yes, harmless. The cache entry is keyed by `(build_version, GlobalId, expr_type)`. If the transaction fails, that `GlobalId` never lands in the catalog, so the entry is never looked up: `get_global_expressions` (in the follow-up PR) only queries `GlobalId`s from `Op::CreateItem` ops that are actually being committed. The orphaned entry sits inert in the persist shard until the next version bump cleans it up via `remove_prior_versions`.
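To make the "orphaned but inert" argument concrete, here is a minimal sketch of a version-scoped cache. It is only an in-memory model under assumptions: `ExpressionCache`, `ExprType`, and the integer type aliases are hypothetical stand-ins, and the real cache lives in a persist shard rather than a `BTreeMap`.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins; these are not Materialize's real types.
type BuildVersion = u32;
type GlobalId = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum ExprType {
    Local,
    Global,
}

#[derive(Default)]
struct ExpressionCache {
    // Keyed the way the reply describes: (build_version, GlobalId, expr_type).
    entries: BTreeMap<(BuildVersion, GlobalId, ExprType), String>,
}

impl ExpressionCache {
    /// Written before the catalog transaction is attempted, so an entry may
    /// exist for an item whose transaction later fails.
    fn insert(&mut self, version: BuildVersion, id: GlobalId, ty: ExprType, plan: &str) {
        self.entries.insert((version, id, ty), plan.to_string());
    }

    /// In this model, callers only pass IDs of items that actually committed,
    /// so an orphaned entry is simply never asked for.
    fn get(&self, version: BuildVersion, id: GlobalId, ty: ExprType) -> Option<&String> {
        self.entries.get(&(version, id, ty))
    }

    /// Models the cleanup on a version bump: drop all entries that were
    /// written by an older build version, orphaned or not.
    fn remove_prior_versions(&mut self, current: BuildVersion) {
        self.entries.retain(|(version, _, _), _| *version >= current);
    }
}
```

In this model an entry for a failed transaction costs a little space until the next call to `remove_prior_versions`, and it never affects a lookup, because lookups are driven by committed IDs only.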
Notices on failed transaction: This is actually the exact bug this PR fixes! Pre-fix, `emit_optimizer_notices` ran before the catalog transaction, so the user saw spurious notices (e.g. "identical index already exists") even when `IF NOT EXISTS` hit a name collision and the item wasn't created. Post-fix:
- Raw notices to the user (`emit_raw_optimizer_notices_to_user`) are emitted only in the `Ok(_)` arm after the transaction succeeds.
- Rendered notices persisted to `mz_optimizer_notices` (via `persist_dataflow_metainfo`) only happen inside the side-effect closure, which only runs on success.
- Rendered notices in the expression cache are only consumed by `parse_item` during `apply_updates` for committed transactions, so also inert on failure.
The new notice.pt test at the bottom of the PR pins this invariant for both CREATE INDEX IF NOT EXISTS and CREATE MATERIALIZED VIEW IF NOT EXISTS.
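The ordering described in the list above can be sketched roughly as follows. This is a simplified model, not the adapter code: `Catalog`, `Session`, `TxnError`, and `create_index` are hypothetical, and a name collision is modeled as a plain error rather than the real `IF NOT EXISTS` handling.

```rust
// Hypothetical, simplified model of the ordering; none of these are real APIs.
#[derive(Debug, PartialEq)]
enum TxnError {
    NameCollision,
}

#[derive(Default)]
struct Session {
    notices: Vec<String>,
}

#[derive(Default)]
struct Catalog {
    names: Vec<String>,
    persisted_notices: Vec<String>,
}

impl Catalog {
    /// Commits the item and runs `side_effect` only if the commit succeeds.
    fn transact_with_side_effects<F>(&mut self, name: &str, side_effect: F) -> Result<(), TxnError>
    where
        F: FnOnce(&mut Catalog),
    {
        if self.names.iter().any(|n| n == name) {
            return Err(TxnError::NameCollision);
        }
        self.names.push(name.to_string());
        side_effect(self);
        Ok(())
    }
}

fn create_index(
    catalog: &mut Catalog,
    session: &mut Session,
    name: &str,
    raw_notices: Vec<String>,
) -> Result<(), TxnError> {
    // Rendering happens up front, but nothing is shown or persisted yet.
    let rendered: Vec<String> = raw_notices
        .iter()
        .map(|n| format!("rendered: {}", n))
        .collect();

    // Persisting rendered notices lives in the side-effect closure,
    // which only runs when the transaction commits.
    let result = catalog
        .transact_with_side_effects(name, move |c| c.persisted_notices.extend(rendered));

    match result {
        Ok(()) => {
            // Raw notices reach the user only on success.
            session.notices.extend(raw_notices);
            Ok(())
        }
        Err(e) => Err(e),
    }
}
```

With this shape, a second `create_index` call for an existing name returns an error and leaves both the session's notices and the persisted notices untouched.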
Ah lol, Claude just went ahead and posted this when I asked it to draft a reply :D
But yeah, its answer is mostly right: The important thing is that this is only a small amount of orphaned data, and it gets cleaned up on the next version upgrade, because the expression cache is scoped to a version.
Moves `render_notices` out of the side-effect closure of `catalog_transact_with_side_effects` in `create_index_finish` and `create_materialized_view_finish`. Renders notices pre-transaction using an `ExprHumanizerExt` wrapping `for_system_session()` that knows about the to-be-created item's `global_id` and full name, matching the pattern already used by the `_explain` paths.

Splits the old `process_dataflow_metainfo` into two narrower helpers:
- `emit_raw_optimizer_notices_to_user`: forwards raw notices to the user's session (still session-aware humanizer).
- `persist_dataflow_metainfo`: takes already-rendered notices, packs `mz_optimizer_notices` builtin-table updates, and stores the metainfo on the catalog object.

Extracts the humanizer-agnostic rendering core into `CatalogState::render_notices_core(&dyn ExprHumanizer, EpochMillis, ...)`. `Catalog::render_notices` remains as a thin adapter for existing callers (bootstrap path, CT).

Unblocks a follow-up change that moves `cache_expressions` before the transaction so that the durable expression cache ends up with rendered notices.

Co-authored-by: Junie <junie@jetbrains.com>
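The idea of rendering against a humanizer that already knows the to-be-created item can be illustrated with a small sketch. The trait and structs here are heavily simplified assumptions: the real `ExprHumanizer` and `ExprHumanizerExt` have different, richer interfaces, and `render_notice` is a hypothetical stand-in for the rendering core.

```rust
// Hypothetical, heavily simplified stand-ins for the humanizer types.
trait ExprHumanizer {
    /// Resolves an ID to a human-readable name, if the ID is known.
    fn humanize_id(&self, id: u64) -> Option<String>;
}

/// Models a humanizer backed by the current catalog contents.
struct CatalogHumanizer {
    known: Vec<(u64, String)>,
}

impl ExprHumanizer for CatalogHumanizer {
    fn humanize_id(&self, id: u64) -> Option<String> {
        self.known
            .iter()
            .find(|(known_id, _)| *known_id == id)
            .map(|(_, name)| name.clone())
    }
}

/// Wraps an inner humanizer and additionally resolves one item
/// that has not been added to the catalog yet.
struct HumanizerExt<'a> {
    inner: &'a dyn ExprHumanizer,
    pending_id: u64,
    pending_name: String,
}

impl ExprHumanizer for HumanizerExt<'_> {
    fn humanize_id(&self, id: u64) -> Option<String> {
        if id == self.pending_id {
            Some(self.pending_name.clone())
        } else {
            self.inner.humanize_id(id)
        }
    }
}

/// Rendering core that depends only on the trait, so it works with
/// either humanizer.
fn render_notice(humanizer: &dyn ExprHumanizer, id: u64) -> String {
    match humanizer.humanize_id(id) {
        Some(name) => format!("notice about {}", name),
        None => format!("notice about unknown object {}", id),
    }
}
```

Because `render_notice` takes `&dyn ExprHumanizer`, the same function can run before the transaction with the wrapper or after it with the plain catalog humanizer.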
`Catalog::cache_expressions` used to:
- read plans + metainfo from in-memory catalog state (after they had been installed by `set_optimized_plan` / `set_physical_plan` / `set_dataflow_metainfo`), and
- drop the future returned by `update_expression_cache` (`let _fut = ...`), so the cache write raced with the catalog transaction commit.

Both properties block the long-term goal of having the durable expression cache be a valid source of truth for plans and rendered notices across envd processes: the write needs to be observed before the item becomes visible, and the function must be callable before the catalog state is mutated.

This commit:
- Changes `Catalog::cache_expressions` to take the plans and metainfo directly as parameters, and to return the future from `update_expression_cache` rather than dropping it.
- Moves the call ahead of `catalog_transact_with_side_effects` in `create_index_finish` and `create_materialized_view_finish` and `.await`s it there.

Together with the preceding commit (rendered notices pre-transaction), the durable expression cache now contains rendered notices and is committed before the catalog transaction commits, unblocking a later change that reads plans back from the cache in `parse_item`.
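The difference between dropping the completion handle and waiting on it can be modeled with plain threads. This is an analogy under assumptions, not the real code: the actual function returns a future that is `.await`ed, while this sketch uses `std::thread::JoinHandle`, and `new_cache` and `cache_then_commit` are hypothetical helpers.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for the durable expression cache.
type Cache = Arc<Mutex<Vec<String>>>;

fn new_cache() -> Cache {
    Arc::new(Mutex::new(Vec::new()))
}

/// Starts the cache write in the background and returns a handle that
/// completes when the write is done. Dropping the handle would let the
/// write race with whatever the caller does next.
fn update_expression_cache(cache: &Cache, entry: &str) -> JoinHandle<()> {
    let cache = Arc::clone(cache);
    let entry = entry.to_string();
    thread::spawn(move || {
        cache.lock().unwrap().push(entry);
    })
}

/// Waits for the write before "committing", so the commit step always
/// observes the cache entry.
fn cache_then_commit(cache: &Cache, entry: &str) -> bool {
    let handle = update_expression_cache(cache, entry);
    // The thread analog of awaiting the returned future.
    handle.join().expect("cache writer panicked");

    // Stand-in for the catalog transaction: report whether the entry is
    // visible at commit time.
    let visible = cache.lock().unwrap().iter().any(|e| e == entry);
    visible
}
```

If `cache_then_commit` skipped the `join`, the visibility check could run before the background write, which is the kind of race the commit message describes.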
This is further preparation for handling, in the catalog implications, those DDL commands that need query plans. The 2 commits move things from the catalog transaction's side effect closure to before the catalog transaction. See more details in the commit messages.
The important thing is that this unlocks a later change that reads plans back from the cache in `parse_item`. (Draft PR: #36146)

Nightly: https://buildkite.com/materialize/nightly/builds/16131 (slightly stale)