Performance improvements #1173
Open
giltho wants to merge 3 commits into
Signed-off-by: Sacha Ayoun <sachaayoun@gmail.com>
Building `FullDefKind::Const` and `FullDefKind::AssocConst` was eagerly running `const_value()` -> `eval_ty_constant()` -> rustc's const interpreter for every const item. Profiling on SparsePostQuantumRatchet showed the field's only consumer is `translate_def_body_inner`, and only on the MIR-missing fallback path. Make it lazy: store `None`, and let the body translator call `const_value` on demand. Charon was duplicating eval work rustc's own `check_crate` does anyway, so the net wall-clock win is modest (~50ms on SpQR) but the change is strictly cleaner and removes one `sinto` per const item.
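To illustrate the eager-to-lazy change described in this commit message, here is a minimal sketch. `ConstDef`, `Evaluator`, `build_def` and `translate_body` are hypothetical stand-ins, not charon's or hax's real types; the only idea taken from the commit is "store `None` at construction, evaluate on the MIR-missing fallback path".

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a const item's definition
/// (not charon's real `FullDefKind::Const`).
struct ConstDef {
    name: &'static str,
    /// Previously filled eagerly; now left as `None` at construction time.
    value: Option<i64>,
}

/// Hypothetical stand-in for the expensive const evaluation; counts its calls.
struct Evaluator {
    calls: Cell<usize>,
}

impl Evaluator {
    fn const_value(&self, def: &ConstDef) -> i64 {
        self.calls.set(self.calls.get() + 1);
        def.name.len() as i64 // placeholder for the real evaluation
    }
}

/// Building the definition no longer evaluates anything.
fn build_def(name: &'static str) -> ConstDef {
    ConstDef { name, value: None }
}

/// Body translation: only the MIR-missing fallback needs the value.
fn translate_body(eval: &Evaluator, def: &ConstDef, mir: Option<&str>) -> String {
    match mir {
        Some(body) => format!("body({body})"),
        None => {
            // Evaluate on demand if nothing was stored.
            let v = def.value.unwrap_or_else(|| eval.const_value(def));
            format!("const({v})")
        }
    }
}

fn main() {
    let eval = Evaluator { calls: Cell::new(0) };
    let defs = [build_def("A"), build_def("BB"), build_def("CCC")];
    // Two items have MIR; one falls back to evaluation.
    let out = vec![
        translate_body(&eval, &defs[0], Some("mir_a")),
        translate_body(&eval, &defs[1], None),
        translate_body(&eval, &defs[2], Some("mir_c")),
    ];
    println!("{out:?}, evaluations: {}", eval.calls.get());
}
```

In this toy version the evaluator runs once for three items, whereas an eager builder would have run it three times.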
The dominant cost in charon on real crates (~60% of wall time on SparsePostQuantumRatchet) is rustc's `check_crate` doing typeck and const-eval, which `par_hir_body_owners` already farms out across threads when `-Z threads` is > 1. Default `threads` to half the available cores, capped at 8 (returns plateau there in our measurements). Effect on SparsePostQuantumRatchet (preset=fast, postcard output) on a 16-core M-series chip: 7.08s -> 5.61s (~20% faster). We only set this when we're actually doing translation, so dependency crates compiled through our `RUSTC_WRAPPER` shim still use the cargo default.
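The default described in this commit message can be sketched as follows. The helper names `default_threads` and `rustc_thread_args` and the `is_translating` flag are assumptions made for illustration, not the actual code in `driver.rs`; only the "half the cores, capped at 8, only when translating" rule comes from the commit.

```rust
use std::thread;

/// Half the available cores, at least 1, at most 8.
fn default_threads(available_cores: usize) -> usize {
    (available_cores / 2).clamp(1, 8)
}

/// Hypothetical helper: only add the flag when we are actually translating,
/// so dependency crates keep the cargo default.
fn rustc_thread_args(is_translating: bool, available_cores: usize) -> Vec<String> {
    if is_translating {
        vec![format!("-Zthreads={}", default_threads(available_cores))]
    } else {
        Vec::new()
    }
}

fn main() {
    // Fall back to a single core if the platform cannot report parallelism.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("{:?}", rustc_thread_args(true, cores));
}
```

With this rule a 16-core machine gets 8 threads, a 4-core machine gets 2, and a single-core machine still gets 1 because of the lower clamp bound.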
I spent some tokens on trying to get performance improvements on Charon. Claude found 3 optimisations, in order of effectiveness:
I'm not 100% sure that parallelism by default is desirable; worst case, it can probably be set by Soteria itself.
Here's a list of all the changes made and their impact on the benchmark (preset=fast, postcard):

| Change | File(s) | Commit |
|---|---|---|
| `IndexMap` to `FxBuildHasher` (was default SipHash): interner is hit on every type translation | `charon/src/ast/hash_cons.rs` | 4c0a9eb9 |
| `TypeMap`'s `HashMap<TypeId, _>` to `FxHashMap`: `TypeId` already has full entropy, SipHash is pure overhead | `charon/src/common.rs` | 4c0a9eb9 |
| `HashMap`/`HashSet` to `FxHashMap`/`FxHashSet` in the translation context (covers `id_map`, `reverse_id_map`, `cached_names`, `processed`, `file_to_id`, etc.) | `charon/src/bin/charon-driver/translate/translate_ctx.rs` | 4c0a9eb9 |
| `FullDef::{Const,AssocConst}.value`: eagerly building it ran rustc's const-interpreter for every const item, but the field was only consumed in one fallback path (MIR missing). Store `None`, evaluate on demand. | `charon/src/bin/charon-driver/hax/types/new/full_def.rs`, `charon/src/bin/charon-driver/translate/translate_bodies.rs` | 65e8f6b2 |
| Default `threads` (`-Z threads`) at translation time, set to `(available_cores / 2).clamp(1, 8)`. Parallelises `par_hir_body_owners` (typeck + const-eval, the dominant ~60% cost) | `charon/src/bin/charon-driver/driver.rs` | 028959fa |

Of course, I have reviewed this myself and nothing jumps out as insane to me.
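For readers unfamiliar with the hasher swaps listed above, here is a std-only sketch of the mechanism: `HashMap` takes its hasher as a third type parameter, so replacing SipHash is a type-level change with the same map API. `ToyFxHasher` and `FastMap` are hypothetical names, and the hasher is a simplified multiply-rotate stand-in in the spirit of Fx, not the real `rustc-hash` implementation used by the PR.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Simplified multiply-rotate hasher; NOT the real `rustc-hash` code,
/// just a std-only stand-in to show where the hasher plugs in.
#[derive(Default)]
struct ToyFxHasher {
    state: u64,
}

impl Hasher for ToyFxHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state = (self.state.rotate_left(5) ^ u64::from(b))
                .wrapping_mul(0x51_7c_c1_b7_27_22_0a_95);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Same `HashMap` API, different hasher type parameter.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<ToyFxHasher>>;

fn main() {
    // `HashMap::new()` is only defined for the default hasher,
    // so maps with a custom hasher are built with `default()`.
    let mut ids: FastMap<&str, u32> = FastMap::default();
    ids.insert("id_map", 0);
    ids.insert("cached_names", 1);
    println!("{:?}", ids.get("cached_names"));
}
```

The trade-off is the usual one: a fast non-cryptographic hasher drops SipHash's resistance to hash-flooding, which is generally acceptable for compiler-internal keys such as `TypeId`.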