Minor performance improvements by giltho · Pull Request #1 · soteria-tools/obol

giltho · 2026-05-25T16:35:54Z

I got Claude to write performance improvements for Obol. It tried about 10 different things over 1h30, and only 2 of them made a measurable improvement (about 5% in total). It says the low-hanging fruit is really within Charon (and was already picked in AeneasVerif/charon#1173).

It's not much, and I'm not sure we can do much better because, apparently, 55% of runtime is spent in const-eval.

Here's the summary written by Claude. I went through it, it doesn't sound bananas, and 5% is always good.

Summary

Two small caches in translate_raw_span that together cut ~4.6% off translation time on a real-world workload.

Spans are translated once per MIR statement, terminator, local, and item, so this is one of the hottest paths in the driver. Each call was re-running path normalization, FileName hashing, and SourceMap::lookup_char_pos for spans we'd already seen.

ddc49d4: cache FileId by SourceFile stable id. translate_filename walks path components and strips sysroot/cargo prefixes; register_file then hashes a FileName. Caching on rustc's StableSourceFileId (already 1:1 with the source file) skips all of that after the first hit per file. Falls through to register_file on miss, so dedup semantics are unchanged.

a9bbacf: memoize translate_raw_span by ty::Span. Many statements share the same expansion span (inlining, macros). Caching the full SpanData skips both the file lookup and the two lookup_char_pos calls.

Measurements

signalapp/SparsePostQuantumRatchet, hyperfine with --prepare 'touch src/lib.rs', 20 runs side-by-side:

| Version | Mean | vs baseline |
| --- | --- | --- |
| `main` | 6.232 s ± 0.068 | 1.00× |
| after `ddc49d4` | ~5.94 s | 1.05× |
| after `a9bbacf` (this branch) | 5.944 s ± 0.056 | 1.046× faster (~3σ) |

Profile confirms the mechanism: translate_span_from_smir inclusive drops from 4.92% → 0.43% across the two commits.

Spans are translated once per statement, terminator, local, and item, and each call walked path components and hashed a `FileName` to dedup the file registration. Almost every hit lands on a handful of source files, so caching by rustc's `StableSourceFileId` lets us skip the work after the first miss per file. Measured on signalapp/SparsePostQuantumRatchet, hyperfine 10 runs: 6.355 s → 6.061 s (1.05× faster).
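As a rough illustration of the file-id cache described above, here is a minimal sketch, not the actual Obol code. The types `StableSourceFileId`, `FileId`, and `FileRegistry`, the `/sysroot/` prefix, and the `slow_path_calls` counter are hypothetical stand-ins; only the names `translate_filename` and `register_file` come from the PR description, and their bodies here are assumed.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for rustc's stable source-file id and the driver's file id.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct StableSourceFileId(u64);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct FileId(usize);

#[derive(Default)]
struct FileRegistry {
    // Slow-path dedup table, keyed on the normalized file name.
    by_name: HashMap<String, FileId>,
    // Fast-path cache, keyed on the source file's stable id.
    by_stable_id: HashMap<StableSourceFileId, FileId>,
    // Counts slow-path registrations, for demonstration only.
    slow_path_calls: usize,
}

impl FileRegistry {
    // Stand-in for path normalization: strips an assumed sysroot prefix.
    fn translate_filename(raw: &str) -> String {
        raw.strip_prefix("/sysroot/").unwrap_or(raw).to_string()
    }

    // Stand-in for the deduplicating registration: hashes the name and
    // returns the existing id if the file was already registered.
    fn register_file(&mut self, name: String) -> FileId {
        self.slow_path_calls += 1;
        let next = FileId(self.by_name.len());
        *self.by_name.entry(name).or_insert(next)
    }

    // Cached lookup: a hit skips normalization and name hashing entirely;
    // a miss falls through to register_file, so dedup behaves as before.
    fn file_id(&mut self, stable_id: StableSourceFileId, raw_path: &str) -> FileId {
        if let Some(&id) = self.by_stable_id.get(&stable_id) {
            return id;
        }
        let name = Self::translate_filename(raw_path);
        let id = self.register_file(name);
        self.by_stable_id.insert(stable_id, id);
        id
    }
}
```

Because the miss path still goes through the name-keyed table, two stable ids that normalize to the same file name would still map to one `FileId` in this sketch.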

Many MIR statements share the same expansion span (especially within inlined or macro-generated code), and each call ran `lookup_char_pos` twice and re-resolved the source file. Caching `ty::Span -> SpanData` short-circuits that. Profile: `translate_span_from_smir` inclusive drops from 1.02% to 0.43%. Hyperfine (20 runs, SpQR): 5.919 s -> 5.885 s.
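The span memoization can be sketched in the same spirit. This is an assumed shape, not the code from the commit: `Span`, `Loc`, `SpanData`, `SpanCache`, and `slow_resolve` are hypothetical stand-ins, with `slow_resolve` playing the role of the file lookup plus the two `lookup_char_pos` calls.

```rust
use std::collections::HashMap;

// Hypothetical opaque span handle, standing in for the real span type.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Span(u32);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Loc {
    line: u32,
    col: u32,
}

// Hypothetical fully resolved span: file plus begin/end positions.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct SpanData {
    file_id: usize,
    beg: Loc,
    end: Loc,
}

// Stand-in for the expensive resolution (file lookup and two position lookups).
fn slow_resolve(span: Span) -> SpanData {
    SpanData {
        file_id: 0,
        beg: Loc { line: span.0, col: 1 },
        end: Loc { line: span.0, col: 10 },
    }
}

#[derive(Default)]
struct SpanCache {
    memo: HashMap<Span, SpanData>,
    // Counts how often the slow path ran, for demonstration only.
    misses: usize,
}

impl SpanCache {
    // Memoized translation: statements sharing a span resolve it only once.
    fn translate_raw_span(&mut self, span: Span) -> SpanData {
        if let Some(&data) = self.memo.get(&span) {
            return data;
        }
        self.misses += 1;
        let data = slow_resolve(span);
        self.memo.insert(span, data);
        data
    }
}
```

The cache stores the complete `SpanData` rather than only the file id, which is what lets a hit skip the position lookups as well.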

N1ark · 2026-05-25T17:29:44Z

nice

giltho added 2 commits May 25, 2026 15:12

N1ark merged commit d8177bd into main May 25, 2026

N1ark merged 2 commits into main from perf
