Add post-mono MIR optimizations#156858
|
@bors try @rust-timer queue |
|
Finished benchmarking commit (c40ae76): comparison URL.
Overall result: ❌✅ regressions and improvements - please read:
Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.
Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary 11.7%, secondary 2.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary 2.7%, secondary 2.3%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary -0.2%, secondary -0.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 510.282s -> 521.256s (2.15%) |
73c6c56 to c885a8d
|
@bors try @rust-timer queue |
|
Finished benchmarking commit (0a3713a): comparison URL.
Overall result: ❌✅ regressions and improvements - please read:
Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.
Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary 7.3%, secondary 1.9%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary 2.0%, secondary 4.8%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary -0.2%, secondary -0.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 510.282s -> 514.101s (0.75%) |
|
Try build cancelled. Cancelled workflows: |
17bf2a0 to 92dcb6a
|
@bors try @rust-timer queue |
|
@bors try cancel |
|
Try build cancelled. Cancelled workflows: |
6c4bb91 to 9aed921
|
@bors try @rust-timer queue |
|
Finished benchmarking commit (4bf97ca): comparison URL.
Overall result: ❌ regressions - please read:
Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.
Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (primary 8.1%, secondary 1.3%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary 2.8%, secondary -0.7%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
Results (primary -0.2%, secondary -0.1%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 508.631s -> 519.036s (2.05%) |
|
The Clippy subtree was changed
cc @rust-lang/clippy
Some changes occurred in coverage instrumentation.
cc @Zalathar
Some changes occurred to MIR optimizations
cc @rust-lang/wg-mir-opt
This PR changes MIR
cc @oli-obk, @RalfJung, @JakobDegen, @vakaras
Some changes occurred in compiler/rustc_attr_parsing
cc @jdonszelmann, @JonathanBrouwer
The Cranelift subtree was changed
cc @bjorn3
Some changes occurred in compiler/rustc_hir/src/attrs |
|
The last perf run halves the max-rss regression, but I'm not sure how much more I can do. I welcome ideas. |
|
Some instruction regressions are time improvements. Most others that I looked at have most of their regression in LLVM, which can be anything like
Pretty sure regex has benchmarks, maybe run those with the two rustcs used in the latest perf run? |
|
I don't think a few percent of icount is that concerning next to a 25% increase in memory use of optimized builds. I think we've both tried to massage the implementation to get that number down and I'm still shocked by how large it is despite our efforts. Do we have a memory profiling strategy that could identify the cause? At work I've used strace or perf's syscall tracing to find mmap calls then converted them into the inferno crate's folded stacks format. I'm not sure I've tried that on the compiler, maybe it works? |
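The folded-stacks conversion mentioned above could look roughly like the sketch below. It assumes the mmap calls have already been extracted from strace or perf output into a list of call stacks with byte counts; that parsing step is not shown. `MmapEvent` and `fold_stacks` are hypothetical names for illustration, not part of inferno or rustc. The output uses the folded format that flamegraph tools read: frames joined by `;` from outermost to innermost, then a space and a weight.

```rust
use std::collections::BTreeMap;

/// One observed mmap call (hypothetical type for this sketch): the call
/// stack from outermost to innermost frame, plus the number of bytes mapped.
pub struct MmapEvent {
    pub stack: Vec<String>,
    pub bytes: u64,
}

/// Aggregates events into folded-stack lines of the form
/// `outer;inner;leaf <total bytes>`, one line per distinct stack.
/// Assumes frame names contain neither `;` nor spaces.
pub fn fold_stacks(events: &[MmapEvent]) -> Vec<String> {
    // BTreeMap keeps the output sorted by stack, so it is deterministic.
    let mut totals: BTreeMap<String, u64> = BTreeMap::new();
    for event in events {
        let key = event.stack.join(";");
        *totals.entry(key).or_insert(0) += event.bytes;
    }
    totals
        .into_iter()
        .map(|(stack, bytes)| format!("{stack} {bytes}"))
        .collect()
}
```

Writing the returned lines to a file, one per line, should give input that a flamegraph renderer can consume, with bytes mapped as the weight instead of sample counts.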
This is mostly a rebase of #131650 by @saethlin.
MIR optimizations are limited since they run on polymorphic code. They cannot know all the concrete types, nor their layouts.
To work around this limitation, @saethlin added a MIR traversal which monomorphizes on the fly (#121421). We also already have a pass (#139088) which is explicitly waiting for post-mono MIR passes to happen.
This PR creates a `build_codegen_mir` query. That query has a peculiar `Steal<Cow<'tcx, Body<'tcx>>>` return type. This allows reusing `optimized_mir` when the body is already monomorphic, and also freeing memory when we need to clone it. Even with this device we still have a sizeable max-rss regression.
All this allows removing just-in-time monomorphization from codegen code. Follow-up PRs can try migrating transforms that happen at codegen time to a post-mono MIR pass.
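To illustrate why a `Steal<Cow<...>>` shape helps, here is a minimal sketch using only the standard library. It is not the code from this PR: `Body`, `Steal`, `monomorphize`, and `build_codegen_body` are simplified, hypothetical stand-ins, and the string substitution merely mimics type substitution. The `Cow` borrows the existing body when nothing needs substituting and owns a fresh copy otherwise. The steal-able wrapper lets the consumer take that value out so an owned copy can be dropped once it has been used.

```rust
use std::borrow::Cow;
use std::cell::RefCell;

/// Hypothetical stand-in for a MIR body; the real `Body<'tcx>` is far richer.
#[derive(Clone, Debug, PartialEq)]
pub struct Body {
    pub statements: Vec<String>,
    pub is_polymorphic: bool,
}

/// Simplified, single-threaded stand-in for a steal-able value.
pub struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    pub fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    /// Takes the value out, leaving the slot empty so an owned body
    /// is freed as soon as the caller drops it.
    pub fn steal(&self) -> T {
        self.value.borrow_mut().take().expect("value was already stolen")
    }

    pub fn is_stolen(&self) -> bool {
        self.value.borrow().is_none()
    }
}

/// Hypothetical substitution step: replaces the placeholder `<T>`
/// with a concrete type name in every statement.
fn monomorphize(body: &Body, concrete: &str) -> Body {
    let replacement = format!("<{concrete}>");
    Body {
        statements: body
            .statements
            .iter()
            .map(|s| s.replace("<T>", &replacement))
            .collect(),
        is_polymorphic: false,
    }
}

/// Borrows the optimized body when it is already monomorphic,
/// and clones and substitutes only when it is not.
pub fn build_codegen_body<'a>(optimized: &'a Body, concrete: &str) -> Steal<Cow<'a, Body>> {
    if optimized.is_polymorphic {
        Steal::new(Cow::Owned(monomorphize(optimized, concrete)))
    } else {
        Steal::new(Cow::Borrowed(optimized))
    }
}
```

In this sketch the borrowed case costs no extra memory, while the owned case holds a copy only until the consumer steals and drops it.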