feat(riscv): optimize divisions by powers of 2#982
Open
regehr wants to merge 14 commits into
Open
Conversation
# Conflicts: # Test/Passes/InstructionSelection/RISCV64/fastntt.mlir
Contributor
There was a problem hiding this comment.
VeIR Benchmarks
Details
| Benchmark suite | Current: 72aa31c | Previous: fc5ee5b | Ratio |
|---|---|---|---|
add-fold-worklist/create |
2193500 ns (± 397943) |
2261000 ns (± 44489) |
0.97 |
add-fold-worklist/rewrite |
4019000 ns (± 696363) |
4041000 ns (± 68252) |
0.99 |
add-fold-worklist-local/create |
2215000 ns (± 89595) |
2183000 ns (± 30573) |
1.01 |
add-fold-worklist-local/rewrite |
3287000 ns (± 66605) |
3324000 ns (± 56690) |
0.99 |
add-zero-worklist/create |
2195000 ns (± 75474) |
2291000 ns (± 84009) |
0.96 |
add-zero-worklist/rewrite |
2626000 ns (± 67480) |
2630000 ns (± 59551) |
1.00 |
add-zero-reuse-worklist/create |
1884000 ns (± 88188) |
1845000 ns (± 79748) |
1.02 |
add-zero-reuse-worklist/rewrite |
2084000 ns (± 75573) |
2149000 ns (± 68678) |
0.97 |
mul-two-worklist/create |
2167000 ns (± 69735) |
2213000 ns (± 59036) |
0.98 |
mul-two-worklist/rewrite |
5608000 ns (± 127213) |
5752000 ns (± 188041) |
0.97 |
add-fold-forwards/create |
2226000 ns (± 69723) |
2198000 ns (± 97548) |
1.01 |
add-fold-forwards/rewrite |
3095000 ns (± 37265) |
3057000 ns (± 48925) |
1.01 |
add-zero-forwards/create |
2141000 ns (± 71318) |
2177000 ns (± 65592) |
0.98 |
add-zero-forwards/rewrite |
1990000 ns (± 90765) |
1976000 ns (± 49558) |
1.01 |
add-zero-reuse-forwards/create |
1791000 ns (± 60251) |
1813000 ns (± 66624) |
0.99 |
add-zero-reuse-forwards/rewrite |
1562000 ns (± 22590) |
1535000 ns (± 71922) |
1.02 |
mul-two-forwards/create |
2181000 ns (± 92855) |
2249000 ns (± 59277) |
0.97 |
mul-two-forwards/rewrite |
3661000 ns (± 184634) |
3718000 ns (± 43701) |
0.98 |
add-zero-reuse-first/create |
1859000 ns (± 96719) |
1848500 ns (± 77966) |
1.01 |
add-zero-reuse-first/rewrite |
8000 ns (± 1351) |
8000 ns (± 1982) |
1 |
add-zero-lots-of-reuse-first/create |
1867000 ns (± 34501) |
1872000 ns (± 26678) |
1.00 |
add-zero-lots-of-reuse-first/rewrite |
779000 ns (± 18582) |
837000 ns (± 34120) |
0.93 |
This comment was automatically generated by workflow using github-action-benchmark.
| `fold (srl (sra X, Y), 31) -> (srl X, 31)`. | ||
| https://github.com/llvm/llvm-project/blob/2e87cf8c2b8ec6453ccfa7e448d5b33f1d71a2ca/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L11628-L11633 | ||
| -/ | ||
| theorem srl_sra_signbit_refinement {x shamt : LLVM.Int 64} : |
Contributor
There was a problem hiding this comment.
is there a reason why we don't state this as equality at the Reg type?
tobiasgrosser
approved these changes
Jul 4, 2026
tobiasgrosser
left a comment
Collaborator
There was a problem hiding this comment.
Amazing. This goes in the right direction. I let @luisacicolini do the final review.
| theorem udivPow2_refinement {x : LLVM.Int 64} (k : BitVec 6) : | ||
| (Data.LLVM.Int.udiv x (LLVM.Int.val ((1#64) <<< k))) ⊒ | ||
| (RISCV.Reg.toInt (Data.RISCV.srli k (LLVM.Int.toReg x)) 64) := by | ||
| sorry -- bv_decide needs a non-default timeout (120s) to close this goal |
Collaborator
There was a problem hiding this comment.
For this we need:
theorem udiv_one_shl {w₁ w₂ : Nat} (x : BitVec w₁) (k : BitVec w₂) :
x / 1#w₁ <<< k = x >>> k := by
rw [BitVec.shiftLeft_eq', BitVec.ushiftRight_eq', ← BitVec.twoPow_eq]
rcases Nat.lt_or_ge k.toNat w₁ with hk | hk
· exact BitVec.udiv_twoPow_eq_of_lt hk
· have htw : BitVec.twoPow w₁ k.toNat = 0#w₁ := BitVec.eq_of_toNat_eq (by
rw [BitVec.toNat_twoPow_of_le hk, BitVec.toNat_ofNat, Nat.zero_mod])
rw [htw, BitVec.udiv_zero, BitVec.ushiftRight_eq_zero hk]Which makes sense.
Collaborator
There was a problem hiding this comment.
I suggest to merge this PR when @luisacicolini is happy then and fix the sorries one-by-one.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
LLVM backends have sophisticated lowerings to avoid expensive division instructions.
this PR gives us the trivial lowering from udiv by power-of-2 to a shift, and then also the less trivial lowering for sdiv by power-of-2. there are also some specific rewrites for the
exactvariants.everything is a pretty straightforward port from the LLVM backend. our output matches LLVM's for all cases I looked at.
I also ported a nice little optimization that removes one instruction from the sdiv-by-2 case.
these fire multiple times in fastntt!
I've sorried out several slow proofs, division is hard for bitvector solvers