Skip to content

feat(riscv): optimize divisions by powers of 2#982

Open
regehr wants to merge 14 commits into
mainfrom
regehr/riscv-div-opt
Open

feat(riscv): optimize divisions by powers of 2#982
regehr wants to merge 14 commits into
mainfrom
regehr/riscv-div-opt

Conversation

@regehr

@regehr regehr commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

LLVM backends have sophisticated lowerings to avoid expensive division instructions.

this PR gives us the trivial lowering from udiv by power-of-2 to a shift, and then also the less trivial lowering for sdiv by power-of-2. there are also some specific rewrites for the exact variants.

everything is a pretty straightforward port from the LLVM backend. our output matches LLVM's for all cases I looked at.

I also ported a nice little optimization that removes one instruction from the sdiv-by-2 case.

these fire multiple times in fastntt!

I've sorried out several slow proofs, division is hard for bitvector solvers

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VeIR Benchmarks

Details
Benchmark suite Current: 72aa31c Previous: fc5ee5b Ratio
add-fold-worklist/create 2193500 ns (± 397943) 2261000 ns (± 44489) 0.97
add-fold-worklist/rewrite 4019000 ns (± 696363) 4041000 ns (± 68252) 0.99
add-fold-worklist-local/create 2215000 ns (± 89595) 2183000 ns (± 30573) 1.01
add-fold-worklist-local/rewrite 3287000 ns (± 66605) 3324000 ns (± 56690) 0.99
add-zero-worklist/create 2195000 ns (± 75474) 2291000 ns (± 84009) 0.96
add-zero-worklist/rewrite 2626000 ns (± 67480) 2630000 ns (± 59551) 1.00
add-zero-reuse-worklist/create 1884000 ns (± 88188) 1845000 ns (± 79748) 1.02
add-zero-reuse-worklist/rewrite 2084000 ns (± 75573) 2149000 ns (± 68678) 0.97
mul-two-worklist/create 2167000 ns (± 69735) 2213000 ns (± 59036) 0.98
mul-two-worklist/rewrite 5608000 ns (± 127213) 5752000 ns (± 188041) 0.97
add-fold-forwards/create 2226000 ns (± 69723) 2198000 ns (± 97548) 1.01
add-fold-forwards/rewrite 3095000 ns (± 37265) 3057000 ns (± 48925) 1.01
add-zero-forwards/create 2141000 ns (± 71318) 2177000 ns (± 65592) 0.98
add-zero-forwards/rewrite 1990000 ns (± 90765) 1976000 ns (± 49558) 1.01
add-zero-reuse-forwards/create 1791000 ns (± 60251) 1813000 ns (± 66624) 0.99
add-zero-reuse-forwards/rewrite 1562000 ns (± 22590) 1535000 ns (± 71922) 1.02
mul-two-forwards/create 2181000 ns (± 92855) 2249000 ns (± 59277) 0.97
mul-two-forwards/rewrite 3661000 ns (± 184634) 3718000 ns (± 43701) 0.98
add-zero-reuse-first/create 1859000 ns (± 96719) 1848500 ns (± 77966) 1.01
add-zero-reuse-first/rewrite 8000 ns (± 1351) 8000 ns (± 1982) 1
add-zero-lots-of-reuse-first/create 1867000 ns (± 34501) 1872000 ns (± 26678) 1.00
add-zero-lots-of-reuse-first/rewrite 779000 ns (± 18582) 837000 ns (± 34120) 0.93

This comment was automatically generated by workflow using github-action-benchmark.

`fold (srl (sra X, Y), 31) -> (srl X, 31)`.
https://github.com/llvm/llvm-project/blob/2e87cf8c2b8ec6453ccfa7e448d5b33f1d71a2ca/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L11628-L11633
-/
theorem srl_sra_signbit_refinement {x shamt : LLVM.Int 64} :

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why we don't state this as equality at the Reg type?

@tobiasgrosser tobiasgrosser left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing. This goes in the right direction. I let @luisacicolini do the final review.

theorem udivPow2_refinement {x : LLVM.Int 64} (k : BitVec 6) :
(Data.LLVM.Int.udiv x (LLVM.Int.val ((1#64) <<< k))) ⊒
(RISCV.Reg.toInt (Data.RISCV.srli k (LLVM.Int.toReg x)) 64) := by
sorry -- bv_decide needs a non-default timeout (120s) to close this goal

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this we need:

theorem udiv_one_shl {w₁ w₂ : Nat} (x : BitVec w₁) (k : BitVec w₂) :
    x / 1#w₁ <<< k = x >>> k := by
  rw [BitVec.shiftLeft_eq', BitVec.ushiftRight_eq', ← BitVec.twoPow_eq]
  rcases Nat.lt_or_ge k.toNat w₁ with hk | hk
  · exact BitVec.udiv_twoPow_eq_of_lt hk
  · have htw : BitVec.twoPow w₁ k.toNat = 0#w₁ := BitVec.eq_of_toNat_eq (by
      rw [BitVec.toNat_twoPow_of_le hk, BitVec.toNat_ofNat, Nat.zero_mod])
    rw [htw, BitVec.udiv_zero, BitVec.ushiftRight_eq_zero hk]

Which makes sense.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to merge this PR when @luisacicolini is happy then and fix the sorries one-by-one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants