Improve RGB to luma conversion logic by RunDevelopment · Pull Request #2988 · image-rs/image

RunDevelopment · 2026-05-25T18:37:31Z

Changes:

Move RGB -> Luma conversion into sealed trait.
Change u8 behavior. Luma is now correctly rounded.
Remove trait bound Rgb<T>: Enlargeable and Rgba<T>: Enlargeable.
Expand convert benchmark

My main goal was to make the RGB to luma conversion faster. To that end, I added optimized implementations for u8 and f32. However, this didn't work out. The benchmarks show no change at all. Frankly, I don't understand why. The new implementations are doing purely less work and should be close to optimal.

u8 CE: Both the old generic and optimized implementations do not vectorize at all. The compiler also loads each byte individually. The only difference is that the optimized version does everything in 32-bit registers, while the generic version uses 64-bit registers to perform the division with a multiply shift (which in turn leads to the compiler unrolling the optimized version once).
f32 CE: The generic version uses f64 to do all calculations while the optimized uses f32. The SIMD for both is bad. The f32 version is actually super easy to vectorize well (just do rgba * [W_R, W_G, W_B, 0] and then sum), but the compiler doesn't do this.

In any case, even without a perf win, removing the Enlargeable bound and correct rounding for u8 should still be a win.

awxkee · 2026-05-25T21:24:17Z

u8 CE: Both the old generic and optimized implementations do not vectorize at all. The compiler also loads each byte individually. The only difference is that the optimized version does everything in 32-bit registers, while the generic version uses 64-bit registers to perform the division with a multiply shift (which in turn leads to the compiler unrolling the optimized version once).

This code leads to pshufb which is available only from SSSE3, and actually from my view LLVM is not tuned to recognize that pattern well until AVX2, on AVX2 codegen actually looks normal, might be not perfect, but OK. And since your multiplier is huge it leads to requirement of 64-bit product ( to the compiler perspective ) what makes things even worse. Does that one make sense?

RunDevelopment · 2026-05-26T09:48:22Z

And since your multiplier is huge it leads to requirement of 64-bit product ( to the compiler perspective ) what makes things even worse. Does that one make sense?

OMG, you're right. When I modify your version to use my constants, the compiler emits 64-bit pmuludq. I didn't think that the compiler would be that bad at doing range analysis.

Unfortunately, your constants don't produce exact results. IIRC, the smallest shift for which correct constants exist is 22. Is manual SIMD the only answer here?

awxkee · 2026-05-26T10:13:20Z

Unfortunately, your constants don't produce exact results. IIRC, the smallest shift for which correct constants exist is 22. Is manual SIMD the only answer here?

Yeah, for constants that complex I don't think manual work is avoidable. Lowering precision is likely the only option. I guess it should find the "right" constant up to i16::MAX, with an i32 intermediate product somewhere in the range [i16::MAX, u16::MAX] which may not be exact, but the error is actually very small. You can see that this range already worse than (0,i16::MAX) but still reasonable https://godbolt.org/z/MjqhGdvhz

RunDevelopment added 3 commits May 24, 2026 12:54

Optimize RGB to Luma conversion

0836ce6

Expand luma cast bench

cdaf9a0

Minor changes

d71b210

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve RGB to luma conversion logic#2988

Improve RGB to luma conversion logic#2988
RunDevelopment wants to merge 3 commits into
image-rs:mainfrom
RunDevelopment:opt-rgb-to-luma

RunDevelopment commented May 25, 2026

Uh oh!

awxkee commented May 25, 2026 •

edited

Loading

Uh oh!

RunDevelopment commented May 26, 2026

Uh oh!

awxkee commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

RunDevelopment commented May 25, 2026

Uh oh!

awxkee commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RunDevelopment commented May 26, 2026

Uh oh!

awxkee commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

awxkee commented May 25, 2026 •

edited

Loading