
perf: Optimize count distinct #21456

Open
coderfender wants to merge 10 commits into apache:main from coderfender:optimize_count_distinct



@coderfender (Contributor)

Which issue does this PR close?

Replace the hash-set-based accumulators for the smaller integer data types with bitmaps. Follow-up of #21453.
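For readers unfamiliar with the idea: when a type has only 256 possible values, "have I seen this value?" can be answered with one bit per value instead of a hash set. The sketch below is a minimal illustration under stated assumptions. `U8DistinctBitmap` and `i8_key` are hypothetical names, not the types added in this PR, and the code does not implement DataFusion's actual `Accumulator` trait. It only shows the three operations a distinct-count accumulator needs: update, merge, and count.

```rust
/// Hypothetical sketch (not the PR's code): distinct-count state for `u8`
/// kept in a 256-bit bitmap (four u64 words) instead of a HashSet<u8>.
#[derive(Default, Clone)]
struct U8DistinctBitmap {
    bits: [u64; 4], // bit v is set iff value v has been seen
}

impl U8DistinctBitmap {
    /// Mark every non-null value in the batch as seen (nulls are skipped).
    fn update(&mut self, values: &[Option<u8>]) {
        for v in values.iter().flatten() {
            let v = *v as usize;
            self.bits[v / 64] |= 1u64 << (v % 64);
        }
    }

    /// Combine a partial state from another partition:
    /// set union is a bitwise OR of the words.
    fn merge(&mut self, other: &U8DistinctBitmap) {
        for (a, b) in self.bits.iter_mut().zip(other.bits.iter()) {
            *a |= *b;
        }
    }

    /// The number of distinct values is the number of set bits.
    fn count(&self) -> i64 {
        self.bits.iter().map(|w| w.count_ones() as i64).sum()
    }
}

/// Hypothetical helper for `i8`: reinterpret the bits as `u8`, which maps
/// -128..=127 one-to-one onto 0..=255 (e.g. -1 -> 255, -128 -> 128).
fn i8_key(v: i8) -> u8 {
    v as u8
}
```

Any one-to-one mapping of `i8` onto `0..=255` would work. The bit-reinterpreting cast is simply the cheapest. Merging is a plain OR, so partial states from different partitions combine without rehashing.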

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the functions (Changes to functions implementation) label on Apr 8, 2026
@coderfender (Contributor, Author) commented on Apr 8, 2026

Benchmark results:

group                          ratio   time            ratio   time
count_distinct i16 bitmap      1.00    3.3±0.43µs      23.87   78.4±0.84µs
count_distinct i8 bitmap       1.00    2.3±0.49µs      7.13    16.7±0.55µs
count_distinct u16 bitmap      1.00    3.1±0.18µs      25.45   78.8±3.92µs
count_distinct u8 bitmap       1.00    2.3±0.34µs      7.37    16.9±0.14µs

It seems the bitmap-based accumulators are about 25x faster for the 16-bit types and about 7x faster for the 8-bit types (or I am sleepy :) )

@Dandandan (Contributor)

I think we can do the same for 16-bit types: one entry per possible value is just 65,536 bytes, or 8,192 bytes if we use a bitmap.
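The size arithmetic is easy to check: 65,536 possible values at one bit each is 65,536 / 8 = 8,192 bytes, or 1,024 `u64` words. The sketch below is a minimal illustration only. `U16DistinctBitmap` and `i16_key` are hypothetical names, not the PR's implementation. It shows the same bitmap approach scaled to 16-bit keys, with `i16` handled by reinterpreting its bits as `u16`.

```rust
/// 2^16 possible values / 64 bits per word = 1_024 u64 words (8_192 bytes).
const WORDS: usize = (1 << 16) / 64;

/// Hypothetical sketch (not the PR's code): distinct-count state for u16.
struct U16DistinctBitmap {
    bits: Vec<u64>, // bit v is set iff value v has been seen
}

impl U16DistinctBitmap {
    fn new() -> Self {
        Self { bits: vec![0u64; WORDS] }
    }

    /// Set the bit for `v`: word index is v / 64, bit index is v % 64.
    fn insert(&mut self, v: u16) {
        let v = v as usize;
        self.bits[v >> 6] |= 1u64 << (v & 63);
    }

    /// Distinct count = total number of set bits.
    fn count(&self) -> u32 {
        self.bits.iter().map(|w| w.count_ones()).sum()
    }

    /// Heap bytes used by the bitmap itself.
    fn size_in_bytes(&self) -> usize {
        self.bits.len() * std::mem::size_of::<u64>()
    }
}

/// Hypothetical helper for `i16`: the bit-reinterpreting cast maps
/// -32_768..=32_767 one-to-one onto 0..=65_535 (e.g. -1 -> 65_535).
fn i16_key(v: i16) -> u16 {
    v as u16
}
```

The fixed 8 KiB footprint is independent of how many rows are seen. A hash set of `u16` values, by contrast, grows with the number of distinct keys and pays hashing and probing costs on every insert.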

@Dandandan (Contributor)

Oh wait, you're already doing that :)


Labels

functions Changes to functions implementation


2 participants