
Frequency partitioning and FOR encoding rebased & synced with nimble/main#636

Open
David-C-L wants to merge 30 commits into facebookincubator:main from David-C-L:freq-part-and-for-synced

Conversation


@David-C-L David-C-L commented Apr 3, 2026

TPCH SF10 Compression Rates per Column

freq-part-and-FOR-compression-rates-TPCH-SF10

- Bold bars indicate encodings that allow value-granularity random access.
- Frequency Partitioning's efficacy depends on the value-type size in bytes and the number of unique values (along with the distribution of values, which is roughly uniform for TPC-H). These plots show frequency partition encoding is effective on the same columns as dictionary encoding (because num_unique_values < 2^(num_bits_for_type), allowing keys to be small). However, frequency partitioning improves on dictionary encoding when many frequent values can be given a smaller key size than dictionary encoding would apply, seen most prominently with **l_suppkey** and **l_linenumber**.
- FOR encodings (PFORDelta and TurboPFOR here) are particularly effective when the domain of values covers a small range within frames (e.g. monotonically increasing or clustered values). Their efficacy is most prominent with **l_orderkey**, where values are large but fall within ranges for different order batches (clustering).
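The key-size advantage over dictionary encoding described above can be illustrated with a small sketch. This is a hypothetical simplification, not the PR's actual implementation: distinct values are ranked by frequency and greedily placed into partitions of growing fixed bit-width (1, 2, 3, ... bits, where a width-w partition can distinguish 2^w values), so a heavily skewed column pays far fewer bits per row than dictionary encoding's uniform ceil(log2(#unique)).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sketch: assign each distinct value to a partition whose
// fixed key width grows as 1, 2, 3, ... bits; a partition of width w can
// distinguish up to 2^w values. Frequent values land in narrow partitions.
double avgBitsFrequencyPartitioned(const std::vector<int32_t>& column) {
  std::map<int32_t, uint64_t> freq;
  for (int32_t v : column) {
    ++freq[v];
  }
  std::vector<uint64_t> counts;
  for (const auto& entry : freq) {
    counts.push_back(entry.second);
  }
  std::sort(counts.rbegin(), counts.rend()); // most frequent first
  uint64_t totalBits = 0;
  uint64_t i = 0;
  uint64_t width = 1;
  while (i < counts.size()) {
    const uint64_t capacity = 1ull << width; // values this partition holds
    for (uint64_t k = 0; k < capacity && i < counts.size(); ++k, ++i) {
      totalBits += counts[i] * width;
    }
    ++width;
  }
  return double(totalBits) / double(column.size());
}

// Plain dictionary: every row pays ceil(log2(#unique)) bits for its key.
double avgBitsDictionary(const std::vector<int32_t>& column) {
  std::map<int32_t, uint64_t> freq;
  for (int32_t v : column) {
    ++freq[v];
  }
  return std::ceil(std::log2(double(freq.size())));
}
```

For a column where one value dominates, the skewed rows mostly pay 1 bit instead of the flat dictionary width, matching the gains seen on l_suppkey and l_linenumber.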

Moderate number of unique int32 values following Zipfian distribution with varying alpha

freq-part-order-preserving-index-overhead-synthetic-zipfian-int32

- The table at the bottom indicates the properties of each encoding: the top row shows whether the encoding supports value-granularity random access, and the bottom row shows whether it preserves the initial order of the value stream.
- Frequency Partitioning is as effective as dictionary encoding for datasets with little bias in their distribution (i.e. Zipfian with alpha = 0 or 0.5), since there is little difference in value frequencies to exploit by assigning smaller keys. For more biased datasets (alpha >= 1.0) it is highly effective, remaining competitive with Zstd and OpenZL (encoding level 2). This is because very frequent values, which cover large swathes of the dataset, can be assigned small keys. As the initial value-stream order may be important for some rec-sys training, each frequency partitioning bar includes the overhead of the most compressed order-preserving index.
- FOR encodings (PFORDelta and TurboPFOR here) perform poorly on this dataset when little bias is present: the order of values is random, so there is no common reference point to exploit within a given frame. As more bias is introduced, however, fewer values comprise larger proportions of the dataset, frames contain many repeats of the same values, and the probability of small residuals (especially 0) rises, so the encoding compresses better -- albeit not as well as the other schemes, since the dataset has little ordering.
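The frame-of-reference behaviour described above can be sketched in a few lines. This is a simplified FOR without the patched-exception machinery of PFORDelta/TurboPFOR: each frame stores its minimum as the reference plus non-negative residuals, so clustered frames yield small residuals while randomly ordered values do not.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Simplified frame-of-reference sketch (no patched exceptions as in
// PFORDelta/TurboPFOR): each frame stores its minimum as the reference
// and per-value residuals, which stay small when values are clustered.
struct Frame {
  int64_t reference;               // frame minimum
  std::vector<uint64_t> residuals; // value - reference, all non-negative
};

std::vector<Frame> forEncode(const std::vector<int64_t>& values,
                             size_t frameSize) {
  std::vector<Frame> frames;
  for (size_t i = 0; i < values.size(); i += frameSize) {
    const size_t end = std::min(i + frameSize, values.size());
    const int64_t ref =
        *std::min_element(values.begin() + i, values.begin() + end);
    Frame f{ref, {}};
    for (size_t j = i; j < end; ++j) {
      f.residuals.push_back(uint64_t(values[j] - ref));
    }
    frames.push_back(std::move(f));
  }
  return frames;
}

std::vector<int64_t> forDecode(const std::vector<Frame>& frames) {
  std::vector<int64_t> out;
  for (const auto& f : frames) {
    for (uint64_t r : f.residuals) {
      out.push_back(f.reference + int64_t(r));
    }
  }
  return out;
}
```

In a real encoding the residuals would be bit-packed at the frame's maximum residual width, which is where the compression comes from.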

Moderate number of unique int32 values -- Order-preserving index overhead

freq-part-and-for-compression-rates-synthetic-zipfian-int32

- The plot shows 4 different strategies for preserving the order of tuples (5 columns, so 5 values per tuple) that have been shuffled across 5 different partitions. Given an original index **x** for the tuple **t**, each order-preserving index stores, or allows the computation of, the partition **p** in which **t** is stored and the index within **p** where **t** can be found.
- The full global index simply stores an int32_t value for the partition index and the index within the partition for each tuple, along with a roaring bitmap for fast lookups.
- The RLE full global index performs RLE compression on each component of the full global index.
- The optimised index drops the int32_t value for the index within the partition, and instead calculates it using the popcount of the roaring bitmap.
- The RLE optimised index performs RLE compression on each component of the optimised index.
- The plot shows each index's total size, its size relative to the full table, and the lookup time complexity.

Comprehensive Encode, Decode, Compression, Memory use for Frequency Partition and FOR on simple Zipfian data

Plots: compression_rate, encode_time, decode_bulk_time, decode_random_access_time, encode_peak_memory, decode_bulk_peak_memory

This PR also contains some changes for compatibility with Clang++-16.


meta-cla Bot commented Apr 3, 2026

Hi @David-C-L!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@David-C-L David-C-L marked this pull request as ready for review April 3, 2026 01:40

meta-cla Bot commented Apr 3, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 3, 2026
Contributor

srsuryadev commented Apr 3, 2026

Thank you, @David-C-L! Will review and update cc: @zzhao0


meta-codesync Bot commented Apr 3, 2026

@srsuryadev has imported this pull request. If you are a Meta employee, you can view this in D99487883.

Contributor

@srsuryadev srsuryadev left a comment


Thanks for the initial version @David-C-L, added some initial comments

Comment thread dwio/nimble/encodings/FrequencyPartitionEncoding.h Outdated
Comment thread dwio/nimble/encodings/FrequencyPartitionEncoding.h Outdated
Comment thread dwio/nimble/encodings/FrequencyPartitionEncoding.h Outdated
Comment thread dwio/nimble/encodings/FrequencyPartitionEncoding.h Outdated
Comment thread dwio/nimble/encodings/FrequencyPartitionEncoding.h Outdated
Comment thread dwio/nimble/encodings/ForEncoding.h Outdated
Comment thread dwio/nimble/encodings/ForEncoding.h Outdated
Comment on lines +276 to +279
for (uint32_t i = 0; i < rowCount; ++i) {
output[i] = decodeValue(currentRow_ + i);
}

Contributor


We can explore bulk processing here, but it need not be in this PR. For now, it is okay.

Author


I think the bulk processing shouldn't be too hard to implement. I'll give it a quick go, and if it's not too much work I'll add it to the PR.

Author


I've created a decodeRange function to implement bulk processing in 825c731

Contributor


Thank you! Can we update the description with the decode time comparison as well?

@srsuryadev srsuryadev requested a review from zzhao0 April 8, 2026 07:54
// operations for efficient random access.
Prefix = 11,
// Partitions data by value frequency. Frequent values get shorter bit-width
// codes. Rows are reordered to group values with same code length.
Contributor


@David-C-L Can we see if we can achieve better performance without re-ordering the rows?

Author


The row reordering is to enable efficient value-granularity random access. Without reordering, the encoding would be limited to O(n) bulk decoding (similar to Huffman encoding) due to the variable-sized keys. We did explore some indexes (they should be explained in the initial PR description) that could be used as a view for interfacing with the original order, so you get the benefit of reordering for random access while allowing access through the original ordering.

Do you think it's worth implementing these indexes as an option for the encoding?
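The trade-off described in the comment above can be sketched as follows (an illustrative toy, not the PR's code): once rows are grouped into partitions of fixed key width, the bit offset of a key inside its partition is a single multiplication, whereas variable-width keys in original order require summing all preceding widths.

```cpp
#include <cstdint>
#include <vector>

// With reordering into fixed-width partitions, random access is O(1):
// the i-th key in a partition of width w starts at bit i * w.
uint64_t keyBitOffset(uint32_t indexInPartition, uint32_t partitionBitWidth) {
  return uint64_t(indexInPartition) * partitionBitWidth;
}

// Without reordering, each key may have a different width (as in Huffman
// coding), so locating row i means summing the widths of all earlier
// keys: an O(n) scan (or extra offset-index storage).
uint64_t variableWidthBitOffset(const std::vector<uint8_t>& widths,
                                uint32_t rowIndex) {
  uint64_t offset = 0;
  for (uint32_t i = 0; i < rowIndex; ++i) {
    offset += widths[i]; // must visit every preceding key
  }
  return offset;
}
```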

@srsuryadev srsuryadev requested a review from xiaoxmeng April 8, 2026 18:17
@srsuryadev
Contributor

Hi @David-C-L, thank you! The PR is getting into better shape! Can we update the decode time and memory overhead as well, along with the compression ratio, in the same benchmark you used in the description? Thank you

void FrequencyPartitionEncoding<T>::readWithVisitor(
V& visitor,
ReadWithVisitorParams& params) {
detail::readWithVisitorSlow(visitor, params, nullptr, [&] {
Contributor


This is fine for the initial implementation, but we can try readWithVisitorFast, or follow up if needed.

}

template <typename T>
uint32_t FrequencyPartitionEncoding<T>::getTierForRow(uint32_t rowIndex) const {
Contributor


We can try some search optimizations here if possible.
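One natural search optimisation for a per-row tier lookup like getTierForRow (sketched here as a hypothetical free function, not the PR's implementation) is to precompute the exclusive prefix sum of per-tier row counts once, then binary-search it, replacing any linear scan over tiers with O(log #tiers) per lookup.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch: tierRowOffsets is the exclusive prefix sum of row
// counts per tier (tierRowOffsets[0] == 0, sorted ascending). The tier
// containing rowIndex is found by binary search instead of a linear scan.
uint32_t tierForRow(const std::vector<uint32_t>& tierRowOffsets,
                    uint32_t rowIndex) {
  auto it = std::upper_bound(
      tierRowOffsets.begin(), tierRowOffsets.end(), rowIndex);
  return uint32_t(it - tierRowOffsets.begin()) - 1;
}
```

Since tier counts are fixed after encoding, the offsets vector can be built once at load time and shared by all lookups.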

@srsuryadev srsuryadev requested a review from pedroerp May 6, 2026 16:55