Frequency partitioning and FOR encoding rebased & synced with nimble/main #636
Open: David-C-L wants to merge 30 commits into facebookincubator:main from David-C-L:freq-part-and-for-synced
f06434a encodings: add frequency partitioning to encoding identifiers (David-C-L)
16e4559 fix compilation issues wrt mem pool in FreqPartitionEncoding (David-C-L)
81b1dd7 encodings/tests: align encoding tests with config templatisation and … (David-C-L)
a415df6 extend flatbuffers finding in CMakeLists (David-C-L)
9d1601a encoding/tests: update import name to FrequencyPartitionEncoding.hpp (David-C-L)
f9e5589 fix decode dictionary indexing (David-C-L)
b50d24c update ForEncoding to use 64bit and include in nimble framework (David-C-L)
dc25009 add tests for ForEncoding with selective and bulk reads (David-C-L)
aa268ac remove unnecessary comments (David-C-L)
f7fc4dc refactor encoding selection to use zstd when meta internal is disabled (David-C-L)
42e5597 restore metainternal use in tests instead of zstd (David-C-L)
7052d4a update for and freq interfaces and tests to use string buffer factory (David-C-L)
e537fb3 fix conflicting NUMERIC macros in EncodingFactor (David-C-L)
6602782 parameterise forward defs of encoding type traits (David-C-L)
b64ceb9 add string buffer factory default initialiser for Freq and For tests (David-C-L)
d5a5461 fix compilation bug in VeloxReaderTests comparing non-numeric to int 0 (David-C-L)
c5a06b0 encodings: thread options through factory and prefix serialization (David-C-L)
0df6fd6 tests: make optional list/map literals explicit in selective reader t… (David-C-L)
349c885 [encodings/FreqPart] replace for loop vector assignment with memcpy (David-C-L)
8f75f1c [encodings/FreqPart] replace value-to-tier assignments via vectors wi… (David-C-L)
bb9a094 [encodings/FreqPart] add pre-allocation for serialised Dict and Key s… (David-C-L)
b798880 [encodings/FreqPart] refactor encoding buffer creation to use options… (David-C-L)
610ecc2 [encodings/FOR] refactor encoding buffer creation to use options factory (David-C-L)
825c731 [encodings/FOR] implement bulk decode for residuals (David-C-L)
b53cf5a [encodings/tests] remove redundant EncodingTypeTraits system (David-C-L)
5a3e411 [velox/tests] reorder parameters of velox writer test initialisation … (David-C-L)
7c7ccef [encoding/tests] add guard for pool initialisation to avoid test setu… (David-C-L)
8289202 [encodings/Encoding] add frequencyPartitionIndex var to distinguish b… (David-C-L)
442e724 [encodings/FPE] extend FPE to support indexes reconstituting the orig… (David-C-L)
56972bc [encodings/tests/FPE] add FPE tests for each new index type: PerTierB… (David-C-L)
@David-C-L Can we see if we can achieve better performance without re-ordering the rows?
The row reordering is there to enable efficient random access at value granularity. Without reordering, the encoding would be limited to O(n) bulk decoding (similar to Huffman encoding) because the keys are variable-sized. We did explore some indexes (they should be explained in the initial PR description) that could serve as a view onto the original order, so you keep the random-access benefit of reordering while still allowing access through the original ordering.
Do you think it's worth implementing these indexes as an option for the encoding?
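To make the trade-off concrete, here is a minimal sketch of the idea rather than the code in this PR. It assumes a hypothetical two-tier layout (1-byte keys for frequent values, 4-byte keys for the rest); the names `Tier`, `Reordered`, `reorderByTier` and the offset helpers are invented for illustration and do not exist in Nimble. With keys left in original row order, finding row i means summing the widths of all preceding keys. Once rows are grouped by tier, every position has a closed-form offset, and a permutation index can map an original row number back to its reordered position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical two-tier layout (illustration only, not Nimble's format):
// frequent ("hot") values get 1-byte keys, the rest ("cold") get 4-byte keys.
enum class Tier : uint8_t { Hot, Cold };

constexpr size_t keyWidth(Tier t) {
  return t == Tier::Hot ? 1 : 4;
}

// Original row order: key widths are interleaved, so the byte offset of a
// row is the sum of all preceding widths. This is an O(row) scan.
size_t offsetInOriginalOrder(const std::vector<Tier>& tiers, size_t row) {
  assert(row < tiers.size());
  size_t offset = 0;
  for (size_t i = 0; i < row; ++i) {
    offset += keyWidth(tiers[i]);
  }
  return offset;
}

// Reordered layout: all hot keys first, then all cold keys.
struct Reordered {
  size_t hotCount = 0;
  // Assumed permutation index: original row -> position after reordering.
  std::vector<uint32_t> originalToReordered;
};

Reordered reorderByTier(const std::vector<Tier>& tiers) {
  Reordered r;
  r.originalToReordered.resize(tiers.size());
  for (Tier t : tiers) {
    if (t == Tier::Hot) {
      ++r.hotCount;
    }
  }
  uint32_t nextHot = 0;
  uint32_t nextCold = static_cast<uint32_t>(r.hotCount);
  for (size_t i = 0; i < tiers.size(); ++i) {
    r.originalToReordered[i] = (tiers[i] == Tier::Hot) ? nextHot++ : nextCold++;
  }
  return r;
}

// Within a tier every key has the same width, so the offset is O(1).
size_t offsetInReorderedLayout(const Reordered& r, size_t pos) {
  if (pos < r.hotCount) {
    return pos * keyWidth(Tier::Hot);
  }
  return r.hotCount * keyWidth(Tier::Hot) +
      (pos - r.hotCount) * keyWidth(Tier::Cold);
}

// Access by original row number: one index lookup plus the O(1) offset.
size_t offsetForOriginalRow(const Reordered& r, size_t row) {
  assert(row < r.originalToReordered.size());
  return offsetInReorderedLayout(r, r.originalToReordered[row]);
}
```

In this sketch the permutation index costs one 32-bit entry per row, which is the kind of space overhead any such view would have to justify; a per-tier bitmap or similar structure would trade lookup cost for a smaller footprint.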