feat(cudf): Add GPU array access support for element_at, subscript, and get#17243
feat(cudf): Add GPU array access support for element_at, subscript, and get#17243ReemaAlzaid wants to merge 40 commits intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox canceled.
|
|
@ReemaAlzaid this looks good. It certainly checks the boxes as far as a "reusable base class". Some of the helper functions seem a little long-winded, and maybe the array-specific stuff can be moved to a separate CU as this one is already too big. Can you also add a function alias for |
simoneves
left a comment
There was a problem hiding this comment.
Apologies for not really giving this a proper review the first time. It looks good, with a few minor nits, but I have three main concerns, and one question:
- Support for
INTEGERindex columns without casting up toBIGINT - Use scalar-index variant of
extract_list_elementinstead of materializing column - Is it possible to do what
validateIndicesdoes on GPU instead of CPU
Please also explain the policy variations.
| #include <cudf/datetime.hpp> | ||
| #include <cudf/fixed_point/fixed_point.hpp> | ||
| #include <cudf/hashing.hpp> | ||
| #include <cudf/lists/extract.hpp> |
There was a problem hiding this comment.
Are these other two includes actually needed?
| testUnaryFunction("abs(c0)", -5.5, 5.5); | ||
| } | ||
|
|
||
| TEST_F(CudfSimpleFilterProjectTest, elementAtArrayConstantIndex) { |
There was a problem hiding this comment.
Also feels like these tests can be merged somewhat, although I see that the differing policy behavior means that the tests aren't identical. Fine either way.
| } | ||
|
|
||
| const ArrayAccessPolicy policy_; | ||
| bool hasConstantIndex_{false}; |
There was a problem hiding this comment.
This flag seems superfluous given that constantIndex_ itself is a std::optional. The logic in the constructor could then perhaps be simplified a little.
|
|
||
| auto rawIndexColumn = | ||
| prepareRawIndexColumn(inputColumns, listsView.size(), stream, mr); | ||
| auto rawIndex64 = ensureInt64Indices(rawIndexColumn, stream, mr); |
There was a problem hiding this comment.
Why is this step (and corresponding function) needed? Surely prepareRawIndexColumn will always return an INT64 column, the same type as the scalar it's generated from?
There was a problem hiding this comment.
I'm assuming that extract_list_element with a live column can accept either an int64_t/BIGINT or int32_t/INTEGER column? Do you have a test for that?
There was a problem hiding this comment.
Yeah I have added the test coverages in FilterProjectTest for both converges int64_t and int32_t
|
@ReemaAlzaid not sure why this PR isn't running full CI. Wondering if you need more permissions, or if it just behaves differently for IBM folks? @majetideepak @kgpai ? |
It says "5 workflows awaiting approval". Perhaps that's why. |
Thanks @czentgr for fixing the permissions. CI now running. |
Build Impact AnalysisFull build recommended. Files outside the dependency graph changed:
These directories are not fully covered by the dependency graph. A full build is the safest option. Slow path • Graph generated from PR branch |
CI Failure Analysis
❌ Build with GCC / cudf-tests — TEST Failure View logsFailed test binary: Failed tests:
Failure details: Correlation with PR changes: These failures are directly caused by the PR. PR #17243 introduces new
Known issues:
Reproduce locally: # Build cuDF tests (requires NVIDIA GPU + cuDF)
./_build/release/velox/experimental/cudf/tests/velox_cudf_filter_project_test \
--gtest_filter="CudfSimpleFilterProjectTest.arrayAccessSupportedScalarElementTypes:\
CudfSimpleFilterProjectTest.arrayAccessSupportedComplexElementTypes:\
CudfSimpleFilterProjectTest.arrayAccessConstantArraySupportedTypes:\
CudfSimpleFilterProjectTest.subscriptConstantArrayVariableIndex"Recommended fixes:
|
|
|
||
| void SetUp() override { | ||
| previousMemoryResource_ = CudfConfig::getInstance().memoryResource; | ||
| CudfConfig::getInstance().memoryResource = "cuda"; |
There was a problem hiding this comment.
What is the reason for adding this?
There was a problem hiding this comment.
...and in FilterProjectTest.cpp...?
There was a problem hiding this comment.
I added that as a local test workaround for the default async memory resource path so I’ve removed it
| auto expected = makeNullableFlatVector<int32_t>({3, 6, std::nullopt, 7}); | ||
|
|
||
| facebook::velox::test::assertEqualVectors( | ||
| expected, evaluate("element_at(c0, -1)", makeRowVector({arrays}))); |
There was a problem hiding this comment.
Maybe overkill, but maybe add index-from-column versions of these tests too (both types)...?
|
|
||
| class CudfFunctionBaseTest : public velox::functions::test::FunctionBaseTest { | ||
| protected: | ||
| using velox::functions::test::FunctionBaseTest::evaluate; |
There was a problem hiding this comment.
Why is this needed? Surely it was resolving that function type just fine before, as it's an existing test base class and surely worked? Or did you uncover something more insidious? If so, please explain.
There was a problem hiding this comment.
Yeah, needed for these tests. Adding our own evaluate(...) here hides the parent class versions in C++. The array access tests now use one of those parent versions directly, so using just makes it visible again
There was a problem hiding this comment.
Please remove it, @simoneves will add a new function in this class to verify the CPU and GPU results, this PR has under review and will merge soon
There was a problem hiding this comment.
Please address this comment
| #include <cudf/transform.hpp> | ||
| #include <cudf/types.hpp> | ||
| #include <cudf/unary.hpp> | ||
| #include <cudf/utilities/error.hpp> |
There was a problem hiding this comment.
You removed one of the two. Isn't this one superfluous also? Apologies if not...
There was a problem hiding this comment.
Good catch, forgot to remove after moving the array access logic out
| return; | ||
| } | ||
|
|
||
| constantIndex_.emplace(); |
There was a problem hiding this comment.
What does this do, without any arguments? I mean, the nested std::optional is surely wrong anyway? (see added-earlier comment below) This will create the outer one, but not the inner one. I guess the inner one gets assigned below, but it still seems wrong unless I'm missing something...?
There was a problem hiding this comment.
My reasoning there was to represent 3 states index before, That said, I agree this is awkward and easy to misread. I’ve simplified it by removing ConstantIndex and changing this back to std::optional<int64_t> constantIndex_
| if constexpr (std::is_same_v<IndexType, int32_t>) { | ||
| return normalized; | ||
| } else { | ||
| return cudf::cast( |
There was a problem hiding this comment.
Can the actual final indices passed to extract_list_element ONLY be int32_t (and we just provide the BIGINT version of the function as a convenience) or can they also actually be int64_t with values above 2^32? (seems unlikely in practice, but even so)?
There was a problem hiding this comment.
ADDENDUM: OK, so it seems that array indices are indeed limited to cudf::size_type which is int32_t (surprisingly) so it makes sense that your code now generates final index values/columns as int32_t.
There was a problem hiding this comment.
ADDENDUM TO ADDENDUM: I am frankly very surprised that cudf::size_type is int32_t as that applies to row counts of columns too? (hence a 2B-row limit). I'm not surprised that array indices can't be full-64-bit, but presumably SQL allows for it, if TPC-DS requires that variant of subscript?
There was a problem hiding this comment.
Yeah this matches what I found. Since cudf::size_type is int32_t, the final indices passed to extract_list_element should be int32_t
We can still expose a BIGINT variant for SQL compatibility, but it needs to cast/validate before calling cuDF values outside int32_t can’t be represented anyway (also consistent with rapidsai/cudf#3958 (comment)).
I also checked TPC-DS (incl. Q70), and there’s no array/subscript usage, so it doesn’t require a BIGINT subscript variant
| mr); | ||
|
|
||
| if constexpr (std::is_same_v<IndexType, int64_t>) { | ||
| auto minIndex = cudf::numeric_scalar<int64_t>( |
There was a problem hiding this comment.
What is this block intended to do? Seems like it's checking that int64_t indices are within a range, and I thought perhaps that would be the int32_t range (such that the cast-back-to-int32_t later would be valid) but the minIndex and maxIndex are int64_t...?
There was a problem hiding this comment.
ADDENDUM: Sorry! I was reading it wrong. I see now that the actual VALUES are min/max of cudf::size_type which is int32_t, just then CAST to int64_t for the purpose of the comparison passes.
There was a problem hiding this comment.
Right, it’s just guarding the later cast to cudf::size_type. I renamed the locals to make that clearer
| }; | ||
|
|
||
| struct ConstantIndex { | ||
| std::optional<int64_t> value; |
|
The CI is blocked again. Not sure why. @czentgr can you work your magic again, please? Why didn't your first fix stick? |
|
Looking into the Velox CPU Your index manipulation and validation code looks a lot like what's in |
| const ArrayAccessPolicy& policy, | ||
| rmm::cuda_stream_view stream, | ||
| rmm::device_async_resource_ref mr) { | ||
| auto sizes = castSizes<IndexType>(sizesView, stream, mr); |
There was a problem hiding this comment.
Is the IndexType same with rawIndexView type?
There was a problem hiding this comment.
Yes. The caller guarantees this: INT8/INT16 inputs are first cast to INT32 before calling normalizeAndValidateIndicesTyped<int32_t>(); INT32 and INT64 inputs are dispatched directly to int32_t and int64_t
There was a problem hiding this comment.
Then do you need the template class IndexType? You can use rawIndexView.type()
There was a problem hiding this comment.
I've done a patch for that and used rawIndexView.type()
| VELOX_USER_FAIL("Array subscript index cannot be negative"); | ||
| } | ||
|
|
||
| std::unique_ptr<cudf::column> positiveNormalized; |
There was a problem hiding this comment.
update to cudf::column_view to avoid copy in else branch, please notice you need to reserve the original column to avoid memory issue
auto positiveNormalized = rawIndexView;
if (policy.indexStartsAtOne)
| #include "velox/expression/FunctionSignature.h" | ||
|
|
||
| #include <initializer_list> | ||
| #include <string> |
There was a problem hiding this comment.
Please check the header
There was a problem hiding this comment.
Could you clarify what issue you see in the header? I checked CommonFunctions.h, and it already includes the headers for the types it uses directly: <initializer_list>, <string>, and <vector>
There was a problem hiding this comment.
These headers are already included in "velox/expression/FunctionSignature.h"
There was a problem hiding this comment.
I'll remove them then thanks
|
|
||
| class CudfFunctionBaseTest : public velox::functions::test::FunctionBaseTest { | ||
| protected: | ||
| using velox::functions::test::FunctionBaseTest::evaluate; |
There was a problem hiding this comment.
Please remove it, @simoneves will add a new function in this class to verify the CPU and GPU results, this PR has under review and will merge soon
|
@ReemaAlzaid the test refactor that @jinchengchenghh mentioned above is 0ec817a from #16913 which will indeed hopefully land very soon. |
|
Please resolve this comment, #17243 (comment), add the test for all the data type of array vector and index data type, idealy, the tests should cover all the paths, that's also what CPU does, for GPU project, this is a young project, so we make sure to cover the functionality path |
…array # Conflicts: # velox/experimental/cudf/tests/FilterProjectTest.cpp
|
Where are we on this one? Build errors in both versions of @jinchengchenghh did you have any other concerns? @devavret @karthikeyann or anyone else? |
|
@ReemaAlzaid please refactor (at least the cudf version of) FilterProjectTest to use the |
Done the refactor in both files |
jinchengchenghh
left a comment
There was a problem hiding this comment.
Please solve the red CI
| const RowVectorPtr& input, | ||
| const std::string& message) { | ||
| auto exprSet = compileExpression(expr, input->rowType()); | ||
| VELOX_ASSERT_THROW(evaluate(*exprSet, input), message); |
There was a problem hiding this comment.
VELOX_ASSERT_THROW(evaluate(expr, input), message); there is API that accepts string expression
the tests are passing now |
| protected: | ||
| using velox::functions::test::FunctionBaseTest::evaluate; | ||
|
|
||
| void assertExpressionMatchesCpu( |
There was a problem hiding this comment.
Please do a refactor here, remove it https://github.com/facebookincubator/velox/pull/16913/changes#diff-b59a27a01d4861ece67ec4f23cdf2b34d41419d44d1307806b1348bda2471102R1482
|
|
||
| class CudfFunctionBaseTest : public velox::functions::test::FunctionBaseTest { | ||
| protected: | ||
| using velox::functions::test::FunctionBaseTest::evaluate; |
There was a problem hiding this comment.
Please address this comment
Summary
This PR adds GPU support for array access functions (
element_at,subscript, and Sparkget) in the cuDF expression evaluator. Until now, these functions were not handled on the GPU, so queries using them would fall back. To match Velox semantics, the change adds shared array access helpers that validate and normalize indices before calling cuDF list extraction.What changed
element_atandsubscript.get.ArrayAccessFunctionandArrayAccessPolicy) to centralize index validation and normalization before cuDF extraction.Implementation notes
element_atuses 1-based indexing, supports negative indices from the end of the array, and returns null for out-of-bound indices.subscriptuses 1-based indexing and raises on zero, negative, or out-of-bound indices.getuses 0-based indexing and returns null for negative or out-of-bound indices.inputColumns, while cuDF list extraction expects a lists column input.Limitation
Compared with native C++ Velox, this cuDF implementation validates invalid indices by materializing and scanning masks across all rows before extraction. Native Velox can perform checks during per-row array access. This may introduce extra performance cost; fusing validation with extraction in a custom GPU kernel can be considered as future work.
Constant array literal handling is also a functional bridge rather than the ideal long-term representation. Materializing the constant array as a repeated lists column is less efficient than representing the constant complex value directly in cuDF. A better follow-up would be native support for constant complex/list values, similar in spirit to scalar constants such as
cudf::numeric_scalar, if cuDF adds or exposes an appropriate representation.