-
Notifications
You must be signed in to change notification settings - Fork 67
Document ODR violations in KOKKOS_IF_ON_HOST/DEVICE constexpr usage #813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 5 commits
2b534ce
1c2ec9e
d5dd700
9ddc87e
c160bf9
e90a4fe
3dd7d9a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -92,8 +92,133 @@ accessible outside of it. | |||||
| ``constexpr`` Context | ||||||
| --------------------- | ||||||
|
|
||||||
| These macros cannot be used in a context that requires a ``constexpr`` | ||||||
| (constant expression). | ||||||
| These macros **must not** be used in a context that requires a ``constexpr`` | ||||||
| (constant expression). Using ``KOKKOS_IF_ON_HOST`` or ``KOKKOS_IF_ON_DEVICE`` | ||||||
| within ``constexpr`` functions or to initialize ``constexpr`` variables leads to | ||||||
| **One Definition Rule (ODR) violations** and undefined behavior. | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
|
||||||
| Why This Is Problematic | ||||||
| ^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||
|
|
||||||
| Unlike runtime function calls, ``constexpr`` functions and variables generate | ||||||
| compile-time values that can affect the structure of types, the size of objects, | ||||||
| and template instantiations. When ``KOKKOS_IF_ON_HOST`` and | ||||||
| ``KOKKOS_IF_ON_DEVICE`` are used in ``constexpr`` contexts, they cause the same | ||||||
| function or variable to have different compile-time values on the host versus | ||||||
| the device—similar to using architecture-specific preprocessor macros like | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
I get what it's getting to, but I feel like just
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think a comma or a parenthesis should be added. It looks like the |
||||||
| ``#ifdef __AVX2__`` in different translation units. | ||||||
|
|
||||||
| This is analogous to the following problematic pattern: | ||||||
|
|
||||||
| .. code-block:: cpp | ||||||
|
|
||||||
| // DO NOT DO THIS - ODR violation | ||||||
| static constexpr int foo() { | ||||||
| #ifdef __AVX2__ | ||||||
| return 4; | ||||||
| #else | ||||||
| return 2; | ||||||
| #endif | ||||||
| } | ||||||
|
|
||||||
| If you compile this code in two different translation units with different | ||||||
| compiler flags (one with AVX2 and one without) and then link them together, you | ||||||
| have an ODR violation because the same function has different definitions. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we need this example? I think the next example makes it clear enough?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I kinda do like to connect this to something not involving host/device compilation.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. but the explanation is about different compilation units with different flags...which if you know how some compilers do device code is helpful, but if you are oblivious to that, the question is: why shouldn't I do that?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also find the avx example very confusing.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Per review let's not get into that compile flag mess here. |
||||||
|
|
||||||
| The same principle applies to host/device compilation: a ``constexpr`` function | ||||||
| that returns different values on host and device violates the ODR. | ||||||
|
|
||||||
| Examples of ODR Violations | ||||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||
|
|
||||||
| **Problematic: Using in** ``constexpr`` **function** | ||||||
|
|
||||||
| .. code-block:: cpp | ||||||
|
|
||||||
| // DO NOT DO THIS - causes ODR violation | ||||||
| static constexpr KOKKOS_FUNCTION int compute_block_size() { | ||||||
| KOKKOS_IF_ON_HOST((return 4;)) | ||||||
| KOKKOS_IF_ON_DEVICE((return 2;)) | ||||||
| } | ||||||
|
|
||||||
| struct Functor { | ||||||
| int data[compute_block_size()]; // Size differs on host vs device! | ||||||
| // This creates an ODR violation and undefined behavior | ||||||
| }; | ||||||
|
|
||||||
| In this example, the ``Functor`` struct would have different sizes on the host | ||||||
| and device, causing serious memory corruption issues when passed between them. | ||||||
|
|
||||||
| **Problematic: Lambda capture dependency** | ||||||
|
|
||||||
| .. code-block:: cpp | ||||||
|
|
||||||
| // DO NOT DO THIS - causes ODR violation | ||||||
| void foo() { | ||||||
| int a = 0; | ||||||
| double b = 1.0; | ||||||
| auto lambda = KOKKOS_LAMBDA(int) { | ||||||
| KOKKOS_IF_ON_HOST((printf("%i\n", a);)) // Captures 'a' | ||||||
| KOKKOS_IF_ON_DEVICE((printf("%lf\n", b);)) // Captures 'b' | ||||||
| }; | ||||||
| // Lambda has different size on host vs device due to different captures | ||||||
| } | ||||||
|
|
||||||
| The lambda object has different sizes on host and device because of the | ||||||
| different captures, violating the ODR. | ||||||
|
Comment on lines
+131
to
+147
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this problematic example never gets resolved, unlike the one above? If so, I think this example should be first, then the one that gets resolved should be second and it should be clear that the "Correct Alternatives" section is a subsection of that example. |
||||||
|
|
||||||
| Correct Alternatives | ||||||
| ^^^^^^^^^^^^^^^^^^^^ | ||||||
|
|
||||||
| **Alternative 1: Use non-**``constexpr`` **runtime function** | ||||||
|
|
||||||
| If the value doesn't need to be a compile-time constant, simply remove | ||||||
| ``constexpr``: | ||||||
|
|
||||||
| .. code-block:: cpp | ||||||
|
|
||||||
| // This is OK - runtime function | ||||||
| static KOKKOS_FUNCTION int compute_block_size() { | ||||||
| KOKKOS_IF_ON_HOST((return 4;)) | ||||||
| KOKKOS_IF_ON_DEVICE((return 2;)) | ||||||
| } | ||||||
|
|
||||||
| **Alternative 2: Move** ``KOKKOS_IF_ON_*`` **to calling context** | ||||||
|
|
||||||
| If you need compile-time constants, move the conditional compilation up one | ||||||
| level: | ||||||
|
|
||||||
| .. code-block:: cpp | ||||||
|
|
||||||
| // This is OK - separate constexpr values in each branch | ||||||
| KOKKOS_INLINE_FUNCTION void process() { | ||||||
| KOKKOS_IF_ON_HOST(( | ||||||
| constexpr int block_size = 4; | ||||||
| // Use block_size here... | ||||||
| )) | ||||||
| KOKKOS_IF_ON_DEVICE(( | ||||||
| constexpr int block_size = 2; | ||||||
| // Use block_size here... | ||||||
| )) | ||||||
| } | ||||||
|
|
||||||
| **Alternative 3 (Preferred): Use template specialization** | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the |
||||||
|
|
||||||
| If possible use template specialization on execution | ||||||
| spaces: | ||||||
|
|
||||||
| .. code-block:: cpp | ||||||
|
|
||||||
| // This is OK - different specializations | ||||||
| template<typename ExecutionSpace> | ||||||
| struct BlockSize { | ||||||
| static constexpr int value = 2; // Default for devices | ||||||
| }; | ||||||
|
|
||||||
| template<> | ||||||
| struct BlockSize<Kokkos::DefaultHostExecutionSpace> { | ||||||
| static constexpr int value = 4; // Specialized for host | ||||||
| }; | ||||||
|
|
||||||
| Best Practices | ||||||
| -------------- | ||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.