Skip to content
Open
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
129 changes: 127 additions & 2 deletions docs/source/API/core/macros-special/if_on_host_or_device.rst
Original file line number Diff line number Diff line change
Expand Up @@ -92,8 +92,133 @@ accessible outside of it.
``constexpr`` Context
---------------------

These macros cannot be used in a context that requires a ``constexpr``
(constant expression).
These macros **must not** be used in a context that requires a ``constexpr``
(constant expression). Using ``KOKKOS_IF_ON_HOST`` or ``KOKKOS_IF_ON_DEVICE``
within ``constexpr`` functions or to initialize ``constexpr`` variables leads to
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
within ``constexpr`` functions or to initialize ``constexpr`` variables leads to
within ``constexpr`` functions or to initialize ``constexpr`` variables can lead to

**One Definition Rule (ODR) violations** and undefined behavior.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**One Definition Rule (ODR) violations** and undefined behavior.
**One Definition Rule (ODR) violations** and cause undefined behavior.


Why This Is Problematic
^^^^^^^^^^^^^^^^^^^^^^^^

Unlike runtime function calls, ``constexpr`` functions and variables generate
compile-time values that can affect the structure of types, the size of objects,
and template instantiations. When ``KOKKOS_IF_ON_HOST`` and
``KOKKOS_IF_ON_DEVICE`` are used in ``constexpr`` contexts, they cause the same
function or variable to have different compile-time values on the host versus
the device—similar to using architecture-specific preprocessor macros like
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the device—similar to using architecture-specific preprocessor macros like
the device to using architecture-specific preprocessor macros like

I get what it's getting to, but I feel like just the device is sufficient.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comma or a parenthesis should be added. It looks like the device-similar to ... was a kind of device (similar to ....

``#ifdef __AVX2__`` in different translation units.

This is analogous to the following problematic pattern:

.. code-block:: cpp

// DO NOT DO THIS - ODR violation
static constexpr int foo() {
#ifdef __AVX2__
return 4;
#else
return 2;
#endif
}

If you compile this code in two different translation units with different
compiler flags (one with AVX2 and one without) and then link them together, you
have an ODR violation because the same function has different definitions.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this example? I think the next example makes it clear enough?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda do like to connect this to something not involving host/device compilation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the explanation is about different compilation units with different flags...which if you know how some compilers do device code is helpful, but if you are oblivious to that, the question is: why shouldn't I do that?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also find the avx example very confusing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per review let's not get into that compile flag mess here.


The same principle applies to host/device compilation: a ``constexpr`` function
that returns different values on host and device violates the ODR.

Examples of ODR Violations
^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Problematic: Using in** ``constexpr`` **function**

.. code-block:: cpp

// DO NOT DO THIS - causes ODR violation
static constexpr KOKKOS_FUNCTION int compute_block_size() {
KOKKOS_IF_ON_HOST((return 4;))
KOKKOS_IF_ON_DEVICE((return 2;))
}

struct Functor {
int data[compute_block_size()]; // Size differs on host vs device!
// This creates an ODR violation and undefined behavior
};

In this example, the ``Functor`` struct would have different sizes on the host
and device, causing serious memory corruption issues when passed between them.

**Problematic: Lambda capture dependency**

.. code-block:: cpp

// DO NOT DO THIS - causes ODR violation
void foo() {
int a = 0;
double b = 1.0;
auto lambda = KOKKOS_LAMBDA(int) {
KOKKOS_IF_ON_HOST((printf("%i\n", a);)) // Captures 'a'
KOKKOS_IF_ON_DEVICE((printf("%lf\n", b);)) // Captures 'b'
};
// Lambda has different size on host vs device due to different captures
}

The lambda object has different sizes on host and device because of the
different captures, violating the ODR.
Comment on lines +131 to +147
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this problematic example never gets resolved, unlike the one above? If so, I think this example should be first, then the one that gets resolved should be second and it should be clear that the "Correct Alternatives" section is a subsection of that example.


Correct Alternatives
^^^^^^^^^^^^^^^^^^^^

**Alternative 1: Use non-**``constexpr`` **runtime function**

If the value doesn't need to be a compile-time constant, simply remove
``constexpr``:

.. code-block:: cpp

// This is OK - runtime function
static KOKKOS_FUNCTION int compute_block_size() {
KOKKOS_IF_ON_HOST((return 4;))
KOKKOS_IF_ON_DEVICE((return 2;))
}

**Alternative 2: Move** ``KOKKOS_IF_ON_*`` **to calling context**

If you need compile-time constants, move the conditional compilation up one
level:

.. code-block:: cpp

// This is OK - separate constexpr values in each branch
KOKKOS_INLINE_FUNCTION void process() {
KOKKOS_IF_ON_HOST((
constexpr int block_size = 4;
// Use block_size here...
))
KOKKOS_IF_ON_DEVICE((
constexpr int block_size = 2;
// Use block_size here...
))
}

**Alternative 3 (Preferred): Use template specialization**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Preferred method should be placed first as the Alternative 1.


If possible use template specialization on execution
spaces:

.. code-block:: cpp

// This is OK - different specializations
template<typename ExecutionSpace>
struct BlockSize {
static constexpr int value = 2; // Default for devices
};

template<>
struct BlockSize<Kokkos::DefaultHostExecutionSpace> {
static constexpr int value = 4; // Specialized for host
};

Best Practices
--------------
Expand Down