diff --git a/docs/Makefile b/docs/Makefile
index 004bdd74a..95e8547fb 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -4,3 +4,6 @@ html:
 clean:
 	rm -rf ./generated_docs/
+
+linkcheck:
+	sphinx-build -b linkcheck ./source/ ./generated_docs/ -W --keep-going
diff --git a/docs/source/API/algorithms/std-algorithms/all/StdCopyIf.rst b/docs/source/API/algorithms/std-algorithms/all/StdCopyIf.rst
index fb78fc9d6..78ffef4ac 100644
--- a/docs/source/API/algorithms/std-algorithms/all/StdCopyIf.rst
+++ b/docs/source/API/algorithms/std-algorithms/all/StdCopyIf.rst
@@ -94,10 +94,7 @@ Overload set accepting a team handle
 Parameters and Requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. |copy| replace:: ``copy``
-.. _copy: ./StdCopy.html
-
-- ``exespace``, ``teamHandle``, ``first_from``, ``last_from``, ``first_to``, ``view_from``, ``view_to``: same as in |copy|_
+- ``exespace``, ``teamHandle``, ``first_from``, ``last_from``, ``first_to``, ``view_from``, ``view_to``: same as in :doc:`copy <./StdCopy>`
 - ``label``:
diff --git a/docs/source/API/algorithms/std-algorithms/all/StdCopy_n.rst b/docs/source/API/algorithms/std-algorithms/all/StdCopy_n.rst
index 7c8106b28..be4fd802f 100644
--- a/docs/source/API/algorithms/std-algorithms/all/StdCopy_n.rst
+++ b/docs/source/API/algorithms/std-algorithms/all/StdCopy_n.rst
@@ -84,11 +84,7 @@ Overload set accepting a team handle
 Parameters and Requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. |copy| replace:: ``copy``
-.. _copy: ./StdCopy.html
-
-
-- ``exespace``, ``teamHandle``, ``first_from``, ``first_to``, ``view_from``, ``view_to``: same as in |copy|_
+- ``exespace``, ``teamHandle``, ``first_from``, ``first_to``, ``view_from``, ``view_to``: same as in :doc:`copy `.
 - ``label``: used to name the implementation kernels for debugging purposes
diff --git a/docs/source/API/algorithms/std-algorithms/all/StdIsSortedUntil.rst b/docs/source/API/algorithms/std-algorithms/all/StdIsSortedUntil.rst
index 1c54a13e4..9c72f7307 100644
--- a/docs/source/API/algorithms/std-algorithms/all/StdIsSortedUntil.rst
+++ b/docs/source/API/algorithms/std-algorithms/all/StdIsSortedUntil.rst
@@ -95,10 +95,7 @@ Overload set accepting a team handle
 Parameters and Requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. |IsSorted| replace:: ``is_sorted``
-.. _IsSorted: ./StdIsSorted.html
-
-- ``exespace``, ``teamHandle``, ``first``, ``last``, ``view``, ``comp``: same as in |IsSorted|_
+- ``exespace``, ``teamHandle``, ``first``, ``last``, ``view``, ``comp``: same as in :doc:`is_sorted <./StdIsSorted>`
 - ``label``: string forwarded to internal parallel kernels for debugging purposes
diff --git a/docs/source/API/algorithms/std-algorithms/all/StdMaxElement.rst b/docs/source/API/algorithms/std-algorithms/all/StdMaxElement.rst
index ba80ce42f..64b283fbd 100644
--- a/docs/source/API/algorithms/std-algorithms/all/StdMaxElement.rst
+++ b/docs/source/API/algorithms/std-algorithms/all/StdMaxElement.rst
@@ -92,11 +92,7 @@ Overload set accepting a team handle
 Parameters and Requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. _min_element_link: ./StdMinElement.html
-
-..
|min_element_link| replace:: ``min_element``
-
-- ``exespace``, ``first``, ``last``, ``view``, ``comp``: same as in |min_element_link|_
+- ``exespace``, ``first``, ``last``, ``view``, ``comp``: same as in :doc:`min_element <./StdMinElement>`
 - ``teamHandle``: team handle instance given inside a parallel region when using a TeamPolicy
diff --git a/docs/source/API/algorithms/std-algorithms/all/StdMinMaxElement.rst b/docs/source/API/algorithms/std-algorithms/all/StdMinMaxElement.rst
index 898a37a07..a8a8e2c01 100644
--- a/docs/source/API/algorithms/std-algorithms/all/StdMinMaxElement.rst
+++ b/docs/source/API/algorithms/std-algorithms/all/StdMinMaxElement.rst
@@ -92,11 +92,7 @@ Overload set accepting a team handle
 Parameters and Requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. _min_element_link: ./StdMinElement.html
-
-.. |min_element_link| replace:: ``min_element``
-
-- ``exespace``, ``first``, ``last``, ``view``, ``comp``: same as in |min_element_link|_
+- ``exespace``, ``first``, ``last``, ``view``, ``comp``: same as in :doc:`min_element <./StdMinElement>`
 - ``teamHandle``: team handle instance given inside a parallel region when using a TeamPolicy
diff --git a/docs/source/API/algorithms/std-algorithms/all/StdRemoveCopyIf.rst b/docs/source/API/algorithms/std-algorithms/all/StdRemoveCopyIf.rst
index 32dd0efa8..a056ba58a 100644
--- a/docs/source/API/algorithms/std-algorithms/all/StdRemoveCopyIf.rst
+++ b/docs/source/API/algorithms/std-algorithms/all/StdRemoveCopyIf.rst
@@ -93,10 +93,7 @@ Overload set accepting a team handle
 Parameters and Requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. |RemoveCopy| replace:: ``remove_copy``
-..
_RemoveCopy: ./StdRemoveCopy.html
-
-- ``exespace``, ``teamHandle``, ``first_from, last_from``, ``first_to``, ``view_from``, ``view_dest``: same as in |RemoveCopy|_
+- ``exespace``, ``teamHandle``, ``first_from, last_from``, ``first_to``, ``view_from``, ``view_dest``: same as in :doc:`remove_copy <./StdRemoveCopy>`
 - ``label``: string forwarded to internal parallel kernels for debugging purposes
diff --git a/docs/source/API/algorithms/std-algorithms/all/StdRemoveIf.rst b/docs/source/API/algorithms/std-algorithms/all/StdRemoveIf.rst
index 856fa2328..b7b1e326f 100644
--- a/docs/source/API/algorithms/std-algorithms/all/StdRemoveIf.rst
+++ b/docs/source/API/algorithms/std-algorithms/all/StdRemoveIf.rst
@@ -71,10 +71,7 @@ Overload set accepting a team handle
 Parameters and Requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. |remove| replace:: ``remove``
-.. _remove: ./StdRemove.html
-
-- ``exespace``, ``first``, ``last``, ``view``: same as in |remove|_
+- ``exespace``, ``first``, ``last``, ``view``: same as in :doc:`remove <./StdRemove>`
 - ``teamHandle``: team handle instance given inside a parallel region when using a TeamPolicy
diff --git a/docs/source/API/algorithms/std-algorithms/all/StdShiftRight.rst b/docs/source/API/algorithms/std-algorithms/all/StdShiftRight.rst
index 532efe3f4..c5c2c3616 100644
--- a/docs/source/API/algorithms/std-algorithms/all/StdShiftRight.rst
+++ b/docs/source/API/algorithms/std-algorithms/all/StdShiftRight.rst
@@ -63,10 +63,7 @@ Overload set accepting a team handle
 Parameters and Requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. |ShiftLeft| replace:: ``shift_left``
-..
_ShiftLeft: ./StdShiftLeft.html
-
-- ``exespace`` ``teamHandle``, ``first``, ``last``, ``view``: same as in |ShiftLeft|_
+- ``exespace`` ``teamHandle``, ``first``, ``last``, ``view``: same as in :doc:`shift_left <./StdShiftLeft>`
 - ``label``: string forwarded to internal parallel kernels for debugging purposes
diff --git a/docs/source/API/containers-index.rst b/docs/source/API/containers-index.rst
index 4406ae8f0..7b57f0033 100644
--- a/docs/source/API/containers-index.rst
+++ b/docs/source/API/containers-index.rst
@@ -7,25 +7,25 @@ Containers API
 * - Container
 - Description
- * - `Bitset `__
+ * - :doc:`Bitset `
 - A concurrent Bitset class.
- * - `DualView `__
+ * - :doc:`DualView `
 - Container to manage access to mirrored views in different memory spaces.
- * - `DynamicView `__
+ * - :doc:`DynamicView `
 - DynamicView comment.
- * - `DynRankView `__
+ * - :doc:`DynRankView `
 - Kokkos Runtime-determined-dimension DynRankView class.
- * - `ErrorReporter `__
+ * - :doc:`ErrorReporter `
 - A class to facilitate thread-safe error output.
- * - `OffsetView `__
+ * - :doc:`OffsetView `
 - OffsetView comment.
- * - `ScatterView `__
+ * - :doc:`ScatterView `
 - ScatterView comment.
- * - `StaticCrsGraph `__
+ * - :doc:`StaticCrsGraph `
 - [DEPRECATED] StaticCrsGraph compressed raw storage graph
- * - `UnorderedMap `__
+ * - :doc:`UnorderedMap `
 - UnorderedMap comment.
- * - `vector `__
+ * - :doc:`vector `
 - [DEPRECATED] std::vector compatible implementation that works with non-host memory spaces.
 ..
toctree::
diff --git a/docs/source/API/containers/DualView.rst b/docs/source/API/containers/DualView.rst
index 7d7cdb61a..dcaa77ec1 100644
--- a/docs/source/API/containers/DualView.rst
+++ b/docs/source/API/containers/DualView.rst
@@ -14,7 +14,7 @@ Users are responsible for updating the modified flags manually if they change th
 Users may also synchronize data by calling the ``sync()`` method, which is templated on the device that requires synchronization (i.e., the target of the one-way copy operation).
 The DualView class also provides convenience methods such as realloc, resize and capacity
-which call the appropriate methods of the underlying `Kokkos::View <../core/view/view.html>`_ objects.
+which call the appropriate methods of the underlying :doc:`Kokkos::View <../core/view/view>` objects.
 The four template arguments are the same as those of ``Kokkos::View``.
@@ -27,7 +27,7 @@ The four template arguments are the same as those of ``Kokkos::View``.
 and one in host memory. Otherwise, DualView will only store one View.
 * MemoryTraits (optional) The user's intended memory access behavior. Please see the documentation
- of `Kokkos::View <../core/view/view.html>`_ for examples. The default suffices for most users.
+ of :doc:`Kokkos::View <../core/view/view>` for examples. The default suffices for most users.
 Usage
 -----
diff --git a/docs/source/API/containers/ScatterView.rst b/docs/source/API/containers/ScatterView.rst
index 0dfb070bc..734e4bbb9 100644
--- a/docs/source/API/containers/ScatterView.rst
+++ b/docs/source/API/containers/ScatterView.rst
@@ -10,15 +10,6 @@ Header File: ````
 ``ScatterView`` is still in the namespace ``Kokkos::Experimental``
-
-.. _parallelReduce: ../core/parallel-dispatch/parallel_reduce.html
-
-.. |parallelReduce| replace:: :cpp:func:`parallel_reduce`
-
-.. _View: ../core/view/view.html
-
-.. |View| replace:: ``View``
-
 .. |reset| replace:: ``reset()``
 ..
|access| replace:: ``access()``
@@ -30,7 +21,7 @@ Header File: ````
 Usage
 -----
-A Kokkos ScatterView serves as an interface for a standard Kokkos::|View|_ implementing a scatter-add pattern either via atomics or data replication.
+A Kokkos ScatterView serves as an interface for a standard Kokkos:::cpp:class:`View` implementing a scatter-add pattern either via atomics or data replication.
 Construction of a ScatterView can be expensive, so you should try to reuse the same one if possible, in which case, you should call |reset|_ between uses.
@@ -181,7 +172,7 @@ Description
 .. cpp:function:: contribute(View& dest, Kokkos::Experimental::ScatterView const& src)
 convenience function to perform final reduction of ScatterView
- results into a resultant View; may be called following |parallelReduce|_.
+ results into a resultant View; may be called following :cpp:func:`Kokkos::parallel_reduce`.
 Example
diff --git a/docs/source/API/core-index.rst b/docs/source/API/core-index.rst
index 0d981201e..c3a446ea6 100644
--- a/docs/source/API/core-index.rst
+++ b/docs/source/API/core-index.rst
@@ -7,40 +7,40 @@ Core API
 * - Reducer
 - Description
- * - `Initialization and Finalization `__
+ * - :doc:`Initialization and Finalization `
 - Initialization and finalization of Kokkos.
- * - `View and related `__
+ * - :doc:`View and related `
 - Kokkos MultiDimensional View class and related free functions.
- * - `Parallel Execution/Dispatch `__
+ * - :doc:`Parallel Execution/Dispatch `
 - Parallel Execution Dispatch.
- * - `Built-in Reducers `__
+ * - :doc:`Built-in Reducers `
 - Built-in Reducers
- * - `Execution Policies `__
+ * - :doc:`Execution Policies `
 - Execution policies.
- * - `Spaces `__
+ * - :doc:`Spaces `
 - Description of Memory and Execution Spaces.
- * - `Task-Parallelism `__
+ * - :doc:`Task-Parallelism `
 - Creating and dispatching Task Graphs.
- * - `MultiGPU Support `__
+ * - :doc:`MultiGPU Support `
 - Launching kernels on multiple GPUs from one process.
- * - `Atomics `__
+ * - :doc:`Atomics `
 - Atomics
- * - `Numerics `__
+ * - :doc:`Numerics `
 - Common mathematical functions, mathematical constants, numeric traits, complex numbers, half-precision floating-point types.
- * - `C-style memory management `__
+ * - :doc:`C-style memory management `
 - C-style memory management
- * - `Traits `__
+ * - :doc:`Traits `
 - Traits
- * - `Kokkos Concepts `__
+ * - :doc:`Kokkos Concepts `
 - Kokkos Concepts
- * - `STL Compatibility Issues `__
+ * - :doc:`STL Compatibility Issues `
 - Ports of standard C++ capabilities, which otherwise do not work on various hardware platforms.
- * - `Utilities `__
+ * - :doc:`Utilities `
 - Utility functionality part of Kokkos Core.
- * - `Detection Idiom `__
+ * - :doc:`Detection Idiom `
 - Used to recognize, in an SFINAE-friendly way, the validity of any C++ expression.
- * - `Macros `__
+ * - :doc:`Macros `
 - Global macros defined by Kokkos, used for architectures, general settings, etc.
 .. toctree::
diff --git a/docs/source/API/core/Execution-Policies.rst b/docs/source/API/core/Execution-Policies.rst
index 37620170e..4803cc61d 100644
--- a/docs/source/API/core/Execution-Policies.rst
+++ b/docs/source/API/core/Execution-Policies.rst
@@ -4,7 +4,7 @@ Execution Policies
 Top Level Execution Policies
 ============================
-`ExecutionPolicyConcept `__ is the fundamental abstraction to represent “how” the execution of a Kokkos parallel pattern takes place.
+:doc:`ExecutionPolicyConcept ` is the fundamental abstraction to represent “how” the execution of a Kokkos parallel pattern takes place.
 ..
list-table::
 :widths: 35 65
@@ -14,19 +14,19 @@ Top Level Execution Policies
 * - Policy
 - Description
- * * `RangePolicy `__
+ * * :doc:`RangePolicy `
 * Each iterate is an integer in a contiguous range
- * * `MDRangePolicy `_
+ * * :doc:`MDRangePolicy `
 * Each iterate for each rank is an integer in a contiguous range
- * * `TeamPolicy `__
+ * * :doc:`TeamPolicy `
 * Assigns to each iterate in a contiguous range a team of threads
 Nested Execution Policies
 ============================
-Nested Execution Policies are used to dispatch parallel work inside of an already executing parallel region either dispatched with a `TeamPolicy `__ or a task policy. `NestedPolicies `__ summary.
+Nested Execution Policies are used to dispatch parallel work inside of an already executing parallel region either dispatched with a :doc:`TeamPolicy ` or a task policy. :doc:`NestedPolicies ` summary.
 .. list-table::
 :widths: 25 75
@@ -36,24 +36,26 @@ Nested Execution Policies are used to dispatch parallel work inside of an alread
 * - Policy
 - Description
- * * `TeamThreadMDRange `__
+ * * :doc:`TeamThreadMDRange `
 * Used inside of a TeamPolicy kernel to perform nested parallel loops over a multidimensional range split over threads of a team.
- * * `TeamThreadRange `__
+ * * :doc:`TeamThreadRange `
 * Used inside of a TeamPolicy kernel to perform nested parallel loops split over threads of a team.
- * * `TeamVectorMDRange `__
+ * * :doc:`TeamVectorMDRange `
 * Used inside of a TeamPolicy kernel to perform nested parallel loops over a multidimensional range split over threads of a team and their vector lanes.
- * * `TeamVectorRange `__
+ * * :doc:`TeamVectorRange `
 * Used inside of a TeamPolicy kernel to perform nested parallel loops split over threads of a team and their vector lanes.
- * * `ThreadVectorMDRange `__
+ * * :doc:`ThreadVectorMDRange `
 * Used inside of a TeamPolicy kernel to perform nested parallel loops over a multidimensional range with vector lanes of a thread.
- * * `ThreadVectorRange `__
+ * * :doc:`ThreadVectorRange `
 * Used inside of a TeamPolicy kernel to perform nested parallel loops with vector lanes of a thread.
+.. _kokkos-common-arguments-for-all-execution-policies:
+
 Common Arguments for all Execution Policies
 ===========================================
@@ -82,7 +84,7 @@ Execution Policies generally accept compile time arguments via template paramete
 * * IndexType
 * e.g. ``IndexType``
- * Specify integer type to be used for traversing the iteration space. Defaults to the ``size_type`` of `ExecutionSpaceConcept `__. Can affect the performance depending on the backend.
+ * Specify integer type to be used for traversing the iteration space. Defaults to the ``size_type`` of :doc:`ExecutionSpaceConcept `. Can affect the performance depending on the backend.
 * * LaunchBounds
 * ``LaunchBounds``
diff --git a/docs/source/API/core/KokkosConcepts.rst b/docs/source/API/core/KokkosConcepts.rst
index 511dea7ff..25aaab6db 100644
--- a/docs/source/API/core/KokkosConcepts.rst
+++ b/docs/source/API/core/KokkosConcepts.rst
@@ -34,35 +34,15 @@ we can maintain the flexibility we need while minimizing cognitive load on users
 Overview
 --------
-.. _ExecutionSpace: execution_spaces.html
-
-.. |ExecutionSpace| replace:: ``ExecutionSpace``
-
-.. _MemorySpace: memory_spaces.html
-
-.. |MemorySpace| replace:: ``MemorySpace``
-
-.. _ExecutionPolicy: Execution-Policies.html
-
-.. |ExecutionPolicy| replace:: ``ExecutionPolicy``
-
-.. _RangePolicy: policies/RangePolicy.html
-
-.. |RangePolicy| replace:: ``RangePolicy``
-
-.. _TeamMember: policies/TeamHandleConcept.html
-
-.. |TeamMember| replace:: ``TeamMember``
-
 When it comes to cognitive load, perhaps even more important than limiting the total number of concepts is limiting the number of *subsumption hierarchies* of concepts. Experience with C++ ranges has also shown that limiting the branching width of these hierarchies increases ease of learning.
 Roughly speaking and from a high-level perspective, the major user-visible concept hierarchies that Kokkos currently uses are:
-* |ExecutionSpace|_
-* |MemorySpace|_
-* |ExecutionPolicy|_ (includes, for instance, |RangePolicy|_)
-* |TeamMember|_
+* :doc:`ExecutionSpace `
+* :doc:`MemorySpace `
+* :doc:`ExecutionPolicy ` (includes, for instance, :doc:`RangePolicy `)
+* :doc:`TeamMember `
 * ``Functor``
 Some minor hierarchies include:
@@ -80,11 +60,7 @@ Some things currently being treated as concepts (according to ``Kokkos_Concepts.
 - The ``LaunchBounds<>`` tag
 - ``IterationPattern`` (a.k.a. ``Kokkos::Iterate``)
-.. _Kokkos_View: view/view.html
-
-.. |Kokkos_View| replace:: ``Kokkos::View``
-
-There is also some question as to whether |Kokkos_View|_ (and friends) should be presented
+There is also some question as to whether :doc:`Kokkos::View ` (and friends) should be presented
 as a concept rather than just a class template, given the existence of act-alike class templates such as ``DualView`` and ``OffsetView`` external to Kokkos.
@@ -96,7 +72,7 @@ The ``ExecutionSpace`` Concept
 .. |ExecutionSpaceTwo| replace:: ``ExecutionSpace``
 Working off the functionality currently common to ``Serial``, ``Cuda``, ``OpenMP``, ``Threads``, ``HIP``,
-and ``OpenMPTarget``, the current state of the Kokkos |ExecutionSpaceTwo|_ concept looks something like:
+and ``OpenMPTarget``, the current state of the Kokkos :doc:`ExecutionSpace ` concept looks something like:
 .. code-block:: cpp
@@ -133,10 +109,6 @@ currently implemented as static methods eventually need only be instance methods
 Implementation Requirements
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. _Kokkos_parallel_for: parallel-dispatch/parallel_for.html
-
-..
|Kokkos_parallel_for| replace:: ``Kokkos::parallel_for``
-
 Further requirements cannot be expressed without additional types constrained by additional concepts (this is a well-known limitation of the concepts mechanism in C++, and is necessary to preserve decidability of the type system). Though some argue for using an archetype pattern to get around this (whereby an archetype with an implementation-private
@@ -145,7 +117,7 @@ the state of practice appears to be converging on a strategy that involves creat
 concept templated on all relevant types and constraining them together, which can then be used at relevant call site. Most argue that this is a necessary artifact of the language feature, but that constraining concepts together in this way does not count as an "extra" concept for the purposes of cognitive load assessment.
-Applying this approach and assuming the intention is for things like |Kokkos_parallel_for|_ to remain
+Applying this approach and assuming the intention is for things like :doc:`Kokkos::parallel_for ` to remain
 as algorithms rather than customization points, we get some further requirements from the ``Kokkos::Impl`` namespace:
 .. code-block:: cpp
diff --git a/docs/source/API/core/Macros.rst b/docs/source/API/core/Macros.rst
index 6612a7940..9112e90b5 100644
--- a/docs/source/API/core/Macros.rst
+++ b/docs/source/API/core/Macros.rst
@@ -122,45 +122,25 @@ Execution Spaces
 The following macros can be used to test whether or not a specified execution space is enabled. They can be tested for existence (e.g. ``#ifdef KOKKOS_ENABLE_SERIAL``).
-.. _Serial: execution_spaces.html
-
-.. |Serial| replace:: :cpp:func:`Serial`
-
-.. _OpenMP: execution_spaces.html
-
-.. |OpenMP| replace:: :cpp:func:`OpenMP`
-
-.. _Threads: execution_spaces.html
-
-.. |Threads| replace:: :cpp:func:`Threads`
-
-.. _Cuda: execution_spaces.html
-
-.. |Cuda| replace:: :cpp:func:`Cuda`
-
-.. _HPX: execution_spaces.html
-
-..
|HPX| replace:: :cpp:func:`HPX`
-
-+--------------------------------+--------------------------------------------------------------------------+
-| Macro | Description |
-+================================+==========================================================================+
-| ``KOKKOS_ENABLE_SERIAL`` | Defined if the |Serial|_ execution space is enabled. |
-+--------------------------------+--------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_OPENMP`` | Defined if the |OpenMP|_ execution space is enabled. |
-+--------------------------------+--------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_OPENMPTARGET`` | Defined if the experimental ``OpenMPTarget`` execution space is enabled. |
-+--------------------------------+--------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_THREADS`` | Defined if the |Threads|_ execution space is enabled. |
-+--------------------------------+--------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_CUDA`` | Defined if the |Cuda|_ execution space is enabled. |
-+--------------------------------+--------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_HIP`` | Defined if the experimental ``HIP`` execution space is enabled. |
-+--------------------------------+--------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_HPX`` | Defined if the |HPX|_ execution space is enabled. |
-+--------------------------------+--------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_SYCL`` | Defined if the experimental ``SYCL`` execution space is enabled.
|
-+--------------------------------+--------------------------------------------------------------------------+
++--------------------------------+------------------------------------------------------------------------------+
+| Macro | Description |
++================================+==============================================================================+
+| ``KOKKOS_ENABLE_SERIAL`` | Defined if the :doc:`Serial ` execution space is enabled. |
++--------------------------------+------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_OPENMP`` | Defined if the :doc:`OpenMP ` execution space is enabled. |
++--------------------------------+------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_OPENMPTARGET`` | Defined if the experimental ``OpenMPTarget`` execution space is enabled. |
++--------------------------------+------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_THREADS`` | Defined if the :doc:`Threads ` execution space is enabled. |
++--------------------------------+------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_CUDA`` | Defined if the :doc:`Cuda ` execution space is enabled. |
++--------------------------------+------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_HIP`` | Defined if the experimental ``HIP`` execution space is enabled. |
++--------------------------------+------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_HPX`` | Defined if the :doc:`HPX ` execution space is enabled. |
++--------------------------------+------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_SYCL`` | Defined if the experimental ``SYCL`` execution space is enabled.
|
++--------------------------------+------------------------------------------------------------------------------+
 Backend options
 ---------------
@@ -206,18 +186,18 @@ Third-Party Library Settings
 These defines give information about what third-party libraries Kokkos was compiled with.
-+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
-| Macro | Description |
-+===============================+=======================================================================================================================+
-| ``KOKKOS_ENABLE_HWLOC`` | Defined if `libhwloc `_ is enabled for NUMA and architecture information. |
-+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_LIBDL`` | Defined if Kokkos links to the dynamic linker (libdl). |
-+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_LIBQUADMATH`` | Defined if Kokkos links to the `GCC Quad-Precision Math Library API `_. |
-+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
-| ``KOKKOS_ENABLE_ONEDPL`` | Defined if Kokkos links to the `oneDPL library `_ when using the SYCL backend. |
-| | Enabling this TPL might affect performance for Kokkos algorithms that use it, e.g., `sort`.
|
-+-------------------------------+-----------------------------------------------------------------------------------------------------------------------+
++-------------------------------+--------------------------------------------------------------------------------------------------------------------------+
+| Macro | Description |
++===============================+==========================================================================================================================+
+| ``KOKKOS_ENABLE_HWLOC`` | Defined if `libhwloc `_ is enabled for NUMA and architecture information. |
++-------------------------------+--------------------------------------------------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_LIBDL`` | Defined if Kokkos links to the dynamic linker (libdl). |
++-------------------------------+--------------------------------------------------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_LIBQUADMATH`` | Defined if Kokkos links to the `GCC Quad-Precision Math Library API `_. |
++-------------------------------+--------------------------------------------------------------------------------------------------------------------------+
+| ``KOKKOS_ENABLE_ONEDPL`` | Defined if Kokkos links to the `oneDPL library `_ when using the SYCL backend. |
+| | Enabling this TPL might affect performance for Kokkos algorithms that use it, e.g., `sort`. |
++-------------------------------+--------------------------------------------------------------------------------------------------------------------------+
 Architectures
 -------------
diff --git a/docs/source/API/core/ParallelDispatch.rst b/docs/source/API/core/ParallelDispatch.rst
index f0cdf6df2..ba6c5c016 100644
--- a/docs/source/API/core/ParallelDispatch.rst
+++ b/docs/source/API/core/ParallelDispatch.rst
@@ -12,13 +12,13 @@ Parallel execution patterns for composing algorithms.
 * - Function
 - Description
- * - `parallel_for `__
+ * - :doc:`parallel_for `
 - Executes user code in parallel
- * - `parallel_reduce `__
+ * - :doc:`parallel_reduce `
 - Executes user code to perform a reduction in parallel
- * - `parallel_scan `__
+ * - :doc:`parallel_scan `
 - Executes user code to generate a prefix sum in parallel
- * - `fence `__
+ * - :doc:`fence `
 - Fences execution spaces
 Tags for Team Policy Calculations
@@ -32,11 +32,11 @@ The following parallel pattern tags are used to call the correct overload for te
 * - Tag
 - Pattern
- * - `ParallelForTag `__
+ * - :doc:`ParallelForTag `
 - parallel_for
- * - `ParallelReduceTag `__
+ * - :doc:`ParallelReduceTag `
 - parallel_reduce
- * - `ParallelScanTag `__
+ * - :doc:`ParallelScanTag `
 - parallel_scan
 .. toctree::
diff --git a/docs/source/API/core/Profiling.rst b/docs/source/API/core/Profiling.rst
index 00e176dc3..b78fb368f 100644
--- a/docs/source/API/core/Profiling.rst
+++ b/docs/source/API/core/Profiling.rst
@@ -3,11 +3,11 @@ Profiling
 Kokkos::Profiling::ScopedRegion
 -------------------------------
-See `Profiling::ScopedRegion `_ for details.`
+See :doc:`Profiling::ScopedRegion ` for details.`
 Kokkos::Profiling::ProfilingSection
 -----------------------------------
-See `Profiling::ProfilingSection `_ for details.`
+See :doc:`Profiling::ProfilingSection ` for details.`
 .. toctree::
 :hidden:
diff --git a/docs/source/API/core/SpaceAccessibility.rst b/docs/source/API/core/SpaceAccessibility.rst
index 034c40b09..5a69a3e96 100644
--- a/docs/source/API/core/SpaceAccessibility.rst
+++ b/docs/source/API/core/SpaceAccessibility.rst
@@ -4,15 +4,7 @@ Space Accessibility
 .. role::cpp(code)
 :language: cpp
-.. _ExecutionSpace: execution_spaces.html#executionspaceconcept
-
-.. |ExecutionSpace| replace:: ``ExecutionSpace``
-
-.. _MemorySpace: memory_spaces.html#memoryspaceconcept
-
-..
|MemorySpace| replace:: ``MemorySpace``
-
-``Kokkos::SpaceAccessibility<>`` is a traits class template that takes an |ExecutionSpace|_ type or |MemorySpace|_ type as the first template argument and a |MemorySpace|_ type as the second type and expresses details about the relationship between those entities. Given memory space types ``MSp1`` and ``MSp2`` and an execution space type ``Ex``, the following expressions will be valid with the specified meaning:
+``Kokkos::SpaceAccessibility<>`` is a traits class template that takes an :doc:`ExecutionSpace ` type or :doc:`MemorySpace ` type as the first template argument and a :doc:`MemorySpace ` type as the second type and expresses details about the relationship between those entities. Given memory space types ``MSp1`` and ``MSp2`` and an execution space type ``Ex``, the following expressions will be valid with the specified meaning:
 ------------
@@ -36,11 +28,9 @@ Equivalent to ``Kokkos::SpaceAccessibility::accessi
 Kokkos::SpaceAccessibility::assignable
-.. _KokkosView: view/view.html
+.. |KokkosView| replace:: :cpp:class:`View`
-.. |KokkosView| replace:: ``Kokkos::View``
-
-A compile-time value convertible to ``bool`` guaranteed to be ``true`` if and only if it is valid within the Kokkos programming model to assign values from any (otherwise valid) instance of |KokkosView|_ type ``V2`` (with ``std::is_same::value`` equal to ``true``) to references retrieved from any (otherwise valid) instance of a ``View`` type ``V1`` (with ``std::is_same::value`` equal to ``true``).
+A compile-time value convertible to ``bool`` guaranteed to be ``true`` if and only if it is valid within the Kokkos programming model to assign values from any (otherwise valid) instance of |KokkosView| type ``V2`` (with ``std::is_same::value`` equal to ``true``) to references retrieved from any (otherwise valid) instance of a |KokkosView| type ``V1`` (with ``std::is_same::value`` equal to ``true``).
------------ @@ -56,11 +46,7 @@ Equivalent to ``Kokkos::SpaceAccessibility::assignable`` Kokkos::SpaceAccessibility::deepcopy -.. _KokkosDeepCopy: view/deep_copy.html - -.. |KokkosDeepCopy| replace:: ``Kokkos::deep_copy`` - -A compile-time value convertible to ``bool`` guaranteed to be ``true`` if and only if it is valid within the Kokkos programming model to |KokkosDeepCopy|_ from any (otherwise valid) instance of |KokkosView|_ type ``V2`` (with ``std::is_same::value`` equal to ``true``) to any (otherwise valid and otherwise compatible) instance of a ``View`` type ``V1`` (with ``std::is_same::value`` equal to ``true``). In other words, if ``v2`` is a valid instance of ``V2`` and ``v1`` is a valid instance of ``V1`` (with shape and other attributes otherwise compatible with ``v2``), the following expression will be well-defined and valid in the Kokkos programming model: +A compile-time value convertible to ``bool`` guaranteed to be ``true`` if and only if it is valid within the Kokkos programming model to :cpp:func:`Kokkos::deep_copy` from any (otherwise valid) instance of |KokkosView| type ``V2`` (with ``std::is_same::value`` equal to ``true``) to any (otherwise valid and otherwise compatible) instance of a ``View`` type ``V1`` (with ``std::is_same::value`` equal to ``true``). In other words, if ``v2`` is a valid instance of ``V2`` and ``v1`` is a valid instance of ``V1`` (with shape and other attributes otherwise compatible with ``v2``), the following expression will be well-defined and valid in the Kokkos programming model: .. code-block:: cpp diff --git a/docs/source/API/core/Task-Parallelism.rst b/docs/source/API/core/Task-Parallelism.rst index fdb0d93b4..0603e1d29 100644 --- a/docs/source/API/core/Task-Parallelism.rst +++ b/docs/source/API/core/Task-Parallelism.rst @@ -12,12 +12,12 @@ Kokkos has support for lightweight task-based programming, which is currently pr Will Kokkos Tasking work for my problem? 
---------------------------------------- -Not all task-based problems are a good fit for the current Kokkos approach to tasking. Currently, the tasking interface in Kokkos is targeted at problems with kernels far too small to overcome the inherent overhead of top-level Kokkos data parallel launches—that is, small but plentiful data parallel tasks with a non-trivial dependency structure. For tasks that fit this general scale model but have (very) trivial dependency structures, it may be easier to use `hierarchical parallelism <../../ProgrammingGuide/HierarchicalParallelism.html>`_, potentially with a ``Kokkos::Schedule`` scheduling policy (see, for instance, `this page `_) for load balancing if necessary. +Not all task-based problems are a good fit for the current Kokkos approach to tasking. Currently, the tasking interface in Kokkos is targeted at problems with kernels far too small to overcome the inherent overhead of top-level Kokkos data parallel launches—that is, small but plentiful data parallel tasks with a non-trivial dependency structure. For tasks that fit this general scale model but have (very) trivial dependency structures, it may be easier to use :doc:`hierarchical parallelism <../../ProgrammingGuide/HierarchicalParallelism>`, potentially with a ``Kokkos::Schedule`` scheduling policy (see, for instance, :doc:`this page `) for load balancing if necessary. Basic Usage ----------- -Fundamentally, task parallelism is just another form of parallelism in Kokkos. The same general idiom of pattern, policy, and functor applies as for ordinary `parallel dispatch <../../ProgrammingGuide/ParallelDispatch.html>`_: +Fundamentally, task parallelism is just another form of parallelism in Kokkos. The same general idiom of pattern, policy, and functor applies as for ordinary :doc:`parallel dispatch <../../ProgrammingGuide/ParallelDispatch>`: .. 
image:: ../../ProgrammingGuide/figures/parallel-dispatch.png @@ -39,7 +39,7 @@ In most ways, the functor portion of the task parallelism idiom in Kokkos is sim void operator()(TeamMember& member, double& result); }; -Similar to `team parallelism <../../ProgrammingGuide/HierarchicalParallelism.html>`_, the first parameter is the team handle, which has all of the same functionality as the one produced by a ``Kokkos::TeamPolicy``, with a few extras. Like with ``Kokkos::parallel_reduce()``, the output is expressed through the second parameter. Note that there is currently no lambda interface to Kokkos Tasking. +Similar to :doc:`team parallelism <../../ProgrammingGuide/HierarchicalParallelism>`, the first parameter is the team handle, which has all of the same functionality as the one produced by a ``Kokkos::TeamPolicy``, with a few extras. Like with ``Kokkos::parallel_reduce()``, the output is expressed through the second parameter. Note that there is currently no lambda interface to Kokkos Tasking. Task Patterns ------------- diff --git a/docs/source/API/core/Traits.rst b/docs/source/API/core/Traits.rst index 756eff95b..c6fd6e63a 100644 --- a/docs/source/API/core/Traits.rst +++ b/docs/source/API/core/Traits.rst @@ -9,12 +9,12 @@ Boolean type trait to detect types that model the Layout concept. is_execution_policy ------------------- -Boolean type trait to detect types that model `ExecutionPolicy concept `_. +Boolean type trait to detect types that model :doc:`ExecutionPolicy concept `. is_memory_space --------------- -Boolean type trait to detect types that model `MemorySpace concept `_. +Boolean type trait to detect types that model :ref:`MemorySpace concept `. is_memory_traits ---------------- @@ -24,7 +24,7 @@ Boolean type trait to detect specializations of ``Kokkos::MemoryTraits``. is_reducer ---------- -Boolean type trait to detect types that model the `Reducer concept `_. +Boolean type trait to detect types that model the :doc:`Reducer concept `. 
is_space -------- diff --git a/docs/source/API/core/View.rst b/docs/source/API/core/View.rst index 41a5ceb85..8da737ff7 100644 --- a/docs/source/API/core/View.rst +++ b/docs/source/API/core/View.rst @@ -10,29 +10,29 @@ The following facilities are available: * - Class - Description - * - `create_mirror[_view] `__ + * - :doc:`create_mirror[_view] ` - Creating a copy of a ``Kokkos::View`` in a different memory space. - * - `deep_copy() `__ + * - :doc:`deep_copy() ` - Copying data between views and scalars. - * - `LayoutLeft `__ + * - :doc:`LayoutLeft ` - Memory Layout matching Fortran. - * - `LayoutRight `__ + * - :doc:`LayoutRight ` - Memory Layout matching C. - * - `LayoutStride `__ + * - :doc:`LayoutStride ` - Memory Layout for arbitrary strides. - * - `MemoryTraits `__ + * - :doc:`MemoryTraits ` - Memory access traits. - * - `realloc `__ + * - :doc:`realloc ` - Reallocating a ``Kokkos::View``. - * - `resize `__ + * - :doc:`resize ` - Resizing a ``Kokkos::View``. - * - `subview `__ + * - :doc:`subview ` - Getting slices from a ``Kokkos::View``. - * - `View `__ + * - :doc:`View ` - The main Kokkos data structure, a multidimensional memory space and layout aware array. - * - `view_alloc() `__ + * - :doc:`view_alloc() ` - Create View allocation parameter bundle from argument list. - * - `View-like Types `__ + * - :doc:`View-like Types ` - Loosely defined as the set of class templates that behave like ``Kokkos::View``. .. 
toctree:: diff --git a/docs/source/API/core/atomics/atomic_compare_exchange.rst b/docs/source/API/core/atomics/atomic_compare_exchange.rst index 2d9499f85..7fbd9a1d6 100644 --- a/docs/source/API/core/atomics/atomic_compare_exchange.rst +++ b/docs/source/API/core/atomics/atomic_compare_exchange.rst @@ -34,4 +34,4 @@ Description See also -------- -* `atomic_exchange `_: atomically replaces the value of the referenced object and obtains the value held previously +* :doc:`atomic_exchange `: atomically replaces the value of the referenced object and obtains the value held previously diff --git a/docs/source/API/core/atomics/atomic_compare_exchange_strong.rst b/docs/source/API/core/atomics/atomic_compare_exchange_strong.rst index f5c6c3b2c..d37d39022 100644 --- a/docs/source/API/core/atomics/atomic_compare_exchange_strong.rst +++ b/docs/source/API/core/atomics/atomic_compare_exchange_strong.rst @@ -3,7 +3,7 @@ .. warning:: Deprecated since Kokkos 4.5, - use `atomic_compare_exchange `_ instead. + use :doc:`atomic_compare_exchange ` instead. .. 
role:: cpp(code) :language: cpp diff --git a/docs/source/API/core/atomics/atomic_exchange.rst b/docs/source/API/core/atomics/atomic_exchange.rst index 868854893..ae4476224 100644 --- a/docs/source/API/core/atomics/atomic_exchange.rst +++ b/docs/source/API/core/atomics/atomic_exchange.rst @@ -31,6 +31,6 @@ Description See also -------- -* `atomic_load `_: atomically obtains the value of the referenced object -* `atomic_store `_: atomically replaces the value of the referenced object with a non-atomic argument -* `atomic_compare_exchange `_: atomically compares the value of the referenced object with non-atomic argument and performs atomic exchange if equal or atomic load if not +* :doc:`atomic_load `: atomically obtains the value of the referenced object +* :doc:`atomic_store `: atomically replaces the value of the referenced object with a non-atomic argument +* :doc:`atomic_compare_exchange `: atomically compares the value of the referenced object with non-atomic argument and performs atomic exchange if equal or atomic load if not diff --git a/docs/source/API/core/atomics/atomic_load.rst b/docs/source/API/core/atomics/atomic_load.rst index 9712d80db..0f5131403 100644 --- a/docs/source/API/core/atomics/atomic_load.rst +++ b/docs/source/API/core/atomics/atomic_load.rst @@ -29,5 +29,5 @@ Description See also -------- -* `atomic_store `_: atomically replaces the value of the referenced object with a non-atomic argument -* `atomic_exchange `_: atomically replaces the value of the referenced object and obtains the value held previously +* :doc:`atomic_store `: atomically replaces the value of the referenced object with a non-atomic argument +* :doc:`atomic_exchange `: atomically replaces the value of the referenced object and obtains the value held previously diff --git a/docs/source/API/core/atomics/atomic_store.rst b/docs/source/API/core/atomics/atomic_store.rst index 16eec8630..ebc77da30 100644 --- a/docs/source/API/core/atomics/atomic_store.rst +++ 
b/docs/source/API/core/atomics/atomic_store.rst @@ -31,5 +31,5 @@ Description See also -------- -* `atomic_load `_: atomically obtains the value of the referenced object -* `atomic_exchange `_: atomically replaces the value of the referenced object and obtains the value held previously +* :doc:`atomic_load `: atomically obtains the value of the referenced object +* :doc:`atomic_exchange `: atomically replaces the value of the referenced object and obtains the value held previously diff --git a/docs/source/API/core/builtin_reducers.rst b/docs/source/API/core/builtin_reducers.rst index 602500882..9f029a192 100644 --- a/docs/source/API/core/builtin_reducers.rst +++ b/docs/source/API/core/builtin_reducers.rst @@ -1,9 +1,9 @@ Built-in Reducers ================= -`ReducerConcept `__ provides the concept for Reducers. +:doc:`ReducerConcept ` provides the concept for Reducers. -Reducer objects used in conjunction with `parallel_reduce `__ +Reducer objects used in conjunction with :doc:`parallel_reduce ` .. 
list-table:: :widths: 25 75 @@ -11,29 +11,29 @@ Reducer objects used in conjunction with `parallel_reduce `__ + * - :doc:`BAnd ` - Binary 'And' reduction - * - `BOr `__ + * - :doc:`BOr ` - Binary 'Or' reduction - * - `LAnd `__ + * - :doc:`LAnd ` - Logical 'And' reduction - * - `LOr `__ + * - :doc:`LOr ` - Logical 'Or' reduction - * - `Max `__ + * - :doc:`Max ` - Maximum reduction - * - `MaxLoc `__ + * - :doc:`MaxLoc ` - Reduction providing maximum and an associated index - * - `Min `__ + * - :doc:`Min ` - Minimum reduction - * - `MinLoc `__ + * - :doc:`MinLoc ` - Reduction providing minimum and an associated index - * - `MinMax `__ + * - :doc:`MinMax ` - Reduction providing both minimum and maximum - * - `MinMaxLoc `__ + * - :doc:`MinMaxLoc ` - Reduction providing both minimum and maximum and associated indices - * - `Prod `__ + * - :doc:`Prod ` - Multiplicative reduction - * - `Sum `__ + * - :doc:`Sum ` - Sum reduction @@ -41,7 +41,7 @@ Reducer objects used in conjunction with `parallel_reduce `__ are template classes for storage for reducers. +:doc:`Reduction Scalar Types ` are template classes for storage for reducers. .. toctree:: :hidden: diff --git a/docs/source/API/core/builtinreducers/BAnd.rst b/docs/source/API/core/builtinreducers/BAnd.rst index 981b4ea96..4a7528cde 100644 --- a/docs/source/API/core/builtinreducers/BAnd.rst +++ b/docs/source/API/core/builtinreducers/BAnd.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ performing bitwise ``AND`` operation +Specific implementation of :doc:`ReducerConcept ` performing bitwise ``AND`` operation Header File: ```` @@ -104,4 +104,4 @@ Additional Information * Requires: ``Scalar`` has ``operator =`` and ``operator &`` defined. ``Kokkos::reduction_identity::band()`` is a valid expression. -* In order to use BAnd with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. 
See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use BAnd with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details diff --git a/docs/source/API/core/builtinreducers/BOr.rst b/docs/source/API/core/builtinreducers/BOr.rst index 82ae4e2cd..b961227e9 100644 --- a/docs/source/API/core/builtinreducers/BOr.rst +++ b/docs/source/API/core/builtinreducers/BOr.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ performing bitwise ``OR`` operation +Specific implementation of :doc:`ReducerConcept ` performing bitwise ``OR`` operation Header File: ```` @@ -103,4 +103,4 @@ Additional Information * Requires: ``Scalar`` has ``operator =`` and ``operator |`` defined. ``Kokkos::reduction_identity::bor()`` is a valid expression. -* In order to use BOr with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use BOr with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details diff --git a/docs/source/API/core/builtinreducers/LAnd.rst b/docs/source/API/core/builtinreducers/LAnd.rst index 9d80d3920..a525072f7 100644 --- a/docs/source/API/core/builtinreducers/LAnd.rst +++ b/docs/source/API/core/builtinreducers/LAnd.rst @@ -4,7 +4,7 @@ .. 
role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ performing logical ``AND`` operation +Specific implementation of :doc:`ReducerConcept ` performing logical ``AND`` operation Header File: ```` @@ -104,4 +104,4 @@ Additional Information * Requires: ``Scalar`` has ``operator =`` and ``operator &&`` defined. ``Kokkos::reduction_identity::land()`` is a valid expression. -* In order to use LAnd with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use LAnd with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details diff --git a/docs/source/API/core/builtinreducers/LOr.rst b/docs/source/API/core/builtinreducers/LOr.rst index c5ca56255..e41920cd7 100644 --- a/docs/source/API/core/builtinreducers/LOr.rst +++ b/docs/source/API/core/builtinreducers/LOr.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ performing logical ``OR`` operation +Specific implementation of :doc:`ReducerConcept ` performing logical ``OR`` operation Header File: ```` @@ -103,4 +103,4 @@ Additional Information * Requires: ``Scalar`` has ``operator =`` and ``operator ||`` defined. ``Kokkos::reduction_identity::lor()`` is a valid expression. -* In order to use LOr with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details. 
+* In order to use LOr with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details. diff --git a/docs/source/API/core/builtinreducers/Max.rst b/docs/source/API/core/builtinreducers/Max.rst index 349579fd1..a8daf7a14 100644 --- a/docs/source/API/core/builtinreducers/Max.rst +++ b/docs/source/API/core/builtinreducers/Max.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ storing the maximum value +Specific implementation of :doc:`ReducerConcept ` storing the maximum value Header File: ```` @@ -103,4 +103,4 @@ Additional Information * Requires: ``Scalar`` has ``operator =`` and ``operator >`` defined. ``Kokkos::reduction_identity::max()`` is a valid expression. -* In order to use Max with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use Max with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details diff --git a/docs/source/API/core/builtinreducers/MaxLoc.rst b/docs/source/API/core/builtinreducers/MaxLoc.rst index b43d378d9..4698b5d20 100644 --- a/docs/source/API/core/builtinreducers/MaxLoc.rst +++ b/docs/source/API/core/builtinreducers/MaxLoc.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ storing the maximum value +Specific implementation of :doc:`ReducerConcept ` storing the maximum value Header File: ```` @@ -61,7 +61,7 @@ Interface .. 
cpp:type:: value_type - The reduction scalar type (specialization of `ValLocScalar `_) + The reduction scalar type (specialization of :doc:`ValLocScalar `) .. cpp:type:: result_view_type @@ -107,4 +107,4 @@ Additional Information * Requires: ``Index`` has ``operator =`` defined. ``Kokkos::reduction_identity::min()`` is a valid expression. -* In order to use MaxLoc with a custom type of either ``Scalar`` or ``Index``, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use MaxLoc with a custom type of either ``Scalar`` or ``Index``, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details diff --git a/docs/source/API/core/builtinreducers/Min.rst b/docs/source/API/core/builtinreducers/Min.rst index 9071ba382..16e70e257 100644 --- a/docs/source/API/core/builtinreducers/Min.rst +++ b/docs/source/API/core/builtinreducers/Min.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ storing the minimum value +Specific implementation of :doc:`ReducerConcept ` storing the minimum value Header File: ```` @@ -103,4 +103,4 @@ Additional Information * Requires: ``Scalar`` has ``operator =`` and ``operator <`` defined. ``Kokkos::reduction_identity::min()`` is a valid expression. -* In order to use Min with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use Min with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. 
See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details diff --git a/docs/source/API/core/builtinreducers/MinLoc.rst b/docs/source/API/core/builtinreducers/MinLoc.rst index 98d580832..a64849b2a 100644 --- a/docs/source/API/core/builtinreducers/MinLoc.rst +++ b/docs/source/API/core/builtinreducers/MinLoc.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ storing the minimum value with an index +Specific implementation of :doc:`ReducerConcept ` storing the minimum value with an index Header File: ```` @@ -61,7 +61,7 @@ Interface .. cpp:type:: value_type - The reduction scalar type (specialization of `ValLocScalar `_) + The reduction scalar type (specialization of :doc:`ValLocScalar `) .. cpp:type:: result_view_type @@ -107,7 +107,7 @@ Additional Information * Requires: ``Index`` has ``operator =`` defined. ``Kokkos::reduction_identity::min()`` is a valid expression. -* In order to use MinLoc with a custom type of either ``Scalar`` or ``Index``, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use MinLoc with a custom type of either ``Scalar`` or ``Index``, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details Example ------- diff --git a/docs/source/API/core/builtinreducers/MinMax.rst b/docs/source/API/core/builtinreducers/MinMax.rst index ae034fcf3..e78b9f4fe 100644 --- a/docs/source/API/core/builtinreducers/MinMax.rst +++ b/docs/source/API/core/builtinreducers/MinMax.rst @@ -4,7 +4,7 @@ .. 
role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ storing both the minimum and maximum values +Specific implementation of :doc:`ReducerConcept ` storing both the minimum and maximum values Header File: ```` @@ -60,7 +60,7 @@ Interface .. cpp:type:: value_type - The reduction scalar type (specialization of `MinMaxScalar `_) + The reduction scalar type (specialization of :doc:`MinMaxScalar `) .. cpp:type:: result_view_type @@ -105,4 +105,4 @@ Additional Information * Requires: ``Scalar`` has ``operator =``, ``operator <`` and ``operator >`` defined. ``Kokkos::reduction_identity::min()`` and ``Kokkos::reduction_identity::max()`` are a valid expressions. -* In order to use MinMax with a custom type of ``Scalar``, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use MinMax with a custom type of ``Scalar``, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details diff --git a/docs/source/API/core/builtinreducers/MinMaxLoc.rst b/docs/source/API/core/builtinreducers/MinMaxLoc.rst index f152bc9c2..e80d50558 100644 --- a/docs/source/API/core/builtinreducers/MinMaxLoc.rst +++ b/docs/source/API/core/builtinreducers/MinMaxLoc.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ storing both the minimum and maximum values with corresponding indices +Specific implementation of :doc:`ReducerConcept ` storing both the minimum and maximum values with corresponding indices Header File: ```` @@ -61,7 +61,7 @@ Interface .. 
cpp:type:: value_type - The reduction scalar type (specialization of `MinMaxLocScalar `_) + The reduction scalar type (specialization of :doc:`MinMaxLocScalar `) .. cpp:type:: result_view_type @@ -113,4 +113,4 @@ Additional Information * Requires: ``Index`` has ``operator =`` defined. ``Kokkos::reduction_identity::min()`` is a valid expressions. -* In order to use MinMaxLoc with a custom type of either ``Scalar`` or ``Index``, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details. +* In order to use MinMaxLoc with a custom type of either ``Scalar`` or ``Index``, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details. diff --git a/docs/source/API/core/builtinreducers/Prod.rst b/docs/source/API/core/builtinreducers/Prod.rst index 6c229f4aa..33694a68a 100644 --- a/docs/source/API/core/builtinreducers/Prod.rst +++ b/docs/source/API/core/builtinreducers/Prod.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ performing a ``multiply`` operation +Specific implementation of :doc:`ReducerConcept ` performing a ``multiply`` operation Header File: ```` @@ -103,4 +103,4 @@ Additional Information * Requires: ``Scalar`` has ``operator =`` and ``operator *=`` defined. ``Kokkos::reduction_identity::prod()`` is a valid expression. -* In order to use Prod with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. 
See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use Prod with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details diff --git a/docs/source/API/core/builtinreducers/ReducerConcept.rst b/docs/source/API/core/builtinreducers/ReducerConcept.rst index 018625dd9..8c0a43647 100644 --- a/docs/source/API/core/builtinreducers/ReducerConcept.rst +++ b/docs/source/API/core/builtinreducers/ReducerConcept.rst @@ -4,7 +4,7 @@ .. role:: cpp(code) :language: cpp -The concept of a Reducer is the abstraction that defines the "how" a "Reduction" is performed during the parallel reduce execution pattern. The abstraction of "what" is given as a template parameter and corresponds to the "what" that is being reduced in the `parallel_reduce <../parallel-dispatch/parallel_reduce.html>`_ operation. This page describes the definitions and functions expected from a Reducer with a hypothetical 'Reducer' class definition. A brief description of built-in reducers is also included. +The concept of a Reducer is the abstraction that defines the "how" a "Reduction" is performed during the parallel reduce execution pattern. The abstraction of "what" is given as a template parameter and corresponds to the "what" that is being reduced in the :doc:`parallel_reduce <../parallel-dispatch/parallel_reduce>` operation. This page describes the definitions and functions expected from a Reducer with a hypothetical 'Reducer' class definition. A brief description of built-in reducers is also included. 
Header File: ```` @@ -119,20 +119,20 @@ is commutative and associative with identity element that can be set by calling Built-In Reducers ~~~~~~~~~~~~~~~~~ -Kokkos provides a number of built-in reducers that automatically work with the intrinsic C++ types as well as ``Kokkos::complex``. In order to use a Built-in reducer with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. A simple example is shown below and more information can be found under `Custom Reductions <../../../ProgrammingGuide/Custom-Reductions.html>`_. - -* `Kokkos::BAnd `_ -* `Kokkos::BOr `_ -* `Kokkos::LAnd `_ -* `Kokkos::LOr `_ -* `Kokkos::Max `_ -* `Kokkos::MaxLoc `_ -* `Kokkos::Min `_ -* `Kokkos::MinLoc `_ -* `Kokkos::MinMax `_ -* `Kokkos::MinMaxLoc `_ -* `Kokkos::Prod `_ -* `Kokkos::Sum `_ +Kokkos provides a number of built-in reducers that automatically work with the intrinsic C++ types as well as ``Kokkos::complex``. In order to use a Built-in reducer with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. A simple example is shown below and more information can be found under :doc:`Custom Reductions <../../../ProgrammingGuide/Custom-Reductions>`. + +* :doc:`Kokkos::BAnd ` +* :doc:`Kokkos::BOr ` +* :doc:`Kokkos::LAnd ` +* :doc:`Kokkos::LOr ` +* :doc:`Kokkos::Max ` +* :doc:`Kokkos::MaxLoc ` +* :doc:`Kokkos::Min ` +* :doc:`Kokkos::MinLoc ` +* :doc:`Kokkos::MinMax ` +* :doc:`Kokkos::MinMaxLoc ` +* :doc:`Kokkos::Prod ` +* :doc:`Kokkos::Sum ` Examples -------- diff --git a/docs/source/API/core/builtinreducers/Sum.rst b/docs/source/API/core/builtinreducers/Sum.rst index 9a7415041..9a396d511 100644 --- a/docs/source/API/core/builtinreducers/Sum.rst +++ b/docs/source/API/core/builtinreducers/Sum.rst @@ -4,7 +4,7 @@ .. 
role:: cpp(code) :language: cpp -Specific implementation of `ReducerConcept `_ performing an ``add`` operation +Specific implementation of :doc:`ReducerConcept ` performing an ``add`` operation Header File: ```` @@ -103,4 +103,4 @@ Additional Information * Requires: ``Scalar`` has ``operator =`` and ``operator +=`` defined. ``Kokkos::reduction_identity::sum()`` is a valid expression. -* In order to use Sum with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See `Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types.html>`_ for details +* In order to use Sum with a custom type, a template specialization of ``Kokkos::reduction_identity`` must be defined. See :doc:`Built-In Reducers with Custom Scalar Types <../../../ProgrammingGuide/Custom-Reductions-Built-In-Reducers-with-Custom-Scalar-Types>` for details diff --git a/docs/source/API/core/c_style_memory_management/free.rst b/docs/source/API/core/c_style_memory_management/free.rst index 337937a2a..42706690b 100644 --- a/docs/source/API/core/c_style_memory_management/free.rst +++ b/docs/source/API/core/c_style_memory_management/free.rst @@ -6,15 +6,7 @@ Defined in header ```` -.. _Kokkos_kokkos_malloc: ./malloc.html - -.. |Kokkos_kokkos_malloc| replace:: ``Kokkos::kokkos_malloc()`` - -.. _Kokkos_kokkos_realloc: ./realloc.html - -.. |Kokkos_kokkos_realloc| replace:: ``Kokkos::kokkos_realloc()`` - -Deallocates the space previously allocated by |Kokkos_kokkos_malloc|_ or |Kokkos_kokkos_realloc|_ on the specified memory space ``MemorySpace``. +Deallocates the space previously allocated by :cpp:func:`kokkos_malloc` or :cpp:func:`kokkos_realloc` on the specified memory space ``MemorySpace``. If ``ptr`` is a null pointer, the function does nothing. 
diff --git a/docs/source/API/core/c_style_memory_management/malloc.rst b/docs/source/API/core/c_style_memory_management/malloc.rst index 69bced00d..18c2a3cb2 100644 --- a/docs/source/API/core/c_style_memory_management/malloc.rst +++ b/docs/source/API/core/c_style_memory_management/malloc.rst @@ -6,19 +6,7 @@ Defined in header ```` -.. _MemorySpace: ../memory_spaces.html - -.. |MemorySpace| replace:: ``MemorySpace`` - -.. _Kokkos_kokkos_free: free.html - -.. |Kokkos_kokkos_free| replace:: ``Kokkos::kokkos_free()`` - -.. _Kokkos_realloc: realloc.html - -.. |Kokkos_realloc| replace:: ``Kokkos::kokkos_realloc()`` - -Allocate ``size`` bytes of uninitialized storage on the specified memory space |MemorySpace|_ plus some extra space for metadata such as the label. +Allocate ``size`` bytes of uninitialized storage on the specified memory space :doc:`MemorySpace <../memory_spaces>` plus some extra space for metadata such as the label. If allocation succeeds, returns a pointer to the lowest (first) byte in the allocated memory block that is suitably aligned for any scalar type. @@ -41,6 +29,6 @@ Description :param size: The number of bytes to allocate. - :returns: On success, returns the pointer to the beginning of newly allocated memory. To avoid a memory leak, the returned pointer must be deallocated with |Kokkos_kokkos_free|_ or |Kokkos_realloc|_. + :returns: On success, returns the pointer to the beginning of newly allocated memory. To avoid a memory leak, the returned pointer must be deallocated with :cpp:func:`kokkos_free` or :cpp:func:`kokkos_realloc`. :throws: On failure, throws ``Kokkos::Experimental::RawMemoryAllocationFailure``. diff --git a/docs/source/API/core/c_style_memory_management/realloc.rst b/docs/source/API/core/c_style_memory_management/realloc.rst index c79077bcc..979a6716e 100644 --- a/docs/source/API/core/c_style_memory_management/realloc.rst +++ b/docs/source/API/core/c_style_memory_management/realloc.rst @@ -6,24 +6,8 @@ Defined in header ```` -.. 
_Kokkos_kokkos_malloc: malloc.html - -.. |Kokkos_kokkos_malloc| replace:: ``Kokkos::kokkos_malloc()`` - -.. _Kokkos_kokkos_realloc: realloc.html - -.. |Kokkos_kokkos_realloc| replace:: ``Kokkos::kokkos_realloc()`` - -.. _MemorySpace: ../memory_spaces.html - -.. |MemorySpace| replace:: ``MemorySpace`` - -.. _Kokkos_kokkos_free: free.html - -.. |Kokkos_kokkos_free| replace:: ``Kokkos::kokkos_free()`` - -Reallocates the given area of memory. It must be previously allocated by |Kokkos_kokkos_malloc|_ or |Kokkos_kokkos_realloc|_ -on the same memory space |MemorySpace|_ and not yet freed with |Kokkos_kokkos_free|_, otherwise, the results are undefined. +Reallocates the given area of memory. It must be previously allocated by :cpp:func:`kokkos_malloc` or :cpp:func:`kokkos_realloc` +on the same memory space :doc:`MemorySpace <../memory_spaces>` and not yet freed with :cpp:func:`kokkos_free`, otherwise, the results are undefined. .. warning:: @@ -41,6 +25,6 @@ Description :param new_size: The new size in bytes. - :returns: On success, returns a pointer to the beginning of the newly allocated memory. To avoid a memory leak, the returned pointer must be deallocated with |Kokkos_kokkos_free|_, the original pointer ``ptr`` is invalidated and any access to it is undefined behavior (even if reallocation was in-place). On failure, returns a null pointer. The original pointer ptr remains valid and may need to be deallocated with |Kokkos_kokkos_free|_. + :returns: On success, returns a pointer to the beginning of the newly allocated memory. To avoid a memory leak, the returned pointer must be deallocated with :cpp:func:`kokkos_free`, the original pointer ``ptr`` is invalidated and any access to it is undefined behavior (even if reallocation was in-place). On failure, returns a null pointer. The original pointer ptr remains valid and may need to be deallocated with :cpp:func:`kokkos_free`. :throws: On failure, throws ``Kokkos::Experimental::RawMemoryAllocationFailure``. 
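The ownership rule described above for ``kokkos_realloc`` (on success the old pointer is invalidated; on a null result the old pointer stays valid and still has to be freed) matches the contract of the C library's ``realloc``. The following is a minimal sketch of the usual safe-growth idiom. It assumes ``std::malloc``, ``std::realloc`` and ``std::free`` as stand-ins so that it compiles without Kokkos; the helper names ``grow_buffer`` and ``realloc_roundtrip_demo`` are hypothetical and not part of Kokkos. In Kokkos code the corresponding calls would presumably be ``Kokkos::kokkos_malloc``, ``Kokkos::kokkos_realloc`` and ``Kokkos::kokkos_free``.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not a Kokkos API): grows *buf to new_size bytes.
// Returns true on success. On failure *buf is left untouched and is still
// owned by the caller, mirroring the contract described above.
bool grow_buffer(char** buf, std::size_t new_size) {
  // Do not write `*buf = realloc(*buf, ...)` directly: a null result would
  // overwrite the only copy of the still-valid original pointer and leak it.
  void* tmp = std::realloc(*buf, new_size);  // stand-in for kokkos_realloc
  if (tmp == nullptr) return false;
  *buf = static_cast<char*>(tmp);  // the previous value of *buf is now invalid
  return true;
}

// Allocates a small buffer, grows it, and checks that the old contents
// survived the reallocation. Returns true if every step succeeded.
bool realloc_roundtrip_demo() {
  char* data = static_cast<char*>(std::malloc(4));  // stand-in for kokkos_malloc
  if (data == nullptr) return false;
  std::memcpy(data, "abc", 4);  // three characters plus the terminating NUL

  if (!grow_buffer(&data, 64)) {
    std::free(data);  // the original allocation is still valid; release it
    return false;
  }

  // Bytes up to the old size are preserved by a successful reallocation.
  const bool preserved = std::strcmp(data, "abc") == 0;
  std::free(data);  // stand-in for kokkos_free
  return preserved;
}
```

Keeping the result in a temporary until it has been checked is the design choice that matters here: it is what allows the failure branch to release the original allocation instead of losing it.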
diff --git a/docs/source/API/core/execution_spaces.rst b/docs/source/API/core/execution_spaces.rst index 41834c00e..75cb84961 100644 --- a/docs/source/API/core/execution_spaces.rst +++ b/docs/source/API/core/execution_spaces.rst @@ -14,29 +14,17 @@ Execution Spaces .. |DocExecutionSpaceConcept| replace:: the documentation on the :cpp:func:`ExecutionSpace` concept -.. _Experimental: utilities/experimental.html#experimentalnamespace - -.. |Experimental| replace:: Experimental - -.. _KokkosConcepts: KokkosConcepts.html - -.. |KokkosConcepts| replace:: this document +.. |Experimental| replace:: :doc:`Experimental ` .. _ExecutionSpaceS: #kokkos-executionspaceconcept .. |ExecutionSpaceS| replace:: :cpp:func:`ExecutionSpace` s -.. _MemorySpace: memory_spaces.html#kokkos-memoryspaceconcept - -.. |MemorySpace| replace:: :cpp:func:`MemorySpace` +.. |MemorySpace| replace:: :ref:`MemorySpace ` -.. _KokkosSpaceAccessibility: SpaceAccessibility.html +.. |KokkosSpaceAccessibility| replace:: :doc:`Kokkos::SpaceAccessibility ` -.. |KokkosSpaceAccessibility| replace:: :cpp:func:`Kokkos::SpaceAccessibility` - -.. _KokkosTeamPolicy: policies/TeamPolicy.html - -.. |KokkosTeamPolicy| replace:: :cpp:func:`Kokkos::TeamPolicy` +.. |KokkosTeamPolicy| replace:: :cpp:class:`TeamPolicy` .. _ExecutionSpaceConcept: #kokkos-executionspaceconcept @@ -48,18 +36,17 @@ Execution Spaces ``Kokkos::Cuda`` is an |ExecutionSpaceConceptType|_ representing execution on a Cuda device. Except in rare instances, it should not be used directly, but instead should be used generically as an execution space. For details, see |DocExecutionSpaceConcept|_. - ``Kokkos::HIP`` --------------- -``Kokkos::HIP`` :sup:`promoted from` |Experimental|_ :sup:`since 4.0` is an |ExecutionSpaceConceptType|_ representing +``Kokkos::HIP`` :sup:`promoted from` |Experimental| :sup:`since 4.0` is an |ExecutionSpaceConceptType|_ representing execution on a device supported by HIP. 
Except in rare instances, it should not be used directly, but instead should be used generically as an execution space. For details, see |DocExecutionSpaceConcept|_. ``Kokkos::SYCL`` ------------------------------ -``Kokkos::SYCL`` :sup:`promoted from` |Experimental|_ :sup:`since 4.5` is an |ExecutionSpaceConceptType|_ representing execution on a device supported by SYCL. +``Kokkos::SYCL`` :sup:`promoted from` |Experimental| :sup:`since 4.5` is an |ExecutionSpaceConceptType|_ representing execution on a device supported by SYCL. If the SYCL backend is enabled and no GPU architecture is specified, Kokkos will use Just-In-Time compilation without any restriction to a particular SYCL device type. Thus, this is the only option to target a CPU with the SYCL backend (which is experimental, untested, and not optimized for). @@ -99,13 +86,15 @@ generically as an execution space. For details, see |DocExecutionSpaceConcept|_. Except in rare instances, it should not be used directly, but instead should be used generically as an execution space. For details, see |DocExecutionSpaceConcept|_. +.. _kokkos-executionspaceconcept: + ``Kokkos::ExecutionSpaceConcept`` --------------------------------- The concept of an ``ExecutionSpace`` is the fundamental abstraction to represent the "where" and the "how" that execution takes place in Kokkos. Most code that uses Kokkos should be written to the *generic concept* of an ``ExecutionSpace`` rather than any specific instance. This page talks practically about how to *use* -the common features of execution spaces in Kokkos; for a more formal and theoretical treatment, see |KokkosConcepts|_. +the common features of execution spaces in Kokkos; for a more formal and theoretical treatment, see :doc:`this document `. *Disclaimer*: There is nothing new about the term "concept" in C++; anyone who has ever used templates in C++ has used concepts whether they knew it or not. 
Please do not be confused by the word "concept" itself, @@ -234,11 +223,11 @@ where ``ostr`` is a ``std::ostream`` (like ``std::cout``, for instance) and ``de Additionally, the following type aliases (a.k.a. ``typedef`` s) will be defined by all execution space types: -* ``Ex::memory_space``: the default |MemorySpace|_ to use when executing with ``Ex``. Kokkos guarantees that ``Kokkos::SpaceAccessibility::accessible`` will be ``true`` (see |KokkosSpaceAccessibility|_) +* ``Ex::memory_space``: the default |MemorySpace| to use when executing with ``Ex``. Kokkos guarantees that ``Kokkos::SpaceAccessibility::accessible`` will be ``true`` (see |KokkosSpaceAccessibility|) * ``Ex::array_layout``: the default ``ArrayLayout`` recommended for use with ``View`` types accessed from ``Ex``. -* ``Ex::scratch_memory_space``: the ``ScratchMemorySpace`` that parallel patterns will use for allocation of scratch memory (for instance, as requested by a |KokkosTeamPolicy|_). Only unmanaged Views can be created using this memory space. +* ``Ex::scratch_memory_space``: the ``ScratchMemorySpace`` that parallel patterns will use for allocation of scratch memory (for instance, as requested by a |KokkosTeamPolicy|). Only unmanaged Views can be created using this memory space. Default Constructibility, Copy Constructibility ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -301,13 +290,13 @@ Typedefs * ``execution_space``: The self type; -* ``memory_space``: The default |MemorySpace|_ to use when executing with |ExecutionSpaceConcept|_. Kokkos guarantees that ``Kokkos::SpaceAccessibility::accessible`` will be ``true`` (see |KokkosSpaceAccessibility|_) +* ``memory_space``: The default |MemorySpace| to use when executing with |ExecutionSpaceConcept|_. Kokkos guarantees that ``Kokkos::SpaceAccessibility::accessible`` will be ``true`` (see |KokkosSpaceAccessibility|) * ``device_type``: ``DeviceType``. 
* ``array_layout``: The default ``ArrayLayout`` recommended for use with ``View`` types accessed from |ExecutionSpaceConcept|_. -* ``scratch_memory_space``: The ``ScratchMemorySpace`` that parallel patterns will use for allocation of scratch memory (for instance, as requested by a |KokkosTeamPolicy|_). Only unmanaged Views can be created using this memory space. +* ``scratch_memory_space``: The ``ScratchMemorySpace`` that parallel patterns will use for allocation of scratch memory (for instance, as requested by a |KokkosTeamPolicy|). Only unmanaged Views can be created using this memory space. * ``size_type``: The default integer type associated with this space. Signed or unsigned, 32 or 64 bit integer type, used as preferred type for indexing. @@ -334,6 +323,6 @@ Non Member Facilities * ``template struct is_execution_space;``: typetrait to check whether a class is a execution space. -* ``template struct SpaceAccessibility;``: typetraits to check whether two spaces are compatible (assignable, deep_copy-able, accessible). (see |KokkosSpaceAccessibility|_) +* ``template struct SpaceAccessibility;``: typetraits to check whether two spaces are compatible (assignable, deep_copy-able, accessible). (see |KokkosSpaceAccessibility|) * ``bool operator==(const execution_space& lhs, const execution_space& rhs)``: tests whether the two space instances (of the same type) are identical. diff --git a/docs/source/API/core/memory_spaces.rst b/docs/source/API/core/memory_spaces.rst index 4fcbaee63..7dea04311 100644 --- a/docs/source/API/core/memory_spaces.rst +++ b/docs/source/API/core/memory_spaces.rst @@ -14,17 +14,11 @@ Memory Spaces .. |TheDocumentationOnTheMemorySpaceConcept| replace:: the documentation on the :cpp:func:`MemorySpace` concept -.. _Experimental: utilities/experimental.html#experimentalnamespace +.. |Experimental| replace:: :doc:`Experimental ` -.. |Experimental| replace:: Experimental +.. |ExecutionSpaceType| replace:: :ref:`ExecutionSpace ` -.. 
_ExecutionSpaceType: ./execution_spaces.html#kokkos-executionspaceconcept - -.. |ExecutionSpaceType| replace:: :cpp:func:`ExecutionSpace` type - -.. _ExecutionSpaceTypes: ./execution_spaces.html#kokkos-executionspaceconcept - -.. |ExecutionSpaceTypes| replace:: :cpp:func:`ExecutionSpace` types +.. |ExecutionSpaceTypes| replace:: :ref:`ExecutionSpace ` ``Kokkos::CudaSpace`` --------------------- @@ -44,42 +38,44 @@ Memory Spaces ``Kokkos::HIPSpace`` -------------------- -``Kokkos::HIPSpace`` :sup:`promoted from` |Experimental|_ :sup:`since 4.0` is a |MemorySpaceType|_ representing device memory on a GPU in the HIP GPU programming environment. Except in rare instances, it should not be used directly, but instead should be used generically as a memory space. For details, see |TheDocumentationOnTheMemorySpaceConcept|_. +``Kokkos::HIPSpace`` :sup:`promoted from` |Experimental| :sup:`since 4.0` is a |MemorySpaceType|_ representing device memory on a GPU in the HIP GPU programming environment. Except in rare instances, it should not be used directly, but instead should be used generically as a memory space. For details, see |TheDocumentationOnTheMemorySpaceConcept|_. ``Kokkos::HIPHostPinnedSpace`` ------------------------------ -``Kokkos::HIPHostPinnedSpace`` :sup:`promoted from` |Experimental|_ :sup:`since 4.0` is a |MemorySpaceType|_ representing host-side pinned memory accessible from a GPU in the HIP GPU programming environment. This memory is accessible by both host and device execution spaces. Except in rare instances, it should not be used directly, but instead should be used generically as a memory space. For details, see |TheDocumentationOnTheMemorySpaceConcept|_. +``Kokkos::HIPHostPinnedSpace`` :sup:`promoted from` |Experimental| :sup:`since 4.0` is a |MemorySpaceType|_ representing host-side pinned memory accessible from a GPU in the HIP GPU programming environment. This memory is accessible by both host and device execution spaces. 
Except in rare instances, it should not be used directly, but instead should be used generically as a memory space. For details, see |TheDocumentationOnTheMemorySpaceConcept|_. ``Kokkos::HIPManagedSpace`` --------------------------- -``Kokkos::HIPManagedSpace`` :sup:`promoted from` |Experimental|_ :sup:`since 4.0` is a |MemorySpaceType|_ representing page-migrating memory on a GPU in the HIP GPU programming environment. Page-migrating memory is accessible from most host execution spaces. Even though available with all combinations of operating system and HIP-supported hardware, it requires both operating system and hardware to support and enable the ``xnack`` feature. Except in rare instances, it should not be used directly, but instead should be used generically as a memory space. For details, see |TheDocumentationOnTheMemorySpaceConcept|_. +``Kokkos::HIPManagedSpace`` :sup:`promoted from` |Experimental| :sup:`since 4.0` is a |MemorySpaceType|_ representing page-migrating memory on a GPU in the HIP GPU programming environment. Page-migrating memory is accessible from most host execution spaces. Even though available with all combinations of operating system and HIP-supported hardware, it requires both operating system and hardware to support and enable the ``xnack`` feature. Except in rare instances, it should not be used directly, but instead should be used generically as a memory space. For details, see |TheDocumentationOnTheMemorySpaceConcept|_. ``Kokkos::SYCLDeviceUSMSpace`` -------------------------------------------- -``Kokkos::SYCLDeviceUSMSpace`` :sup:`promoted from` |Experimental|_ :sup:`since 4.5` is a |MemorySpaceType|_ representing device memory on a GPU in the SYCL GPU programming environment. This memory is only accessible from the SYCL execution space. +``Kokkos::SYCLDeviceUSMSpace`` :sup:`promoted from` |Experimental| :sup:`since 4.5` is a |MemorySpaceType|_ representing device memory on a GPU in the SYCL GPU programming environment. 
This memory is only accessible from the SYCL execution space. ``Kokkos::SYCLHostUSMSpace`` ------------------------------------------ -``Kokkos::SYCLHostUSMSpace`` :sup:`promoted from` |Experimental|_ :sup:`since 4.5` is a |MemorySpaceType|_ representing host-side pinned memory accessible from a GPU in the SYCL GPU programming environment. This memory is accessible from both host and SYCL execution spaces. +``Kokkos::SYCLHostUSMSpace`` :sup:`promoted from` |Experimental| :sup:`since 4.5` is a |MemorySpaceType|_ representing host-side pinned memory accessible from a GPU in the SYCL GPU programming environment. This memory is accessible from both host and SYCL execution spaces. ``Kokkos::SYCLSharedUSMSpace`` -------------------------------------------- -``Kokkos::SYCLSharedUSMSpace`` :sup:`promoted from` |Experimental|_ :sup:`since 4.5` is a |MemorySpaceType|_ representing page-migrating memory on a GPU in the SYCL GPU programming environment. This memory is accessible from both host and SYCL execution spaces. +``Kokkos::SYCLSharedUSMSpace`` :sup:`promoted from` |Experimental| :sup:`since 4.5` is a |MemorySpaceType|_ representing page-migrating memory on a GPU in the SYCL GPU programming environment. This memory is accessible from both host and SYCL execution spaces. ``Kokkos::HostSpace`` --------------------- ``Kokkos::HostSpace`` is a |MemorySpaceType|_ representing traditional random access memory accessible from the CPU. Except in rare instances, it should not be used directly, but instead should be used generically as a memory space. For details, see |TheDocumentationOnTheMemorySpaceConcept|_. +.. _kokkos-shared-memory-spaces: + ``Kokkos::SharedSpace`` ----------------------- -``Kokkos::SharedSpace`` :sup:`since 4.0` is a |MemorySpaceType|_ alias representing memory that can be accessed by any enabled |ExecutionSpaceType|_. To achieve this, the memory can be moved to and from the local memory of the processing units represented by the ``ExecutionSpaces``. 
The movement is done automatically by the OS and driver at the moment of access. If not currently located in the local memory of the accessing processing unit, the memory is moved in chunks (size is backend dependent). These chunks can be moved independently (e.g. only the part that is accessed on the GPU is moved to the GPU) and are treated like local memory while residing on the processing unit. For details, see |TheDocumentationOnTheMemorySpaceConcept|_. +``Kokkos::SharedSpace`` :sup:`since 4.0` is a |MemorySpaceType|_ alias representing memory that can be accessed by any enabled |ExecutionSpaceType|. To achieve this, the memory can be moved to and from the local memory of the processing units represented by the ``ExecutionSpaces``. The movement is done automatically by the OS and driver at the moment of access. If not currently located in the local memory of the accessing processing unit, the memory is moved in chunks (size is backend dependent). These chunks can be moved independently (e.g. only the part that is accessed on the GPU is moved to the GPU) and are treated like local memory while residing on the processing unit. For details, see |TheDocumentationOnTheMemorySpaceConcept|_. Availability can be checked with the preprocessor define ``KOKKOS_HAS_SHARED_SPACE`` or the ``constexpr bool Kokkos::has_shared_space``. For the following backends ``Kokkos::SharedSpace`` is pointing to the corresponding |MemorySpaceType|_: @@ -88,10 +84,12 @@ For the following backends ``Kokkos::SharedSpace`` is pointing to the correspond * SYCL -> ``SYCLSharedUSMSpace`` * Only backends running on host -> ``HostSpace`` +.. _kokkos-host-pinned-space: + ``Kokkos::SharedHostPinnedSpace`` --------------------------------- -``Kokkos::SharedHostPinnedSpace`` :sup:`since 4.0` is a |MemorySpaceType|_ alias which is accessible by all enabled |ExecutionSpaceTypes|_. 
The memory stays pinned on the host and is available on the device via zero copy access in small chunks (cache lines, memory pages, etc. depending on the backend). Writes to the memory in one ``ExecutionSpace`` become visible in other ``ExecutionSpaces`` at synchronization events. Which events trigger a synchronization depend on the backend specifics. Nevertheless, fences are synchronization events on all backends. +``Kokkos::SharedHostPinnedSpace`` :sup:`since 4.0` is a |MemorySpaceType|_ alias which is accessible by all enabled |ExecutionSpaceType|. The memory stays pinned on the host and is available on the device via zero copy access in small chunks (cache lines, memory pages, etc. depending on the backend). Writes to the memory in one ``ExecutionSpace`` become visible in other ``ExecutionSpaces`` at synchronization events. Which events trigger a synchronization depend on the backend specifics. Nevertheless, fences are synchronization events on all backends. Availability can be checked with the preprocessor define ``KOKKOS_HAS_SHARED_HOST_PINNED_SPACE`` or the ``constexpr bool Kokkos::has_shared_host_pinned_space``. For the following backends ``Kokkos::SharedHostPinnedSpace`` is pointing to the corresponding |MemorySpaceType|_: @@ -100,10 +98,12 @@ For the following backends ``Kokkos::SharedHostPinnedSpace`` is pointing to the * SYCL -> ``SYCLHostUSMSpace`` * Only backends running on host -> ``HostSpace`` +.. _kokkos-memoryspaceconcept: + ``Kokkos::MemorySpaceConcept`` ------------------------------ -The concept of a ``MemorySpace`` is the fundamental abstraction to represent the "where" and the "how" that memory allocation and access takes place in Kokkos. Most code that uses Kokkos should be written to the *generic concept* of a ``MemorySpace`` rather than any specific instance. This page talks practically about how to *use* the common features of memory spaces in Kokkos; for a more formal and theoretical treatment, see `this document `_. 
+The concept of a ``MemorySpace`` is the fundamental abstraction to represent the "where" and the "how" that memory allocation and access takes place in Kokkos. Most code that uses Kokkos should be written to the *generic concept* of a ``MemorySpace`` rather than any specific instance. This page talks practically about how to *use* the common features of memory spaces in Kokkos; for a more formal and theoretical treatment, see :doc:`this document `. *Disclaimer*: There is nothing new about the term "concept" in C++; anyone who has ever used templates in C++ has used concepts whether they knew it or not. Please do not be confused by the word "concept" itself, which is now more often associated with a shiny new C++20 language feature. Here, "concept" just means "what you're allowed to do with a type that is a template parameter in certain places". @@ -139,20 +139,12 @@ Synopsis Typedefs ~~~~~~~~ -.. _ExecutionSpace: execution_spaces.html#executionspaceconcept - -.. |ExecutionSpace| replace:: :cpp:func:`ExecutionSpace` - -.. _DeepCopyDocumentation: view/deep_copy.html - -.. |DeepCopyDocumentation| replace:: :cpp:func:`deep_copy` documentation - -.. _KokkosSpaceAccessibility: SpaceAccessibility.html +.. |KokkosSpaceAccessibility| replace:: :doc:`Kokkos::SpaceAccessibility ` -.. |KokkosSpaceAccessibility| replace:: :cpp:func:`Kokkos::SpaceAccessibility` +.. |ExecutionSpace| replace:: :ref:`ExecutionSpace ` * ``memory_space``: The self type; -* ``execution_space``: the default |ExecutionSpace|_ to use when constructing objects in memory provided by an instance of ``MemorySpace``, or (potentially) when deep copying from or to such memory (see |DeepCopyDocumentation|_ for details). Kokkos guarantees that ``Kokkos::SpaceAccessibility::accessible`` will be ``true`` (see |KokkosSpaceAccessibility|_). 
+* ``execution_space``: the default |ExecutionSpace| to use when constructing objects in memory provided by an instance of ``MemorySpace``, or (potentially) when deep copying from or to such memory (see :cpp:func:`Kokkos::deep_copy` documentation for details). Kokkos guarantees that ``Kokkos::SpaceAccessibility::accessible`` will be ``true`` (see |KokkosSpaceAccessibility|). * ``device_type``: ``DeviceType``. Constructors diff --git a/docs/source/API/core/numerics/mathematical-constants.rst b/docs/source/API/core/numerics/mathematical-constants.rst index 3cd3d073b..0da87fdf6 100644 --- a/docs/source/API/core/numerics/mathematical-constants.rst +++ b/docs/source/API/core/numerics/mathematical-constants.rst @@ -84,13 +84,9 @@ double`` constant without the ``_v`` suffix. These are shorthand for the Notes ----- -.. _KnownIssues: ../../../known-issues.html#mathematical-constants - -.. |KnownIssues| replace:: known issues - * The mathematical constants are available in ``Kokkos::Experimental::`` since Kokkos 3.6 * They were "promoted" to the ``Kokkos::numbers`` namespace in 4.0 and removed from ``Kokkos::Experimental::`` in 4.3 -* Passing mathematical constants by reference or taking their address in device code is not supported by some toolchains and hence not portable. (See |KnownIssues|_) +* Passing mathematical constants by reference or taking their address in device code is not supported by some toolchains and hence not portable. (See :ref:`known issue `) * Support for quadruple precision floating-point ``__float128`` can be enabled via ``-DKokkos_ENABLE_LIBQUADMATH=ON``. 
@@ -111,6 +107,6 @@ Example See also -------- -`Common mathematical functions `_ +:doc:`Common mathematical functions ` -`Numeric traits `_ +:doc:`Numeric traits ` diff --git a/docs/source/API/core/numerics/mathematical-functions.rst b/docs/source/API/core/numerics/mathematical-functions.rst index 32573a7d6..6745577ff 100644 --- a/docs/source/API/core/numerics/mathematical-functions.rst +++ b/docs/source/API/core/numerics/mathematical-functions.rst @@ -417,7 +417,7 @@ Floating point manipulation functions .. |scalbn| replace:: ``scalbn`` -.. _scalbln: https://en.cppreference.com/w/cpp/numeric/math/scalbln +.. _scalbln: https://en.cppreference.com/w/cpp/numeric/math/scalbn .. |scalbln| replace:: ``scalbln`` @@ -582,7 +582,7 @@ Notes * Beware the using-directive ``using namespace Kokkos;`` will cause compilation errors with unqualified calls to math functions. Use explicit qualification (``Kokkos::sqrt``) or using-declaration (``using - Kokkos::sqrt;``) instead. (See |KnownIssues|_) + Kokkos::sqrt;``) instead. (See :doc:`known issues <../../../known-issues>`) * Math functions were removed from the ``Kokkos::Experimental::`` namespace in version 4.3 * Support for quadruple precision floating-point ``__float128`` can be enabled via ``-DKokkos_ENABLE_LIBQUADMATH=ON``. @@ -592,6 +592,6 @@ Notes See also -------- -`Mathematical constant `_ +:doc:`Mathematical constant ` -`Numeric traits `_ +:doc:`Numeric traits ` diff --git a/docs/source/API/core/numerics/numeric-traits.rst b/docs/source/API/core/numerics/numeric-traits.rst index 14a0be765..b62933eaa 100644 --- a/docs/source/API/core/numerics/numeric-traits.rst +++ b/docs/source/API/core/numerics/numeric-traits.rst @@ -131,14 +131,6 @@ Individual traits are SFINAE-friendly, you can detect value presence/absence. **See also** -.. _MathematicalConstants : mathematical-constants.html +:doc:`Mathematical constants ` -.. |MathematicalConstants| replace:: Mathematical constants - -.. 
_CommonMathematicalFunctions : mathematical-functions.html - -.. |CommonMathematicalFunctions| replace:: Common mathematical functions - -|MathematicalConstants|_ - -|CommonMathematicalFunctions|_ +:doc:`Common mathematical functions ` diff --git a/docs/source/API/core/parallel-dispatch/ParallelForTag.rst b/docs/source/API/core/parallel-dispatch/ParallelForTag.rst index 153e4edd0..410d5efa9 100644 --- a/docs/source/API/core/parallel-dispatch/ParallelForTag.rst +++ b/docs/source/API/core/parallel-dispatch/ParallelForTag.rst @@ -6,11 +6,7 @@ Header File: ```` -.. _parallelFor: ../parallel-dispatch/parallel_for.html - -.. |parallelFor| replace:: :cpp:func:`parallel_for` - -A tag used in team size calculation functions to indicate that the functor for which a team size is being requested is being used in a |parallelFor|_ +A tag used in team size calculation functions to indicate that the functor for which a team size is being requested is being used in a :doc:`parallel_for <../parallel-dispatch/parallel_for>` Usage ----- diff --git a/docs/source/API/core/parallel-dispatch/ParallelReduceTag.rst b/docs/source/API/core/parallel-dispatch/ParallelReduceTag.rst index d25a6aaed..54c3fa68a 100644 --- a/docs/source/API/core/parallel-dispatch/ParallelReduceTag.rst +++ b/docs/source/API/core/parallel-dispatch/ParallelReduceTag.rst @@ -6,11 +6,7 @@ Header File: ```` -.. _parallelReduce: ../parallel-dispatch/parallel_reduce.html - -.. 
|parallelReduce| replace:: :cpp:func:`parallel_reduce` - -A tag used in team size calculation functions to indicate that the functor for which a team size is being requested is being used in a |parallelReduce|_ +A tag used in team size calculation functions to indicate that the functor for which a team size is being requested is being used in a :doc:`parallel_reduce <../parallel-dispatch/parallel_reduce>` Usage ----- diff --git a/docs/source/API/core/parallel-dispatch/ParallelScanTag.rst b/docs/source/API/core/parallel-dispatch/ParallelScanTag.rst index 3833db9fa..e37127f31 100644 --- a/docs/source/API/core/parallel-dispatch/ParallelScanTag.rst +++ b/docs/source/API/core/parallel-dispatch/ParallelScanTag.rst @@ -6,11 +6,7 @@ Header File: ```` -.. _parallelScan: ../parallel-dispatch/parallel_scan.html - -.. |parallelScan| replace:: :cpp:func:`parallel_scan` - -A tag used in team size calculation functions to indicate that the functor for which a team size is being requested is being used in a |parallelScan|_ +A tag used in team size calculation functions to indicate that the functor for which a team size is being requested is being used in a :doc:`parallel_scan <../parallel-dispatch/parallel_scan>` Usage ----- diff --git a/docs/source/API/core/parallel-dispatch/fence.rst b/docs/source/API/core/parallel-dispatch/fence.rst index 0fffb0ab3..a856859e1 100644 --- a/docs/source/API/core/parallel-dispatch/fence.rst +++ b/docs/source/API/core/parallel-dispatch/fence.rst @@ -13,10 +13,10 @@ Usage: Kokkos::fence(); Blocks on completion of all outstanding asynchronous Kokkos operations. -That includes parallel dispatch (e.g. `parallel_for() `_, `parallel_reduce() `_ -and `parallel_scan() `_) as well as asynchronous data operations such as three-argument `deep_copy <../view/deep_copy.html>`_. +That includes parallel dispatch (e.g. 
:cpp:func:`Kokkos::parallel_for`, :cpp:func:`Kokkos::parallel_reduce` +and :cpp:func:`Kokkos::parallel_scan`) as well as asynchronous data operations such as three-argument :cpp:func:`Kokkos::deep_copy`. -Note: there is a execution space instance specific ``fence`` too: `ExecutionSpaceConcept <../execution_spaces.html#executionspaceconcept>`_ +Note: there is an execution space instance specific ``fence`` too: :ref:`ExecutionSpaceConcept ` Interface --------- diff --git a/docs/source/API/core/parallel-dispatch/parallel_for.rst b/docs/source/API/core/parallel-dispatch/parallel_for.rst index aac8c5dfa..f5a423e56 100644 --- a/docs/source/API/core/parallel-dispatch/parallel_for.rst +++ b/docs/source/API/core/parallel-dispatch/parallel_for.rst @@ -13,11 +13,7 @@ Usage: Kokkos::parallel_for(name, policy, functor); Kokkos::parallel_for(policy, functor); -.. _text: ../policies/ExecutionPolicyConcept.html - -.. |text| replace:: *ExecutionPolicy* - -Dispatches parallel work defined by ``functor`` according to the |text|_ ``policy``. The optional label ``name`` is +Dispatches parallel work defined by ``functor`` according to the :doc:`ExecutionPolicy <../policies/ExecutionPolicyConcept>` ``policy``. The optional label ``name`` is used by profiling and debugging tools. This call may be asynchronous and return to the callee immediately. Interface @@ -34,15 +30,15 @@ Parameters: * ExecPolicy: An *ExecutionPolicy* which defines iteration space and other execution properties. Valid policies are: - ``IntegerType``: defines a 1D iteration range, starting from 0 and going to a count. - - `RangePolicy <../policies/RangePolicy.html>`_: defines a 1D iteration range. - - `MDRangePolicy <../policies/MDRangePolicy.html>`_: defines a multi-dimensional iteration space. - - `TeamPolicy <../policies/TeamPolicy.html>`_: defines a 1D iteration range, each of which is assigned to a thread team. 
- - `TeamVectorRange <../policies/TeamVectorRange.html>`_: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `TeamThreadRange <../policies/TeamThreadRange.html>`_: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `ThreadVectorRange <../policies/ThreadVectorRange.html>`_: defines a 1D iteration range to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `TeamVectorMDRange <../policies/TeamVectorMDRange.html>`_: defines a multi-dimensional iteration space to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `TeamThreadMDRange <../policies/TeamThreadMDRange.html>`_: defines a multi-dimensional iteration space to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `ThreadVectorMDRange <../policies/ThreadVectorMDRange.html>`_: defines a multi-dimensional iteration space to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`RangePolicy <../policies/RangePolicy>`: defines a 1D iteration range. + - :doc:`MDRangePolicy <../policies/MDRangePolicy>`: defines a multi-dimensional iteration space. + - :doc:`TeamPolicy <../policies/TeamPolicy>`: defines a 1D iteration range, each of which is assigned to a thread team. + - :doc:`TeamVectorRange <../policies/TeamVectorRange>`: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`TeamThreadRange <../policies/TeamThreadRange>`: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. 
+ - :doc:`ThreadVectorRange <../policies/ThreadVectorRange>`: defines a 1D iteration range to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`TeamVectorMDRange <../policies/TeamVectorMDRange>`: defines a multi-dimensional iteration space to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`TeamThreadMDRange <../policies/TeamThreadMDRange>`: defines a multi-dimensional iteration space to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`ThreadVectorMDRange <../policies/ThreadVectorMDRange>`: defines a multi-dimensional iteration space to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. * FunctorType: A valid functor having an ``operator()`` with a matching signature for the ``ExecPolicy``. The functor can be defined using a C++ class/struct or lambda. See Examples below for more detail. diff --git a/docs/source/API/core/parallel-dispatch/parallel_reduce.rst b/docs/source/API/core/parallel-dispatch/parallel_reduce.rst index 9bbc0299e..326e68580 100644 --- a/docs/source/API/core/parallel-dispatch/parallel_reduce.rst +++ b/docs/source/API/core/parallel-dispatch/parallel_reduce.rst @@ -23,48 +23,36 @@ Dispatches parallel work defined by ``functor`` according to the *ExecutionPolic Interface --------- -.. code-block:: cpp - - template - Kokkos::parallel_reduce(const std::string& name, - const ExecPolicy& policy, - const FunctorType& functor); - -.. code-block:: cpp - - template - Kokkos::parallel_reduce(const ExecPolicy& policy, - const FunctorType& functor); - -.. code-block:: cpp - - template - Kokkos::parallel_reduce(const std::string& name, - const ExecPolicy& policy, - const FunctorType& functor, - const ReducerArgument&... 
reducer); - -.. code-block:: cpp - - template - Kokkos::parallel_reduce(const ExecPolicy& policy, - const FunctorType& functor, - const ReducerArgument&... reducer); - -.. code-block:: cpp - - template - Kokkos::parallel_reduce(const std::string& name, - const ExecPolicy& policy, - const FunctorType& functor, - ReducerArgumentNonConst&... reducer); - -.. code-block:: cpp - - template - Kokkos::parallel_reduce(const ExecPolicy& policy, - const FunctorType& functor, - ReducerArgumentNonConst&... reducer); +.. cpp:function:: template \ + Kokkos::parallel_reduce(const std::string& name, \ + const ExecPolicy& policy, \ + const FunctorType& functor) + +.. cpp:function:: template \ + Kokkos::parallel_reduce(const ExecPolicy& policy, \ + const FunctorType& functor) + +.. cpp:function:: template \ + Kokkos::parallel_reduce(const std::string& name, \ + const ExecPolicy& policy, \ + const FunctorType& functor, \ + const ReducerArgument&... reducer) + +.. cpp:function:: template \ + Kokkos::parallel_reduce(const ExecPolicy& policy, \ + const FunctorType& functor, \ + const ReducerArgument&... reducer) + +.. cpp:function:: template \ + Kokkos::parallel_reduce(const std::string& name, \ + const ExecPolicy& policy, \ + const FunctorType& functor, \ + ReducerArgumentNonConst&... reducer) + +.. cpp:function:: template \ + Kokkos::parallel_reduce(const ExecPolicy& policy, \ + const FunctorType& functor, \ + ReducerArgumentNonConst&... reducer) Parameters: ~~~~~~~~~~~ @@ -73,15 +61,15 @@ Parameters: * ExecPolicy: An *ExecutionPolicy* which defines iteration space and other execution properties. Valid policies are: - ``IntegerType``: defines a 1D iteration range, starting from 0 and going to a count. - - `RangePolicy <../policies/RangePolicy.html>`_: defines a 1D iteration range. - - `MDRangePolicy <../policies/MDRangePolicy.html>`_: defines a multi-dimensional iteration space. 
- - `TeamPolicy <../policies/TeamPolicy.html>`_: defines a 1D iteration range, each of which is assigned to a thread team. - - `TeamVectorRange <../policies/TeamVectorRange.html>`_: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `TeamVectorMDRange <../policies/TeamVectorMDRange.html>`_: defines a multi-dimensional iteration space to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `TeamThreadRange <../policies/TeamThreadRange.html>`_: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `TeamThreadMDRange <../policies/TeamThreadMDRange.html>`_: defines a multi-dimensional iteration space to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `ThreadVectorRange <../policies/ThreadVectorRange.html>`_: defines a 1D iteration range to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `ThreadVectorMDRange <../policies/ThreadVectorMDRange.html>`_: defines a multi-dimensional iteration space to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`RangePolicy <../policies/RangePolicy>`: defines a 1D iteration range. + - :doc:`MDRangePolicy <../policies/MDRangePolicy>`: defines a multi-dimensional iteration space. + - :doc:`TeamPolicy <../policies/TeamPolicy>`: defines a 1D iteration range, each of which is assigned to a thread team. + - :doc:`TeamVectorRange <../policies/TeamVectorRange>`: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. 
+ - :doc:`TeamVectorMDRange <../policies/TeamVectorMDRange>`: defines a multi-dimensional iteration space to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`TeamThreadRange <../policies/TeamThreadRange>`: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`TeamThreadMDRange <../policies/TeamThreadMDRange>`: defines a multi-dimensional iteration space to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`ThreadVectorRange <../policies/ThreadVectorRange>`: defines a 1D iteration range to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`ThreadVectorMDRange <../policies/ThreadVectorMDRange>`: defines a multi-dimensional iteration space to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. * FunctorType: A valid functor with (at minimum) an ``operator()`` with a matching signature for the ``ExecPolicy`` combined with the reduced type. * ReducerArgument: Either a class fulfilling the "Reducer" concept or a ``Kokkos::View``. * ReducerArgumentNonConst: A scalar type or an array type; see below for functor requirements. @@ -127,7 +115,7 @@ Semantics Examples -------- -Further examples are provided in the `Custom Reductions <../../../ProgrammingGuide/Custom-Reductions.html>`_ and `ExecutionPolicy <../policies/ExecutionPolicyConcept.html>`_ documentation. +Further examples are provided in the :doc:`Custom Reductions <../../../ProgrammingGuide/Custom-Reductions>` and :doc:`ExecutionPolicy <../policies/ExecutionPolicyConcept>` documentation. .. 
code-block:: cpp diff --git a/docs/source/API/core/parallel-dispatch/parallel_scan.rst b/docs/source/API/core/parallel-dispatch/parallel_scan.rst index 4deb0521a..e1d8bc93c 100644 --- a/docs/source/API/core/parallel-dispatch/parallel_scan.rst +++ b/docs/source/API/core/parallel-dispatch/parallel_scan.rst @@ -36,11 +36,11 @@ Parameters: * ExecPolicy: An *ExecutionPolicy* which defines iteration space and other execution properties. Valid policies are: - ``IntegerType``: defines a 1D iteration range, starting from 0 and going to a count. - - `RangePolicy <../policies/RangePolicy.html>`_: defines a 1D iteration range. - - `TeamPolicy <../policies/TeamPolicy.html>`_: defines a 1D iteration range, each of which is assigned to a thread team. - - `TeamVectorRange <../policies/TeamVectorRange.html>`_: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `TeamThreadRange <../policies/TeamThreadRange.html>`_: defined a 1D iteration range to be executed through thread parallelization dividing the range over the threads of the team. Only valid inside a parallel region executed through a ``TeamPolicy``. - - `ThreadVectorRange <../policies/ThreadVectorRange.html>`_: defines a 1D iteration range to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`RangePolicy <../policies/RangePolicy>`: defines a 1D iteration range. + - :doc:`TeamPolicy <../policies/TeamPolicy>`: defines a 1D iteration range, each of which is assigned to a thread team. + - :doc:`TeamVectorRange <../policies/TeamVectorRange>`: defines a 1D iteration range to be executed by a thread-team. Only valid inside a parallel region executed through a ``TeamPolicy``. 
+ - :doc:`TeamThreadRange <../policies/TeamThreadRange>`: defined a 1D iteration range to be executed through thread parallelization dividing the range over the threads of the team. Only valid inside a parallel region executed through a ``TeamPolicy``. + - :doc:`ThreadVectorRange <../policies/ThreadVectorRange>`: defines a 1D iteration range to be executed through vector parallelization dividing the threads within a team. Only valid inside a parallel region executed through a ``TeamPolicy``. * FunctorType: A valid functor with (at minimum) an ``operator()`` with a signature compatible with the ``ExecPolicy`` and the ``ReturnType``. * ReturnType: a POD type with ``operator +=`` and ``operator =``, or a ``Kokkos::View``. @@ -51,7 +51,7 @@ Requirements: - The ``operator()`` overload without the ``WorkTag`` is used if ``ExecPolicy`` is an ``IntegerType`` or ``ExecPolicy::work_tag`` is ``void``. - ``HandleType`` is an ``IntegerType`` if ``ExecPolicy`` is an ``IntegerType`` else it is ``ExecPolicy::member_type``. -* The type ``ReturnType`` of the ``functor`` operator must be compatible with the ``ReturnType`` of the parallel_scan and must match the arguments of the ``init`` and ``join`` functions of the functor if provided. If the functor doesn't have an ``init`` member function, it is assumed that the identity for the scan operation is given by the default constructor of the value type (and not by `reduction_identity <../builtinreducers/reduction_identity.html>`_). +* The type ``ReturnType`` of the ``functor`` operator must be compatible with the ``ReturnType`` of the parallel_scan and must match the arguments of the ``init`` and ``join`` functions of the functor if provided. If the functor doesn't have an ``init`` member function, it is assumed that the identity for the scan operation is given by the default constructor of the value type (and not by :doc:`reduction_identity <../builtinreducers/reduction_identity>`). 
* The functor must define ``FunctorType::value_type`` the same as ``ReturnType``. Semantics diff --git a/docs/source/API/core/policies/ExecutionPolicyConcept.rst b/docs/source/API/core/policies/ExecutionPolicyConcept.rst index 5a81fcb58..56f047c79 100644 --- a/docs/source/API/core/policies/ExecutionPolicyConcept.rst +++ b/docs/source/API/core/policies/ExecutionPolicyConcept.rst @@ -4,14 +4,14 @@ .. role::cpp(code) :language: cpp -The concept of an ``ExecutionPolicy`` is the fundamental abstraction to represent "how" the execution of a Kokkos parallel pattern takes place. This page talks practically about how to *use* the common features of execution policies in Kokkos; for a more formal and theoretical treatment, see `this document <../KokkosConcepts.html>`_. +The concept of an ``ExecutionPolicy`` is the fundamental abstraction to represent "how" the execution of a Kokkos parallel pattern takes place. This page talks practically about how to *use* the common features of execution policies in Kokkos; for a more formal and theoretical treatment, see :doc:`this document <../KokkosConcepts>`. *Disclaimer*: There is nothing new about the term "concept" in C++; anyone who has ever used templates in C++ has used concepts whether they knew it or not. Please do not be confused by the word "concept" itself, which is now more often associated with a shiny new C++20 language feature. Here, "concept" just means "what you're allowed to do with a type that is a template parameter in certain places". What is an ``ExecutionPolicy``? ------------------------------- -The dominant parallel dispatch mechanism in Kokkos, described `elsewhere in the programming guide <../../../ProgrammingGuide/ParallelDispatch.html>`_, involves a ``parallel_pattern`` (e.g., something like `Kokkos::parallel_for <../parallel-dispatch/parallel_for.html>`_ or `Kokkos::parallel_reduce <../parallel-dispatch/parallel_reduce.html>`_), an ``ExecutionPolicy``, and a ``Functor``. 
In a hand-wavy sense: +The dominant parallel dispatch mechanism in Kokkos, described :doc:`elsewhere in the programming guide <../../../ProgrammingGuide/ParallelDispatch>`, involves a ``parallel_pattern`` (e.g., something like :doc:`Kokkos::parallel_for <../parallel-dispatch/parallel_for>` or :doc:`Kokkos::parallel_reduce <../parallel-dispatch/parallel_reduce>`), an ``ExecutionPolicy``, and a ``Functor``. In a hand-wavy sense: .. code-block:: cpp diff --git a/docs/source/API/core/policies/MDRangePolicy.rst b/docs/source/API/core/policies/MDRangePolicy.rst index 9ac7cfddc..07d310ad1 100644 --- a/docs/source/API/core/policies/MDRangePolicy.rst +++ b/docs/source/API/core/policies/MDRangePolicy.rst @@ -32,7 +32,7 @@ Parameters General Template Arguments ~~~~~~~~~~~~~~~~~~~~~~~~~~ -Valid template arguments for ``MDRangePolicy`` are described `here <../Execution-Policies.html#common-arguments-for-all-execution-policies>`_ +Valid template arguments for ``MDRangePolicy`` are described :ref:`here ` Required Arguments Specific to MDRangePolicy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/source/API/core/policies/NestedPolicies.rst b/docs/source/API/core/policies/NestedPolicies.rst index 8f286315c..053401f8c 100644 --- a/docs/source/API/core/policies/NestedPolicies.rst +++ b/docs/source/API/core/policies/NestedPolicies.rst @@ -38,7 +38,7 @@ List General Template Arguments ~~~~~~~~~~~~~~~~~~~~~~~~~~ -Valid template arguments are described `here <../Execution-Policies.html#common-arguments-for-all-execution-policies>`_ +Valid template arguments are described :ref:`here ` Usage ~~~~~ diff --git a/docs/source/API/core/policies/RangePolicy.rst b/docs/source/API/core/policies/RangePolicy.rst index 486a2de4a..69e3c4b48 100644 --- a/docs/source/API/core/policies/RangePolicy.rst +++ b/docs/source/API/core/policies/RangePolicy.rst @@ -87,7 +87,7 @@ Parameters General Template Arguments ~~~~~~~~~~~~~~~~~~~~~~~~~~ -Valid template arguments for ``RangePolicy`` are 
described `here <../Execution-Policies.html#common-arguments-for-all-execution-policies>`_ +Valid template arguments for ``RangePolicy`` are described :ref:`here ` Public Class Members -------------------- diff --git a/docs/source/API/core/policies/TeamHandleConcept.rst b/docs/source/API/core/policies/TeamHandleConcept.rst index b5c671510..aff2db45f 100644 --- a/docs/source/API/core/policies/TeamHandleConcept.rst +++ b/docs/source/API/core/policies/TeamHandleConcept.rst @@ -24,7 +24,7 @@ Description .. cpp:type:: execution_space - Specifies the `execution space `_ associated to the team + Specifies the `execution space `_ associated to the team .. cpp:type:: scratch_memory_space diff --git a/docs/source/API/core/policies/TeamPolicy.rst b/docs/source/API/core/policies/TeamPolicy.rst index d54ac5e0d..fff482e9b 100644 --- a/docs/source/API/core/policies/TeamPolicy.rst +++ b/docs/source/API/core/policies/TeamPolicy.rst @@ -18,7 +18,7 @@ Usage Execution policy for a 1D iteration space starting at begin and going to end with an open interval. -See also: `TeamMember `_ +See also: :doc:`TeamMember ` Description ----------- @@ -27,7 +27,7 @@ Description .. rubric:: Template Arguments - Valid template arguments for TeamPolicy are described `here <../Execution-Policies.html#common-arguments-for-all-execution-policies>`_ + Valid template arguments for TeamPolicy are described :ref:`here ` .. rubric:: Public nested typedefs @@ -119,7 +119,7 @@ Description .. cpp:function:: template int team_size_max(const FunctorType& f, const ParallelReduceTag&) const; - Query the maximum team size possible given a specific functor. The tag denotes whether this is for a |parallelFor|_ or a |parallelReduce|_. + Query the maximum team size possible given a specific functor. The tag denotes whether this is for a :cpp:func:`Kokkos::parallel_for` or a :cpp:func:`Kokkos::parallel_reduce`. Note: this is not a static function! 
The function will take into account settings for vector length and scratch size of ``*this``. Using a value larger than the return value will result in dispatch failure. If the value returned is non-positive, no valid team size could be found. A common reason is that too much scratch cache memory was requested. Returns: The maximum value for ``team_size`` allowed to be given to be used with an otherwise identical ``TeamPolicy`` for dispatching the functor ``f``. @@ -127,7 +127,7 @@ Description .. cpp:function:: template int team_size_recommended(const FunctorType& f, const ParallelReduceTag&) const; - Query the recommended team size for the specific functor ``f``. The tag denotes whether this is for a |parallelFor|_ or a |parallelReduce|_. + Query the recommended team size for the specific functor ``f``. The tag denotes whether this is for a :cpp:func:`Kokkos::parallel_for` or a :cpp:func:`Kokkos::parallel_reduce`. Note: this is not a static function! The function will take into account settings for vector length and scratch size of ``*this``. If the value returned is non-positive, no valid team size could be found. A common reason is that too much scratch cache memory was requested. Returns: The recommended value for ``team_size`` to be given to be used with an otherwise identical ``TeamPolicy`` for dispatching the functor ``f``. diff --git a/docs/source/API/core/policies/TeamThreadMDRange.rst b/docs/source/API/core/policies/TeamThreadMDRange.rst index 246dbce37..bc2e6aa2b 100644 --- a/docs/source/API/core/policies/TeamThreadMDRange.rst +++ b/docs/source/API/core/policies/TeamThreadMDRange.rst @@ -9,7 +9,7 @@ Header File: ```` Description ----------- -TeamThreadMDRange is a `nested execution policy <./NestedPolicies.html>`_ used inside of hierarchical parallelism. +TeamThreadMDRange is a :doc:`nested execution policy <./NestedPolicies>` used inside of hierarchical parallelism. 
Interface @@ -31,7 +31,7 @@ Interface * **Requirements** - * ``TeamHandle`` is a type that models `TeamHandle <./TeamHandleConcept.html>`_ + * ``TeamHandle`` is a type that models :doc:`TeamHandle <./TeamHandleConcept>` * ``extent_1, extent_2, ...`` are ints @@ -53,7 +53,7 @@ Interface Restrictions ------------ -Note that when used in `parallel_reduce <../parallel-dispatch/parallel_reduce.html>`_, the reduction is limited to a sum. +Note that when used in :doc:`parallel_reduce <../parallel-dispatch/parallel_reduce>`, the reduction is limited to a sum. Examples -------- diff --git a/docs/source/API/core/policies/TeamThreadRange.rst b/docs/source/API/core/policies/TeamThreadRange.rst index 17d3c6aa2..a82430a9e 100644 --- a/docs/source/API/core/policies/TeamThreadRange.rst +++ b/docs/source/API/core/policies/TeamThreadRange.rst @@ -17,7 +17,7 @@ Usage parallel_scan(TeamThreadRange(team,begin,end), [=] (lint i, double& lsum, bool final) {...},sum); -TeamThreadRange is a `nested execution policy <./NestedPolicies.html>`_ used inside hierarchical parallelism. In contrast to global policies, the public interface for nested policies is implemented as functions, in order to enable implicit templating on the execution space type via the team handle. +TeamThreadRange is a :doc:`nested execution policy <./NestedPolicies>` used inside hierarchical parallelism. In contrast to global policies, the public interface for nested policies is implemented as functions, in order to enable implicit templating on the execution space type via the team handle. Synopsis -------- @@ -49,7 +49,7 @@ Description - Implementation defined type. * **Requirements** - - ``TeamMemberType`` is a type that models `TeamHandle <./TeamHandleConcept.html>`_ + - ``TeamMemberType`` is a type that models :doc:`TeamHandle <./TeamHandleConcept>` - ``std::is_integral::value`` is true. - Every member thread of ``team`` must call the operation in the same branch, i.e. 
it is not legal to have some threads call this function in one branch, and the other threads of ``team`` call it in another branch. - ``count >= 0`` is true; @@ -71,7 +71,7 @@ Description - Implementation defined type. * **Requirements** - - ``TeamMemberType`` is a type that models `TeamHandle <./TeamHandleConcept.html>`_ + - ``TeamMemberType`` is a type that models :doc:`TeamHandle <./TeamHandleConcept>` - ``std::is_integral::value`` is true. - ``std::is_integral::value`` is true. - Every member thread of ``team`` must call the operation in the same branch, i.e. it is not legal to have some threads call this function in one branch, and the other threads of ``team`` call it in another branch. diff --git a/docs/source/API/core/policies/TeamVectorMDRange.rst b/docs/source/API/core/policies/TeamVectorMDRange.rst index 9d8ed686b..b1c41784a 100644 --- a/docs/source/API/core/policies/TeamVectorMDRange.rst +++ b/docs/source/API/core/policies/TeamVectorMDRange.rst @@ -6,7 +6,7 @@ Header File: ```` Description ----------- -TeamVectorMDRange is a `nested execution policy <./NestedPolicies.html>`_ used inside of hierarchical parallelism. +TeamVectorMDRange is a :doc:`nested execution policy <./NestedPolicies>` used inside of hierarchical parallelism. Interface --------- @@ -26,7 +26,7 @@ Interface * **Requirements** - * ``TeamHandle`` is a type that models `TeamHandle <./TeamHandleConcept.html>`_ + * ``TeamHandle`` is a type that models :doc:`TeamHandle <./TeamHandleConcept>` * ``extent_1, extent_2, ...`` are ints @@ -48,7 +48,7 @@ Interface Restrictions ------------ -Note that when used in `parallel_reduce <../parallel-dispatch/parallel_reduce.html>`_, the reduction is limited to a sum. +Note that when used in :doc:`parallel_reduce <../parallel-dispatch/parallel_reduce>`, the reduction is limited to a sum. 
Examples -------- diff --git a/docs/source/API/core/policies/TeamVectorRange.rst b/docs/source/API/core/policies/TeamVectorRange.rst index 491081e7a..9afe36410 100644 --- a/docs/source/API/core/policies/TeamVectorRange.rst +++ b/docs/source/API/core/policies/TeamVectorRange.rst @@ -15,7 +15,7 @@ Usage parallel_reduce(TeamVectorRange(team,begin,end), [=] (int i, double& lsum) {...},sum); -TeamVectorRange is a `nested execution policy `_ used inside hierarchical parallelism. +TeamVectorRange is a :doc:`nested execution policy ` used inside hierarchical parallelism. In contrast to global policies, the public interface for nested policies is implemented as functions, in order to enable implicit templating on the execution space type via the team handle. @@ -48,7 +48,7 @@ Splits the index range ``0`` to ``count-1`` over the threads of the team and the - Implementation defined type. * **Requirements** - - ``TeamMemberType`` is a type that models `TeamHandle `_ + - ``TeamMemberType`` is a type that models :doc:`TeamHandle ` - ``std::is_integral::value`` is true. - Every member thread of ``team`` must call the operation in the same branch, i.e. it is not legal to have some threads call this function in one branch, and the other threads of ``team`` call it in another branch. @@ -70,7 +70,7 @@ Splits the index range ``begin`` to ``end-1`` over the threads of the team and t - Implementation defined type. * **Requirements** - - ``TeamMemberType`` is a type that models `TeamHandle `_ + - ``TeamMemberType`` is a type that models :doc:`TeamHandle ` - ``std::is_integral::value`` is true. - ``std::is_integral::value`` is true. - Every member thread of ``team`` must call the operation in the same branch, i.e. it is not legal to have some threads call this function in one branch, and the other threads of ``team`` call it in another branch.. 
diff --git a/docs/source/API/core/policies/ThreadVectorMDRange.rst b/docs/source/API/core/policies/ThreadVectorMDRange.rst index 7c3ec1af6..9eb45965f 100644 --- a/docs/source/API/core/policies/ThreadVectorMDRange.rst +++ b/docs/source/API/core/policies/ThreadVectorMDRange.rst @@ -9,7 +9,7 @@ Header File: ```` Description ----------- -ThreadVectorMDRange is a `nested execution policy <./NestedPolicies.html>`_ used inside of hierarchical parallelism. +ThreadVectorMDRange is a :doc:`nested execution policy <./NestedPolicies>` used inside of hierarchical parallelism. Interface --------- @@ -29,7 +29,7 @@ Interface * **Requirements** - * ``TeamHandle`` is a type that models `TeamHandle <./TeamHandleConcept.html>`_ + * ``TeamHandle`` is a type that models :doc:`TeamHandle <./TeamHandleConcept>` * ``extent_1, extent_2, ...`` are ints @@ -51,7 +51,7 @@ Interface Restrictions ------------ -Note that when used in `parallel_reduce <../parallel-dispatch/parallel_reduce.html>`_, the reduction is limited to a sum. +Note that when used in :doc:`parallel_reduce <../parallel-dispatch/parallel_reduce>`, the reduction is limited to a sum. Examples -------- diff --git a/docs/source/API/core/policies/ThreadVectorRange.rst b/docs/source/API/core/policies/ThreadVectorRange.rst index 6a0efef1c..e44630d09 100644 --- a/docs/source/API/core/policies/ThreadVectorRange.rst +++ b/docs/source/API/core/policies/ThreadVectorRange.rst @@ -13,7 +13,7 @@ Usage: parallel_scan(ThreadVectorRange(team,range), [=] (int i, double& lsum, bool final) {...}); -ThreadVectorRange is a `nested execution policy `__ used inside hierarchical parallelism. +ThreadVectorRange is a :doc:`nested execution policy ` used inside hierarchical parallelism. In contrast to global policies, the public interface for nested policies is implemented as functions, in order to enable implicit templating on the execution space type via the team handle. 
@@ -52,13 +52,13 @@ Splits the index range ``0`` to ``count-1`` over the vector lanes of the calling * **Requirements** - * ``TeamMemberType`` is a type that models `TeamHandle `__ + * ``TeamMemberType`` is a type that models :doc:`TeamHandle ` * ``std::is_integral::value`` is true. * ``count >= 0`` is true; - * This function can not be called inside a parallel operation dispatched using a `TeamVectorRange `__ policy or ``ThreadVectorRange`` policy. + * This function can not be called inside a parallel operation dispatched using a :doc:`TeamVectorRange ` policy or ``ThreadVectorRange`` policy. .. code-block:: cpp @@ -83,7 +83,7 @@ Splits the index range ``begin`` to ``end-1`` over the vector lanes of the calli * **Requirements**: - * ``TeamMemberType`` is a type that models `TeamHandle `__ + * ``TeamMemberType`` is a type that models :doc:`TeamHandle ` * ``std::is_integral::value`` is true. @@ -91,7 +91,7 @@ Splits the index range ``begin`` to ``end-1`` over the vector lanes of the calli * ``end >= begin`` is true; - * This function can not be called inside a parallel operation dispatched using a `TeamVectorRange `__ policy or ``ThreadVectorRange`` policy. + * This function can not be called inside a parallel operation dispatched using a :doc:`TeamVectorRange ` policy or ``ThreadVectorRange`` policy. Examples diff --git a/docs/source/API/core/profiling/profiling_section.rst b/docs/source/API/core/profiling/profiling_section.rst index 772413cb2..131f7a8be 100644 --- a/docs/source/API/core/profiling/profiling_section.rst +++ b/docs/source/API/core/profiling/profiling_section.rst @@ -55,4 +55,4 @@ The ``ProfilingSection`` class is non-copyable. 
**See also** -`ScopedRegion `_: implements a scope-based region ownership wrapper +:doc:`ScopedRegion `: implements a scope-based region ownership wrapper diff --git a/docs/source/API/core/profiling/scoped_region.rst b/docs/source/API/core/profiling/scoped_region.rst index 647137e3d..a406d6e38 100644 --- a/docs/source/API/core/profiling/scoped_region.rst +++ b/docs/source/API/core/profiling/scoped_region.rst @@ -63,4 +63,4 @@ Example **See also** -`ProfilingSection `_: Implements a scope-based section ownership wrapper +:doc:`ProfilingSection `: Implements a scope-based section ownership wrapper diff --git a/docs/source/API/core/spaces/partition_space.rst b/docs/source/API/core/spaces/partition_space.rst index 418a030f5..dd251906e 100644 --- a/docs/source/API/core/spaces/partition_space.rst +++ b/docs/source/API/core/spaces/partition_space.rst @@ -29,7 +29,7 @@ Interface hardware resources as an existing execution space instance. There is no implied synchronization relationship between the newly created instances and the pre-existing instance. - :param space: an execution space instance (see ../execution_spaces.html) + :param space: an execution space instance (see :doc:`execution_spaces <../execution_spaces>`) :param args: the number of created instances is equal to ``sizeof...(Args)``. 
The relative weight of ``args`` is a hint for the fraction of hardware resources of ``space`` diff --git a/docs/source/API/core/utilities/assert.rst b/docs/source/API/core/utilities/assert.rst index df282484d..da5ba998f 100644 --- a/docs/source/API/core/utilities/assert.rst +++ b/docs/source/API/core/utilities/assert.rst @@ -57,4 +57,4 @@ Notes See also -------- -* `Kokkos::abort() `_ causes abnormal program termination +* :doc:`Kokkos::abort() ` causes abnormal program termination diff --git a/docs/source/API/core/utilities/device_id.rst b/docs/source/API/core/utilities/device_id.rst index 4fd04f427..7bae5e5e0 100644 --- a/docs/source/API/core/utilities/device_id.rst +++ b/docs/source/API/core/utilities/device_id.rst @@ -17,26 +17,10 @@ Returns the id of the device that is used by ``DefaultExecutionSpace`` or **See also** -.. _num_devices : num_devices.html +:doc:`num_devices `: returns the number of devices available to Kokkos -.. |num_devices| replace:: ``num_devices`` +:doc:`num_threads `: returns the number of threads used by Kokkos -.. _num_threads : num_threads.html +:doc:`initialize <../initialize_finalize/initialize>`: initializes the Kokkos execution environment -.. |num_threads| replace:: ``num_threads`` - -.. _initialize: ../initialize_finalize/initialize.html - -.. |initialize| replace:: ``initialize`` - -.. _InitializationSettings: ../initialize_finalize/InitializationSettings.html - -.. 
|InitializationSettings| replace:: ``InitializationSettings`` - -|num_devices|_: returns the number of devices available to Kokkos - -|num_threads|_: returns the number of threads used by Kokkos - -|initialize|_: initializes the Kokkos execution environment - -|InitializationSettings|_: settings for initializing Kokkos +:doc:`InitializationSettings <../initialize_finalize/InitializationSettings>`: settings for initializing Kokkos diff --git a/docs/source/API/core/utilities/min_max_clamp.rst b/docs/source/API/core/utilities/min_max_clamp.rst index a3d9ce18d..990740634 100644 --- a/docs/source/API/core/utilities/min_max_clamp.rst +++ b/docs/source/API/core/utilities/min_max_clamp.rst @@ -56,20 +56,8 @@ Notes See also -------- -.. _min_element: ../../algorithms/std-algorithms/all/StdMinElement.html +:doc:`min_element <../../algorithms/std-algorithms/all/StdMinElement>`: returns the smallest element in a range -.. |min_element| replace:: ``min_element`` +:doc:`max_element <../../algorithms/std-algorithms/all/StdMaxElement>`: returns the largest element in a range -.. _max_element: ../../algorithms/std-algorithms/all/StdMaxElement.html - -.. |max_element| replace:: ``max_element`` - -.. _minmax_element: ../../algorithms/std-algorithms/all/StdMinMaxElement.html - -.. |minmax_element| replace:: ``minmax_element`` - -|min_element|_: returns the smallest element in a range - -|max_element|_: returns the largest element in a range - -|minmax_element|_: returns the smallest and the largest elements in a range +:doc:`minmax_element <../../algorithms/std-algorithms/all/StdMinMaxElement>`: returns the smallest and the largest elements in a range diff --git a/docs/source/API/core/utilities/num_devices.rst b/docs/source/API/core/utilities/num_devices.rst index 23a192ed4..06aae71a3 100644 --- a/docs/source/API/core/utilities/num_devices.rst +++ b/docs/source/API/core/utilities/num_devices.rst @@ -44,26 +44,10 @@ Example **See also** -.. 
_device_id : device_id.html +:doc:`device_id `: returns the id of the device used by Kokkos -.. |device_id| replace:: ``device_id`` +:doc:`num_threads `: returns the number of threads used by Kokkos -.. _num_threads : num_threads.html +:doc:`initialize <../initialize_finalize/initialize>`: initializes the Kokkos execution environment -.. |num_threads| replace:: ``num_threads`` - -.. _initialize: ../initialize_finalize/initialize.html - -.. |initialize| replace:: ``initialize`` - -.. _InitializationSettings: ../initialize_finalize/InitializationSettings.html - -.. |InitializationSettings| replace:: ``InitializationSettings`` - -|device_id|_: returns the id of the device used by Kokkos - -|num_threads|_: returns the number of threads used by Kokkos - -|initialize|_: initializes the Kokkos execution environment - -|InitializationSettings|_: settings for initializing Kokkos +:doc:`InitializationSettings <../initialize_finalize/InitializationSettings>`: settings for initializing Kokkos diff --git a/docs/source/API/core/utilities/num_threads.rst b/docs/source/API/core/utilities/num_threads.rst index 835d29462..c2f790636 100644 --- a/docs/source/API/core/utilities/num_threads.rst +++ b/docs/source/API/core/utilities/num_threads.rst @@ -16,26 +16,10 @@ Returns the number of concurrent threads that are used by ``DefaultHostExecution **See also** -.. _device_id : device_id.html +:doc:`num_devices `: returns the number of devices available to Kokkos -.. |device_id| replace:: ``device_id`` +:doc:`device_id `: returns the id of the device used by Kokkos -.. _num_devices : num_devices.html +:doc:`initialize <../initialize_finalize/initialize>`: initializes the Kokkos execution environment -.. |num_devices| replace:: ``num_devices`` - -.. _initialize: ../initialize_finalize/initialize.html - -.. |initialize| replace:: ``initialize`` - -.. _InitializationSettings: ../initialize_finalize/InitializationSettings.html - -.. 
|InitializationSettings| replace:: ``InitializationSettings`` - -|num_devices|_: returns the number of devices available to Kokkos - -|device_id|_: returns the id of the device used by Kokkos - -|initialize|_: initializes the Kokkos execution environment - -|InitializationSettings|_: settings for initializing Kokkos +:doc:`InitializationSettings <../initialize_finalize/InitializationSettings>`: settings for initializing Kokkos diff --git a/docs/source/API/core/view/Subview_type.rst b/docs/source/API/core/view/Subview_type.rst index a6277d543..1f481b7e1 100644 --- a/docs/source/API/core/view/Subview_type.rst +++ b/docs/source/API/core/view/Subview_type.rst @@ -4,16 +4,14 @@ .. role:: cpp(code) :language: cpp -.. _subviewfunc: subview.html - -.. |subviewfunc| replace:: ``Kokkos::subview()`` +.. |subviewfunc| replace:: :doc:`Kokkos::subview() ` Header File: ``Kokkos_Core.hpp`` Description ----------- -Alias template to deduce the type that is returned by a call to the |subviewfunc|_ function with given arguments. +Alias template to deduce the type that is returned by a call to the |subviewfunc| function with given arguments. Interface --------- @@ -32,7 +30,7 @@ Requires: - ``ViewType`` is a specialization of ``Kokkos::View`` -- ``Args...`` are slice specifiers as defined in |subviewfunc|_. +- ``Args...`` are slice specifiers as defined in |subviewfunc|. - ``sizeof... (Args) == ViewType::rank()``. diff --git a/docs/source/API/core/view/create_mirror.rst b/docs/source/API/core/view/create_mirror.rst index 605fdbba4..78334339c 100644 --- a/docs/source/API/core/view/create_mirror.rst +++ b/docs/source/API/core/view/create_mirror.rst @@ -6,11 +6,7 @@ Header File: ```` -.. _deepCopy: deep_copy.html - -.. |deepCopy| replace:: :cpp:func:`deep_copy` - -A common desired use case is to have a memory allocation in GPU memory and an identical memory allocation in CPU memory, such that copying from one to another is straightforward. 
To satisfy this use case and others, Kokkos has facilities for dealing with "mirrors" of View. A "mirror" of a View type ``A`` is loosely defined a View type ``B`` such that Views of type ``B`` are accessible from the CPU and |deepCopy|_ between Views of type ``A`` and ``B`` are direct. The most common functions for dealing with mirrors are ``create_mirror``, ``create_mirror_view`` and ``create_mirror_view_and_copy``. +A common desired use case is to have a memory allocation in GPU memory and an identical memory allocation in CPU memory, such that copying from one to another is straightforward. To satisfy this use case and others, Kokkos has facilities for dealing with "mirrors" of View. A "mirror" of a View type ``A`` is loosely defined a View type ``B`` such that Views of type ``B`` are accessible from the CPU and :cpp:func:`Kokkos::deep_copy ` between Views of type ``A`` and ``B`` are direct. The most common functions for dealing with mirrors are ``create_mirror``, ``create_mirror_view`` and ``create_mirror_view_and_copy``. Usage ----- @@ -34,60 +30,53 @@ Use ``create_mirror_view`` when the mirror is solely used for providing access i Description ----------- -.. _View: view.html - -.. |View| replace:: :cpp:class:`View` - -.. _ExecutionSpaceConcept: ../execution_spaces.html#executionspaceconcept - -.. |ExecutionSpaceConcept| replace:: :cpp:func:`ExecutionSpaceConcept` - -.. _MemorySpaceConcept: ../memory_spaces.html#memoryspaceconcept +.. |View| replace:: :doc:`View ` -.. |MemorySpaceConcept| replace:: :cpp:func:`MemorySpaceConcept` +.. |ExecutionSpaceConcept| replace:: :ref:`kokkos-executionspaceconcept` +.. |MemorySpaceConcept| replace:: :ref:`kokkos-memoryspaceconcept` .. cpp:function:: template typename ViewType::host_mirror_type create_mirror(ViewType const& src); - Creates a new host accessible |View|_ with the same layout and padding as ``src`` + Creates a new host accessible |View| with the same layout and padding as ``src`` - ``src``: a ``Kokkos::View``. 
.. cpp:function:: template typename ViewType::host_mirror_type create_mirror(decltype(Kokkos::WithoutInitializing), ViewType const& src); - Creates a new host accessible |View|_ with the same layout and padding as ``src``. The new view will have uninitialized data. + Creates a new host accessible |View| with the same layout and padding as ``src``. The new view will have uninitialized data. - ``src``: a ``Kokkos::View``. .. cpp:function:: template ImplMirrorType create_mirror(Space const& space, ViewType const& src); - Creates a new |View|_ with the same layout and padding as ``src`` but with a device type of ``Space::device_type``. + Creates a new |View| with the same layout and padding as ``src`` but with a device type of ``Space::device_type``. - ``src``: a ``Kokkos::View``. - - ``Space``: a class meeting the requirements of |ExecutionSpaceConcept|_ or |MemorySpaceConcept|_ + - ``Space``: a class meeting the requirements of |ExecutionSpaceConcept| or |MemorySpaceConcept| - ``ImplMirrorType``: an implementation defined specialization of ``Kokkos::View``. .. cpp:function:: template ImplMirrorType create_mirror(decltype(Kokkos::WithoutInitializing), Space const& space, ViewType const& src); - Creates a new |View|_ with the same layout and padding as ``src`` but with a device type of ``Space::device_type``. The new view will have uninitialized data. + Creates a new |View| with the same layout and padding as ``src`` but with a device type of ``Space::device_type``. The new view will have uninitialized data. - ``src``: a ``Kokkos::View``. - - ``Space``: a class meeting the requirements of |ExecutionSpaceConcept|_ or |MemorySpaceConcept|_ + - ``Space``: a class meeting the requirements of |ExecutionSpaceConcept| or |MemorySpaceConcept| - ``ImplMirrorType``: an implementation defined specialization of ``Kokkos::View``. .. 
cpp:function:: template auto create_mirror(ALLOC_PROP const& arg_prop, ViewType const& src); - Creates a new |View|_ with the same layout and padding as ``src`` - using the |View|_ constructor properties ``arg_prop``, e.g., ``Kokkos::view_alloc(Kokkos::WithoutInitializing)``. - If ``arg_prop`` contains a memory space, a |View|_ in that space is created. Otherwise, a |View|_ in host-accessible memory is returned. + Creates a new |View| with the same layout and padding as ``src`` + using the |View| constructor properties ``arg_prop``, e.g., ``Kokkos::view_alloc(Kokkos::WithoutInitializing)``. + If ``arg_prop`` contains a memory space, a |View| in that space is created. Otherwise, a |View| in host-accessible memory is returned. - ``src``: a ``Kokkos::View``. - - ``arg_prop``: |View|_ constructor properties, e.g., ``Kokkos::view_alloc(Kokkos::WithoutInitializing)``. + - ``arg_prop``: |View| constructor properties, e.g., ``Kokkos::view_alloc(Kokkos::WithoutInitializing)``. .. important:: @@ -97,51 +86,51 @@ Description .. cpp:function:: template typename ViewType::host_mirror_type create_mirror_view(ViewType const& src); If ``src`` is not host accessible (i.e. if ``SpaceAccessibility::accessible`` is ``false``) - it creates a new host accessible |View|_ with the same layout and padding as ``src``. Otherwise returns ``src``. + it creates a new host accessible |View| with the same layout and padding as ``src``. Otherwise returns ``src``. - ``src``: a ``Kokkos::View``. .. cpp:function:: template typename ViewType::host_mirror_type create_mirror_view(decltype(Kokkos::WithoutInitializing), ViewType const& src); If ``src`` is not host accessible (i.e. if ``SpaceAccessibility::accessible`` is ``false``) - it creates a new host accessible |View|_ with the same layout and padding as ``src``. The new view will have uninitialized data. Otherwise returns ``src``. + it creates a new host accessible |View| with the same layout and padding as ``src``. 
The new view will have uninitialized data. Otherwise returns ``src``. - ``src``: a ``Kokkos::View``. .. cpp:function:: template ImplMirrorType create_mirror_view(Space const& space, ViewType const& src); - If ``std::is_same::value`` is ``false``, creates a new |View|_ with + If ``std::is_same::value`` is ``false``, creates a new |View| with the same layout and padding as ``src`` but with a device type of ``Space::device_type``. Otherwise returns ``src``. - ``src``: a ``Kokkos::View``. - - ``Space`` : a class meeting the requirements of |ExecutionSpaceConcept|_ or |MemorySpaceConcept|_ + - ``Space`` : a class meeting the requirements of |ExecutionSpaceConcept| or |MemorySpaceConcept| - ``ImplMirrorType``: an implementation defined specialization of ``Kokkos::View``. .. cpp:function:: template ImplMirrorType create_mirror_view(decltype(Kokkos::WithoutInitializing), Space const& space, ViewType const& src); If ``std::is_same::value`` is ``false``, - creates a new |View|_ with the same layout and padding as ``src`` but with a device type of ``Space::device_type``. + creates a new |View| with the same layout and padding as ``src`` but with a device type of ``Space::device_type``. The new view will have uninitialized data. Otherwise returns ``src``. - ``src``: a ``Kokkos::View``. - - ``Space``: a class meeting the requirements of |ExecutionSpaceConcept|_ or |MemorySpaceConcept|_ + - ``Space``: a class meeting the requirements of |ExecutionSpaceConcept| or |MemorySpaceConcept| - ``ImplMirrorType``: an implementation defined specialization of ``Kokkos::View``. .. cpp:function:: template auto create_mirror_view(ALLOC_PROP const& arg_prop, ViewType const& src); - If the |View|_ constructor arguments ``arg_prop`` (created by a call to `Kokkos::view_alloc`) include a memory space and the memory space - doesn't match the memory space of ``src``, creates a new |View|_ in the specified memory_space. 
If the ``arg_prop`` don't include a memory - space and the memory space of ``src`` is not host-accessible, creates a new host-accessible |View|_. - Otherwise, ``src`` is returned. If a new |View|_ is created, the implicitly called constructor respects ``arg_prop`` + If the |View| constructor arguments ``arg_prop`` (created by a call to `Kokkos::view_alloc`) include a memory space and the memory space + doesn't match the memory space of ``src``, creates a new |View| in the specified memory_space. If the ``arg_prop`` don't include a memory + space and the memory space of ``src`` is not host-accessible, creates a new host-accessible |View|. + Otherwise, ``src`` is returned. If a new |View| is created, the implicitly called constructor respects ``arg_prop`` and uses the same layout and padding as ``src``. - ``src``: a ``Kokkos::View``. - - ``arg_prop``: |View|_ constructor properties, e.g., ``Kokkos::view_alloc(Kokkos::WithoutInitializing)``. + - ``arg_prop``: |View| constructor properties, e.g., ``Kokkos::view_alloc(Kokkos::WithoutInitializing)``. .. important:: @@ -155,20 +144,20 @@ Description - ``src``: a ``Kokkos::View``. - - ``Space``: a class meeting the requirements of |ExecutionSpaceConcept|_ or |MemorySpaceConcept|_ + - ``Space``: a class meeting the requirements of |ExecutionSpaceConcept| or |MemorySpaceConcept| - ``ImplMirrorType``: an implementation defined specialization of ``Kokkos::View``. .. 
cpp:function:: template ImplMirrorType create_mirror_view_and_copy(ALLOC_PROP const& arg_prop, ViewType const& src); - If the memory space included in the |View|_ constructor arguments ``arg_prop`` (created by a call to `Kokkos::view_alloc`) does not match the memory - space of ``src``, creates a new |View|_ in the specified memory space using ``arg_prop`` and the same layout + If the memory space included in the |View| constructor arguments ``arg_prop`` (created by a call to `Kokkos::view_alloc`) does not match the memory + space of ``src``, creates a new |View| in the specified memory space using ``arg_prop`` and the same layout and padding as ``src``. Additionally, a ``deep_copy`` from ``src`` to the new view is executed (using the execution space contained in ``arg_prop`` if provided). Otherwise returns ``src``. - ``src``: a ``Kokkos::View``. - - ``arg_prop``: |View|_ constructor properties, e.g., ``Kokkos::view_alloc(Kokkos::HostSpace{}, Kokkos::WithoutInitializing)``. + - ``arg_prop``: |View| constructor properties, e.g., ``Kokkos::view_alloc(Kokkos::HostSpace{}, Kokkos::WithoutInitializing)``. .. important:: diff --git a/docs/source/API/core/view/deep_copy.rst b/docs/source/API/core/view/deep_copy.rst index a12fcff73..85cd49011 100644 --- a/docs/source/API/core/view/deep_copy.rst +++ b/docs/source/API/core/view/deep_copy.rst @@ -15,7 +15,7 @@ Usage Kokkos::deep_copy(dest, src); Copies data from ``src`` to ``dest``, where ``src`` and ``dest`` -can be `Kokkos::Views `_ or scalars under certain circumstances. +can be :doc:`Kokkos::Views ` or scalars under certain circumstances. Interface --------- @@ -35,16 +35,16 @@ Interface Parameters ~~~~~~~~~~ -* ExecSpace: An `ExecutionSpace <../execution_spaces.html>`_ +* ExecSpace: An :doc:`ExecutionSpace <../execution_spaces>` -* ViewDest: A `view-like type `_ with a non-const ``value_type`` +* ViewDest: A :doc:`view-like type ` with a non-const ``value_type`` -* ViewSrc: A `view-like type `_. 
+* ViewSrc: A :doc:`view-like type `. Requirements ~~~~~~~~~~~~ -* If ``src`` and ``dest`` are `Kokkos::View `_ s, then all the following are true: +* If ``src`` and ``dest`` are :doc:`Kokkos::View ` s, then all the following are true: - ``std::is_same::value == true`` @@ -52,16 +52,16 @@ Requirements - For all ``k`` in ``[0, dest.rank)`` ``dest.extent(k) == src.extent(k)`` (or the same as ``dest.rank()``) - - ``src.span_is_contiguous() && dest.span_is_contiguous() && std::is_same::value``, *or* there exists an `ExecutionSpace <../execution_spaces.html>`_ ``copy_space`` (either given or defaulted) such that both ``SpaceAccessibility::accessible == true`` and ``SpaceAccessibility::accessible == true``. + - ``src.span_is_contiguous() && dest.span_is_contiguous() && std::is_same::value``, *or* there exists an :doc:`ExecutionSpace <../execution_spaces>` ``copy_space`` (either given or defaulted) such that both ``SpaceAccessibility::accessible == true`` and ``SpaceAccessibility::accessible == true``. -* If ``src`` is a `Kokkos::View `_ and ``dest`` is a scalar, then ``src.rank == 0`` is true. +* If ``src`` is a :doc:`Kokkos::View ` and ``dest`` is a scalar, then ``src.rank == 0`` is true. Semantics --------- -* If no `ExecutionSpace <../execution_spaces.html>`_ argument is provided, all outstanding operations (kernels, copy operation) in any execution spaces will be finished before the copy is executed, and the copy operation is finished before the call returns. +* If no :doc:`ExecutionSpace <../execution_spaces>` argument is provided, all outstanding operations (kernels, copy operation) in any execution spaces will be finished before the copy is executed, and the copy operation is finished before the call returns. -* If an `ExecutionSpace <../execution_spaces.html>`_ argument ``exec_space`` is provided the call is potentially asynchronous—i.e., the call returns before the copy operation is executed. 
In that case the copy operation will occur only after any already submitted work to ``exec_space`` is finished, and the copy operation will be finished before any work submitted to ``exec_space`` after the ``deep_copy`` call returns is executed. Note: the copy operation is only synchronous with respect to work in the specific execution space instance, but not necessarily with work in other instances of the same type. This behaves analogous to issuing a ``cudaMemcpyAsync`` into a specific CUDA stream, without any additional synchronization. +* If an :doc:`ExecutionSpace <../execution_spaces>` argument ``exec_space`` is provided the call is potentially asynchronous—i.e., the call returns before the copy operation is executed. In that case the copy operation will occur only after any already submitted work to ``exec_space`` is finished, and the copy operation will be finished before any work submitted to ``exec_space`` after the ``deep_copy`` call returns is executed. Note: the copy operation is only synchronous with respect to work in the specific execution space instance, but not necessarily with work in other instances of the same type. This behaves analogous to issuing a ``cudaMemcpyAsync`` into a specific CUDA stream, without any additional synchronization. Examples -------- diff --git a/docs/source/API/core/view/memoryTraits.rst b/docs/source/API/core/view/memoryTraits.rst index 5daf851f4..ae6f177d1 100644 --- a/docs/source/API/core/view/memoryTraits.rst +++ b/docs/source/API/core/view/memoryTraits.rst @@ -38,18 +38,14 @@ Struct Interface A boolean that indicates whether the Aligned trait is enabled. -.. _MemoryAccessTraits: ../../../ProgrammingGuide/View.html#memory-access-traits +.. |MemoryAccessTraits| replace:: :ref:`memory access traits ` -.. |MemoryAccessTraits| replace:: memory access traits - -.. _UnmanagedViews: ../../../ProgrammingGuide/View.html#unmanaged-views - -.. |UnmanagedViews| replace:: unmanaged views +.. 
|UnmanagedViews| replace:: :ref:`unmanaged views ` Non-Member Enums ^^^^^^^^^^^^^^^^ -The following enumeration values are used to specify the memory access traits. Check the sub-section on |MemoryAccessTraits|_ in the Programming Guide for further information about how these traits can be used in practice. +The following enumeration values are used to specify the memory access traits. Check the sub-section on |MemoryAccessTraits| in the Programming Guide for further information about how these traits can be used in practice. .. cpp:enum:: MemoryTraitsFlags @@ -85,7 +81,7 @@ The following type aliases are also available in the ``Kokkos`` namespace. .. cpp:type:: MemoryRandomAccess = Kokkos::MemoryTraits; .. deprecated:: 4.7 - Managed memory as an explicit memory trait (i.e., ``using MemoryManaged = Kokkos::MemoryTraits<>;``) has been deprecated in Kokkos 4.7. Also, in earlier versions of Kokkos, the enumeration value of ``0`` had to be explicitly mentioned, i.e., ``Kokkos::MemoryTraits<0>``. Check the sub-section on |UnmanagedViews|_ for a discussion about this. + Managed memory as an explicit memory trait (i.e., ``using MemoryManaged = Kokkos::MemoryTraits<>;``) has been deprecated in Kokkos 4.7. Also, in earlier versions of Kokkos, the enumeration value of ``0`` had to be explicitly mentioned, i.e., ``Kokkos::MemoryTraits<0>``. Check the sub-section on |UnmanagedViews| for a discussion about this. Note that in order to use a managed View in a random access manner, the memory trait should be specified as ``Kokkos::MemoryTraits`` and not ``Kokkos::MemoryRandomAccess``. diff --git a/docs/source/API/core/view/subview.rst b/docs/source/API/core/view/subview.rst index 891461bca..5fc8b19ad 100644 --- a/docs/source/API/core/view/subview.rst +++ b/docs/source/API/core/view/subview.rst @@ -15,10 +15,7 @@ Usage Creates a ``Kokkos::View`` representing a subset of another ``Kokkos::View``. - -.. _KokkosAll: ../utilities/all.html#kokkosall - -.. 
|KokkosAll| replace:: :cpp:func:`Kokkos::ALL` +.. |KokkosAll| replace:: :doc:`Kokkos::ALL <../utilities/all>` Description ----------- @@ -33,9 +30,9 @@ Description than the rank of ``v`` and the values referenced by ``s`` correspond to the values associated with using the integer argument in the corresponding position during indexing into ``v``. - * Passing |KokkosAll|_ as the ``r``\ th argument is equivalent to passing ``pair(0,v.extent(r))`` as the ``r``\ th argument. + * Passing |KokkosAll| as the ``r``\ th argument is equivalent to passing ``pair(0,v.extent(r))`` as the ``r``\ th argument. - * If the ``r``\ th argument ``arg_r`` is the ``d``\ th range (\ ``std::pair``\ , ``Kokkos::pair`` or |KokkosAll|_ ) + * If the ``r``\ th argument ``arg_r`` is the ``d``\ th range (\ ``std::pair``\ , ``Kokkos::pair`` or |KokkosAll| ) in the argument list than ``s.extent(d) = arg_r.second-arg_r.first``\ , and dimension ``d`` of ``s`` references the range ``[arg_r.first,arg_r.second)`` of dimension ``r`` of ``v``. @@ -51,7 +48,7 @@ Description - ``iType`` with ``std::is_integral::value`` being true. - - ``std::remove_const_t< decltype(``\ |KokkosAll|_ ``)>`` + - ``std::remove_const_t< decltype(``\ |KokkosAll| ``)>`` * If the ``r``\ th argument ``arg_r`` is of type ``std::pair`` or ``Kokkos::pair`` it must meet: diff --git a/docs/source/API/core/view/view.rst b/docs/source/API/core/view/view.rst index 21731cace..f66914160 100644 --- a/docs/source/API/core/view/view.rst +++ b/docs/source/API/core/view/view.rst @@ -7,9 +7,7 @@ Header File: ```` .. |CppReferenceSharedPtr| replace:: ``std::shared_ptr`` -.. _ProgrammingGuide: ../../../ProgrammingGuide/View.html#memory-access-traits - -.. |ProgrammingGuide| replace:: Programming Guide +.. 
|ProgrammingGuide| replace:: :ref:`Programming Guide ` Class Interface --------------- @@ -64,7 +62,7 @@ Class Interface - ``Restrict`` - ``Aligned`` - See the sub-section on memory access traits in the |ProgrammingGuide|_ also for further information. + See the sub-section on memory access traits in the |ProgrammingGuide| also for further information. .. Pushing a "namespace" here; this doesn't create a namespace entity but tells Sphinx that everything between here and the pop is part of the View class. @@ -182,7 +180,6 @@ Data Handles :cpp:func:`access()` - .. cpp:type:: pointer_type pointer to :cpp:type:`value_type`. diff --git a/docs/source/API/core/view/view_like.rst b/docs/source/API/core/view/view_like.rst index 69c15fc0f..b37dae367 100644 --- a/docs/source/API/core/view/view_like.rst +++ b/docs/source/API/core/view/view_like.rst @@ -1,11 +1,11 @@ View-like Types =============== -View-like types are loosely defined as the set of class templates that behave like `Kokkos::View `__ from an interface perspective. There is not a full formal definition of what this means yet, which means there is no way for users to add to this list in a way that the new class is recognized by Kokkos facilities operating on View-like things. In Kokkos these class templates are considered View-like: +View-like types are loosely defined as the set of class templates that behave like :doc:`Kokkos::View ` from an interface perspective. There is not a full formal definition of what this means yet, which means there is no way for users to add to this list in a way that the new class is recognized by Kokkos facilities operating on View-like things. 
In Kokkos these class templates are considered View-like: -* `Kokkos::View `_ -* `Kokkos::DynRankView <../../containers/DynRankView.html>`_ -* `Kokkos::OffsetView <../../containers/Offset-View.html>`_ -* `Kokkos::DynamicView <../../containers/DynamicView.html>`_ +* :doc:`Kokkos::View ` +* :doc:`Kokkos::DynRankView <../../containers/DynRankView>` +* :doc:`Kokkos::OffsetView <../../containers/Offset-View>` +* :doc:`Kokkos::DynamicView <../../containers/DynamicView>` -Notably, `Kokkos::DualView <../../containers/DualView.html>`_ and `Kokkos::ScatterView <../../containers/ScatterView.html>`_ are **not** included in this category. +Notably, :doc:`Kokkos::DualView <../../containers/DualView>` and :doc:`Kokkos::ScatterView <../../containers/ScatterView>` are **not** included in this category. diff --git a/docs/source/ProgrammingGuide/Graph.rst b/docs/source/ProgrammingGuide/Graph.rst index 869de375e..c326d9576 100644 --- a/docs/source/ProgrammingGuide/Graph.rst +++ b/docs/source/ProgrammingGuide/Graph.rst @@ -112,7 +112,7 @@ For now, *capture* is only supported for the following backends: * - :cpp:`HIP` - `HIP stream capture `_ * - :cpp:`SYCL` - - `SYCL queue recording `_ + - `SYCL queue recording `_ .. note:: diff --git a/docs/source/ProgrammingGuide/Initialization.rst b/docs/source/ProgrammingGuide/Initialization.rst index 85924d231..5585f8b16 100644 --- a/docs/source/ProgrammingGuide/Initialization.rst +++ b/docs/source/ProgrammingGuide/Initialization.rst @@ -10,9 +10,9 @@ All primary capabilities of Kokkos are provided by the `Kokkos_Core.hpp` header Some capabilities - specifically data structures in the `containers` subpackage and algorithmic capabilities in the `algorithms` subpackage are included via separate header files. 
For specific capabilities check their API reference: -- `API: Core <../API/core-index.html>`_ -- `API: Containers <../API/containers-index.html>`_ -- `API: Algorithms <../API/algorithms-index.html>`_ +- :doc:`API: Core <../API/core-index>` +- :doc:`API: Containers <../API/containers-index>` +- :doc:`API: Algorithms <../API/algorithms-index>` Initialization by command-line arguments ---------------------------------------- @@ -50,7 +50,9 @@ Kokkos chooses the two spaces using the following list: The highest execution space in the list that is enabled is Kokkos' default execution space, and the highest enabled host execution space is Kokkos' default host execution space. For example, if `Kokkos::Cuda`, `Kokkos::OpenMP`, and `Kokkos::Serial` are enabled, then `Kokkos::Cuda` is the default execution space and `Kokkos::OpenMP` is the default host execution space\ :sup:`1`. In cases where the highest enabled backend is a host parallel execution space the `DefaultExecutionSpace` and the `DefaultHostExecutionSpace` will be the same. -`Kokkos::initialize <../API/Initialize-and-Finalize.html#kokos-initialize>`_ parses the command line for flags prefixed with `--kokkos-`, and removes all recognized flags. Argument options are given with an equals (`=`) sign. If the same argument occurs more than once, the last one is used. For example, the arguments +:doc:`Kokkos::initialize <../API/core/initialize_finalize/initialize>` parses the command line for flags prefixed with `--kokkos-`, and removes all recognized flags. Argument options are given with an equals (`=`) sign. If the same argument occurs more than once, the last one is used. For example, the arguments + +.. code-block:: bash --kokkos-threads=4 --kokkos-threads=3 @@ -103,15 +105,15 @@ Initialization by environment variable Instead of using command-line arguments, one may use environment variables. 
The environment variables are identical to the arguments in :ref:`Table 4.1 ` but they are upper case and the dash is replaced by an underscore. For example, if we want to set the number of threads to 3, we may use -.. code-block:: sh +.. code-block:: bash - KOKKOS_NUM_THREADS=3 + export KOKKOS_NUM_THREADS=3 Initialization by struct ------------------------ -Instead of giving `Kokkos::initialize() <../API/core/initialize_finalize/initialize.html>`_ command-line arguments, one may directly pass in initialization parameters using the `Kokkos::InitializationSettings` struct. If one wants to set options using the struct, one can use the functions `set_xxx` where `xxx` is identical to the arguments in :ref:`Table 4.1 ` where the dash has been replaced by an underscore. To check if a variable has been set, one can use the `has_xxx` functions. Finally, to get the value that was set, one can use the `get_xxx` functions. +Instead of giving :doc:`Kokkos::initialize() <../API/core/initialize_finalize/initialize>` command-line arguments, one may directly pass in initialization parameters using the `Kokkos::InitializationSettings` struct. If one wants to set options using the struct, one can use the functions `set_xxx` where `xxx` is identical to the arguments in :ref:`Table 4.1 ` where the dash has been replaced by an underscore. To check if a variable has been set, one can use the `has_xxx` functions. Finally, to get the value that was set, one can use the `get_xxx` functions. If you do not set `num_threads`, Kokkos will try to determine a default value if possible or otherwise set it to 1. In particular, Kokkos can use the `hwloc` library to determine default settings using the assumption that the process binding mask is unique, i.e., that this process does not share any cores with another process. Note that the default value of each parameter is -1. @@ -131,7 +133,7 @@ Here is an example of how to use the struct. 
Finalization ------------ -At the end of each program, Kokkos needs to be shut down in order to free resources; do this by calling `Kokkos::finalize() <../API/core/initialize_finalize/finalize.html>`_. You may wish to set this to be called automatically at program exit, either by setting an `atexit` hook or by attaching the function to `MPI_COMM_SELF` so that it is called automatically at `MPI_Finalize`. +At the end of each program, Kokkos needs to be shut down in order to free resources; do this by calling :doc:`Kokkos::finalize() <../API/core/initialize_finalize/finalize>`. You may wish to set this to be called automatically at program exit, either by setting an `atexit` hook or by attaching the function to `MPI_COMM_SELF` so that it is called automatically at `MPI_Finalize`. Example Code ------------ diff --git a/docs/source/ProgrammingGuide/Machine-Model.rst b/docs/source/ProgrammingGuide/Machine-Model.rst index 6c1852bdd..f37864092 100644 --- a/docs/source/ProgrammingGuide/Machine-Model.rst +++ b/docs/source/ProgrammingGuide/Machine-Model.rst @@ -7,35 +7,15 @@ Machine Model .. |node| image:: figures/kokkos-node-doc.png :alt: Figure 2.1 Conceptual Model of a Future High Performance Computing Node -.. _Chap7ParallelDispatch: ParallelDispatch.html -.. |Chap7ParallelDispatch| replace:: Chapter 7 - Parallel dispatch - .. |execution-space| image:: figures/kokkos-execution-space-doc.png :alt: Figure 2.2 Example Execution Spaces in a Future Computing Node .. |memory-space| image:: figures/kokkos-memory-space-doc.png :alt: Figure 2.3 Example Memory Spaces in a Future Computing Node -.. _ViewAllocation: View.html -.. |ViewAllocation| replace:: View allocation - -.. _Initialization: Initialization.html -.. |Initialization| replace:: Initialization - -.. _Section82: HierarchicalParallelism.html#hp-thread-teams -.. |Section82| replace:: Section 8.2 - -.. _Chap8HierarchicalParallelism: HierarchicalParallelism.html -.. 
|Chap8HierarchicalParallelism| replace:: Chapter 8 - Hierarchical Parallelism +.. |ParallelFor| replace:: :cpp:func:`Kokkos::parallel_for` -.. _Section231: Machine-Model.html#thread-safety -.. |Section231| replace:: Section 2.3.1 - -.. _ParallelFor: ../API/core/parallel-dispatch/parallel_for.html -.. |ParallelFor| replace:: ``parallel_for()`` - -.. _Fence: ../API/core/parallel-dispatch/fence.html -.. |Fence| replace:: ``fence()`` +.. |Fence| replace:: :doc:`Fence() <../API/core/parallel-dispatch/fence>` After reading this chapter you will understand the abstract model of a parallel computing node which underlies the design choices and structure of the Kokkos framework. The machine model ensures the applications written using Kokkos will have portability across architectures while being performant on a range of hardware. @@ -70,7 +50,7 @@ Figure 2.1 Conceptual Model of a Future High Performance Computing Node Kokkos Spaces ------------- -Kokkos uses the term *execution spaces* to describe a logical grouping of computation units which share an identical set of performance properties. An execution space provides a set of parallel execution resources which can be utilized by the programmer using several types of fundamental parallel operation. For a list of the operations available see |Chap7ParallelDispatch|_. The term *memory spaces* is used to describe a logical distinct memory resource, which is available to allocate data. +Kokkos uses the term *execution spaces* to describe a logical grouping of computation units which share an identical set of performance properties. An execution space provides a set of parallel execution resources which can be utilized by the programmer using several types of fundamental parallel operation. For a list of the operations available see :doc:`Chapter 6 - Parallel dispatch `. The term *memory spaces* is used to describe a logical distinct memory resource, which is available to allocate data. 
Execution Space Instances ~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -112,7 +92,7 @@ Program execution It is tempting to try to define formally what it means for a processor to execute code. None of us authors have a background in logic or what computer scientists call "formal methods," so our attempt might not go very far! We will stick with informal definitions and rely on Kokkos' C++ implementation as an existence proof that the definitions make sense. -Kokkos lets users tell execution spaces to execute parallel operations. These include parallel for, reduce, and scan (see |Chap7ParallelDispatch|_) as well as |ViewAllocation|_ and |Initialization|_. We name the class of all such operations *parallel dispatch*. +Kokkos lets users tell execution spaces to execute parallel operations. These include parallel for, reduce, and scan (see :doc:`Chapter 6 - Parallel dispatch `) as well as :doc:`View allocation ` and :doc:`Initialization `. We name the class of all such operations *parallel dispatch*. From our perspective, there are three kinds of code: @@ -120,13 +100,15 @@ From our perspective, there are three kinds of code: #. Code outside of a Kokkos parallel operation that asks Kokkos to do something (e.g., parallel dispatch itself) #. Code that has nothing to do with Kokkos -The first category is the most restrictive. |Section82|_ explains restrictions on inter-team synchronization. In general, we limit the ability of Kokkos-parallel code to invoke Kokkos operations (other than for nested parallelism; see |Chap8HierarchicalParallelism|_ and especially |Section82|_). We also forbid dynamic memory allocation (other than from the team's scratch pad) in parallel operations. Whether Kokkos-parallel code may invoke operating system routines or third-party libraries depends on the execution and memory spaces being used. Regardless, restrictions on inter-team synchronization have implications for things like filesystem access. +The first category is the most restrictive. 
:ref:`Section 7.2 ` explains restrictions on inter-team synchronization. In general, we limit the ability of Kokkos-parallel code to invoke Kokkos operations (other than for nested parallelism; see :doc:`Chapter 7 - Hierarchical Parallelism ` and especially :ref:`Section 7.2 `). We also forbid dynamic memory allocation (other than from the team's scratch pad) in parallel operations. Whether Kokkos-parallel code may invoke operating system routines or third-party libraries depends on the execution and memory spaces being used. Regardless, restrictions on inter-team synchronization have implications for things like filesystem access. -*Kokkos threads are for computing in parallel*, not for overlapping I/O and computation, and not for making graphical user interfaces responsive. Use other kinds of threads (e.g., operating system threads) for the latter two purposes. You may be able to mix Kokkos' parallelism with other kinds of threads; see |Section231|_. Kokkos' developers are also working on a task parallelism model that will work with Kokkos' existing data-parallel constructs. +*Kokkos threads are for computing in parallel*, not for overlapping I/O and computation, and not for making graphical user interfaces responsive. Use other kinds of threads (e.g., operating system threads) for the latter two purposes. You may be able to mix Kokkos' parallelism with other kinds of threads; see :ref:`Section 2.3.1 `. Kokkos' developers are also working on a task parallelism model that will work with Kokkos' existing data-parallel constructs. **Reproducible reductions and scans** Kokkos promises *nothing* about the order in which the iterations of a parallel loop occur. However, it *does* promise that if you execute the same parallel reduction or scan, using the same hardware resources and run-time settings, then you will get the same results each time you run the operation. "Same results" even means "with respect to floating-point rounding error." 
-**Asynchronous parallel dispatch** This concerns the second category of code that calls Kokkos operations. In Kokkos, parallel dispatch executes *asynchronously*. This means that it may return "early," before it has actually completed. Nevertheless, it executes *in sequence* with respect to other Kokkos operations on the same execution or memory space. This matters for things like timing. For example, a |ParallelFor|_ may return "right away," so if you want to measure how long it takes, you must first call |Fence|_ on that execution space. This forces all functors to complete before |Fence|_ returns. +**Asynchronous parallel dispatch** This concerns the second category of code that calls Kokkos operations. In Kokkos, parallel dispatch executes *asynchronously*. This means that it may return "early," before it has actually completed. Nevertheless, it executes *in sequence* with respect to other Kokkos operations on the same execution or memory space. This matters for things like timing. For example, a |ParallelFor| may return "right away," so if you want to measure how long it takes, you must first call |Fence| on that execution space. This forces all functors to complete before |Fence| returns. + +.. _thread-safety: Thread safety? ~~~~~~~~~~~~~~ diff --git a/docs/source/ProgrammingGuide/ParallelDispatch.md b/docs/source/ProgrammingGuide/ParallelDispatch.md index 677e57054..b0c9f6434 100644 --- a/docs/source/ProgrammingGuide/ParallelDispatch.md +++ b/docs/source/ProgrammingGuide/ParallelDispatch.md @@ -18,7 +18,7 @@ Important notes on syntax: ### Functors -A _functor_ is one way to define the body of a parallel loop. It is a class or struct1 with a public `operator()` instance method. That method's arguments depend on both which parallel operation you want to execute (for, reduce, or scan), and on the loop's execution policy (e.g., range or team). For an example of a functor see the section in this chapter for each type of parallel operation. 
In the most common case of a [`parallel_for()`](../API/core/parallel-dispatch/parallel_for), it takes an integer argument which is the for loop's index. Other arguments are possible; see [Chapter 8 - Hierarchical Parallelism](HierarchicalParallelism). +A _functor_ is one way to define the body of a parallel loop. It is a class or struct1 with a public `operator()` instance method. That method's arguments depend on both which parallel operation you want to execute (for, reduce, or scan), and on the loop's execution policy (e.g., range or team). For an example of a functor see the section in this chapter for each type of parallel operation. In the most common case of a [`parallel_for()`](../API/core/parallel-dispatch/parallel_for), it takes an integer argument which is the for loop's index. Other arguments are possible; see [Chapter 7 - Hierarchical Parallelism](HierarchicalParallelism). The `operator()` method must be const, and must be marked with the `KOKKOS_FUNCTION` or `KOKKOS_INLINE_FUNCTION` macro. For some backends (such as CUDA and HIP) this macro is necessary to mark your method as suitable for running on both accelerator devices and the host. If not building with any backends requiring markup, `KOKKOS_INLINE_FUNCTION` expands to `inline`, and `KOKKOS_FUNCTION` is unnecessary but harmless. Here is an example of the signature of such a method: @@ -36,19 +36,19 @@ The entire parallel operation (for, reduce, or scan) shares the same instance of The 2011 version of the C++ standard ("C++11") provides a new language construct, the _lambda_, also called "anonymous function" or "closure." Kokkos lets users supply parallel loop bodies as either functors (see above) or lambdas. Lambdas work like automatically generated functors. Just like a class, a lambda may have state. The only difference is that with a lambda, the state comes in from the environment. (The name "closure" means that the function "closes over" state from the environment.) 
Just like with functors, lambdas must bring in state by "value" (copy), not by reference or pointer. -By default, lambdas capture nothing (as the default capture specifier `[]` specifies). This is not likely to be useful, since [`parallel_for()`](../API/core/parallel-dispatch/parallel_for) generally works by its side effects. Because Kokkos reserves the right to make copies of the closure, and its operations are potentially asynchronous users must ``capture by value'' to be semantically correct. We recommend doing so via the KOKKOS_LAMBDA macro for the outermost level of parallelism (see [Chapter 8](HierarchicalParallelism)). +By default, lambdas capture nothing (as the default capture specifier `[]` specifies). This is not likely to be useful, since [`parallel_for()`](../API/core/parallel-dispatch/parallel_for) generally works by its side effects. Because Kokkos reserves the right to make copies of the closure, and its operations are potentially asynchronous users must ``capture by value'' to be semantically correct. We recommend doing so via the KOKKOS_LAMBDA macro for the outermost level of parallelism (see [Chapter 7](HierarchicalParallelism)). For some backends, this just turns into the usual capture-by-value clause `[=]`. That captures variables from the surrounding scope by value. Do NOT capture them by reference! For other backends (e.g. CUDA and HIP), this macro may have a special definition that makes the lambda work correctly, same as the `KOKKOS_INLINE_FUNCTION` macro. It is a violation of Kokkos semantics to capture by reference `[&]` for two reasons. First Kokkos might give the lambda to an execution space which can not access the stack of the dispatching thread. Secondly, capturing by reference allows the programmer to violate the const semantics of the lambda. For correctness and portability reasons lambdas and functors are treated as const objects inside the parallel code section. 
Capturing by reference allows a circumvention of that const property, and enables many more possibilities of writing non-threads-safe code. -When using lambdas for nested parallelism (see [Chapter 8](HierarchicalParallelism) for details) using capture by reference can be useful for performance reasons, but the code is only valid Kokkos code if it also works with capturing by copy. +When using lambdas for nested parallelism (see [Chapter 7](HierarchicalParallelism) for details) using capture by reference can be useful for performance reasons, but the code is only valid Kokkos code if it also works with capturing by copy. ### Should I use a functor or a lambda? Kokkos lets users choose whether to use a functor or a lambda. Lambdas are convenient for short loop bodies. For a much more complicated loop body, you might find it easier for testing to separate it out and name it as a functor. Lambdas by definition are "anonymous functions," meaning that they have no name. This makes it harder to test them. Furthermore, if you would like to use lambdas with CUDA, you must have a sufficiently new version of CUDA. At the time of writing, CUDA 7.5 and later versions support host-device lambda with the special flag. CUDA 8.0 has improved interoperability with the host compiler. To enable this support, use the `KOKKOS_CUDA_OPTIONS=enable_lambda` option. -Finally, the "execution tag" feature, which lets you put together several different parallel loop bodies into a single functor, only works with functors. (See [Chapter 8](HierarchicalParallelism) for details.) +Finally, the "execution tag" feature, which lets you put together several different parallel loop bodies into a single functor, only works with functors. (See [Chapter 7](HierarchicalParallelism) for details.) 
### Specifying the execution space diff --git a/docs/source/ProgrammingGuide/ProgrammingModel.md b/docs/source/ProgrammingGuide/ProgrammingModel.md index 7d97687aa..d84cced15 100644 --- a/docs/source/ProgrammingGuide/ProgrammingModel.md +++ b/docs/source/ProgrammingGuide/ProgrammingModel.md @@ -37,7 +37,7 @@ Threads in a team can synchronize - they have a "barrier" primitive - and share Scratch pad memory exists only during parallel operations; allocations in it do not persist across kernels. Teams themselves may run in any order, and may not necessarily run all in parallel. For example, if the user asks for _T_ teams, the hardware may choose to run them one after another in sequence, or in groups of up to _G_ teams at a time in parallel. -Users may _nest_ parallel operations. Teams may perform one parallel operation (for, reduce, or scan), and threads within each team may perform another, possibly different parallel operation. Different teams may do entirely different things. For example, all the threads in one team may execute a [`parallel_for()`](../API/core/parallel-dispatch/parallel_for) and all the threads in a different team may execute a [`parallel_scan()`](../API/core/parallel-dispatch/parallel_scan). Different threads within a team may also do different things. However, performance may vary if threads in a team "diverge" in their behavior (e.g., take different sides of a branch). [Chapter 8 - Hierarchical Parallelism](HierarchicalParallelism) shows how the C++ implementation of Kokkos exposes thread teams. +Users may _nest_ parallel operations. Teams may perform one parallel operation (for, reduce, or scan), and threads within each team may perform another, possibly different parallel operation. Different teams may do entirely different things. 
For example, all the threads in one team may execute a [`parallel_for()`](../API/core/parallel-dispatch/parallel_for) and all the threads in a different team may execute a [`parallel_scan()`](../API/core/parallel-dispatch/parallel_scan). Different threads within a team may also do different things. However, performance may vary if threads in a team "diverge" in their behavior (e.g., take different sides of a branch). [Chapter 7 - Hierarchical Parallelism](HierarchicalParallelism) shows how the C++ implementation of Kokkos exposes thread teams. NVIDIA's CUDA programming model inspired Kokkos' thread team model. The scratch pad memory corresponds with CUDA's per-team "shared memory." The "league/team" vocabulary comes from OpenMP 4.0 and has many aspects in common with our thread team model. We have found that programming to this model results in good performance, even on computer architectures that only implement parts of the full model. For example, most multicore processors in common use for high-performance computing lack "scratch pad" hardware. However, if users request a scratch pad size that fits comfortably in the largest cache shared by the threads in a team, programming as if a scratch pad exists forces users to address locality in their algorithms. This also reflects the common experience that rewriting a code for more restrictive hardware, then porting the code _back_ to conventional hardware, tends to improve performance relative to an unoptimized code. diff --git a/docs/source/ProgrammingGuide/View.rst b/docs/source/ProgrammingGuide/View.rst index 0991a9ce9..c7565d57f 100644 --- a/docs/source/ProgrammingGuide/View.rst +++ b/docs/source/ProgrammingGuide/View.rst @@ -586,6 +586,8 @@ A user is in most cases also allowed to obtain a pointer to a specific element v This is only valid if a Views reference type is an `lvalue`. That property can be queried statically at compile time from the view through its `reference_type_is_lvalue` member. +.. 
_kokkos-memory-access-traits: + Memory access traits -------------------- @@ -599,6 +601,8 @@ Another way to get optimized data accesses is to specify memory traits. These tr Kokkos::View > d; Kokkos::View > e; +.. _kokkos-unmanaged-view: + Unmanaged Views ~~~~~~~~~~~~~~~ @@ -608,7 +612,7 @@ Unmanaged Views It's always better to let Kokkos control memory allocation, but sometimes you don't have a choice. You might have to work with an application or an interface that returns a raw pointer, for example. Kokkos lets you wrap raw pointers in an *unmanaged View*. "Unmanaged" means that Kokkos does *neither* reference counting *nor* automatic deallocation for those Views. The following example shows how to create an unmanaged View of host memory. You may do this for CUDA device memory too, or indeed for memory allocated in any memory space, by specifying the View's execution or memory space accordingly. Note that the pointer to the allocation has to be provided to the constructor. -We would like to highlight that in Kokkos, Views are managed by default. In other words, if a View is not created as an unmanaged View, then it is managed, irrespective of other memory traits. Thus, an explicit memory trait for managed Views (with an alias called ``Kokkos::MemoryManaged``), has been deprecated in Kokkos 4.7. Since, it has no practical value. See the API reference on |MemoryTraits|_. +We would like to highlight that in Kokkos, Views are managed by default. In other words, if a View is not created as an unmanaged View, then it is managed, irrespective of other memory traits. Thus, an explicit memory trait for managed Views (with an alias called ``Kokkos::MemoryManaged``), has been deprecated in Kokkos 4.7. Since, it has no practical value. See the API reference on :doc:`memory traits <../API/core/view/memoryTraits>`. .. code-block:: c++ @@ -649,8 +653,8 @@ While `RandomAccess` is valid for other execution spaces, currently no specific .. 
|Atomic| replace:: Atomic -|Atomic|_ Access -~~~~~~~~~~~~~~~~ +:doc:`Atomic <../API/core/atomics>` Access +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The `Atomic` access trait lets you create a View of data such that every read or write to any entry uses an atomic update. Kokkos supports atomics for all data types independent of size. Restrictions are that you are diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst index 6fe5396d1..0a392196a 100644 --- a/docs/source/contributing.rst +++ b/docs/source/contributing.rst @@ -22,7 +22,7 @@ License ^^^^^^^ Note that by contributing to Kokkos Core, you agree to the **Apache License 2.0 with LLVM Exception**. This allows your contributions to be used in -closed-source commercial contexts. See the `LICENSE `__ for +closed-source commercial contexts. See the :doc:`LICENSE ` for details. Authors retain copyright on their own contributions. Developer Certificate of Origin (DCO) diff --git a/docs/source/get-started/configuration-guide.rst b/docs/source/get-started/configuration-guide.rst index 01ed6319d..d6f1c07aa 100644 --- a/docs/source/get-started/configuration-guide.rst +++ b/docs/source/get-started/configuration-guide.rst @@ -235,7 +235,7 @@ Backend-specific options * Enable relocatable device code (RDC) for CUDA [#rdc_with_shared_libs]_ * ``OFF`` - * * ``Kokkos_ENABLE_CUDA_UVM`` :red:`[Deprecated since 4.0]` see `Transition to alternatives <../usecases/Moving_from_EnableUVM_to_SharedSpace.html>`_ + * * ``Kokkos_ENABLE_CUDA_UVM`` :red:`[Deprecated since 4.0]` see :doc:`Transition to alternatives <../usecases/Moving_from_EnableUVM_to_SharedSpace>` * Use unified memory (UM) by default for CUDA * ``OFF`` @@ -243,7 +243,7 @@ Backend-specific options * Use ``cudaMallocAsync`` (requires CUDA Toolkit version 11.2 or higher). 
This optimization may improve performance in applications with multiple CUDA streams per device, but it is known to be incompatible with MPI distributions built on older versions of UCX - and many Cray MPICH instances. See `known issues <../known-issues.html#cuda>`_. + and many Cray MPICH instances. See :ref:`known issues `. * (see below [#cuda_malloc_async]_) * * ``Kokkos_ENABLE_HIP_MULTIPLE_KERNEL_INSTANTIATIONS`` diff --git a/docs/source/get-started/package-managers.rst b/docs/source/get-started/package-managers.rst index 2b94b7519..32d8d9697 100644 --- a/docs/source/get-started/package-managers.rst +++ b/docs/source/get-started/package-managers.rst @@ -23,7 +23,7 @@ Spack is a popular package manager for HPC. Spack comes with installation recip The `Kokkos recipe webpage `_ summarizes the available versions of Kokkos and their options. -Most of the time, Spack Kokkos' variants follow the same options as the Kokkos `CMake options <./configuration-guide.html>`_. +Most of the time, Spack Kokkos' variants follow the same options as the Kokkos :doc:`CMake options <./configuration-guide>`. List of available variants can be found by running .. code-block:: diff --git a/docs/source/get-started/requirements.rst b/docs/source/get-started/requirements.rst index a54904213..bb52a4ba1 100644 --- a/docs/source/get-started/requirements.rst +++ b/docs/source/get-started/requirements.rst @@ -1,3 +1,5 @@ +.. include:: ../mydefs.rst + Requirements ############ @@ -177,7 +179,7 @@ Build system: ============= * CMake >= 3.16: required -* CMake >= 3.18: Fortran linkage. This does not affect most mixed Fortran/Kokkos builds. See `known build issues `_. +* CMake >= 3.18: Fortran linkage. This does not affect most mixed Fortran/Kokkos builds. See :ref:`known issues `. 
* CMake >= 3.21.1 for NVC++ Primary tested compiler are passing in release mode diff --git a/docs/source/known-issues.rst b/docs/source/known-issues.rst index 8e8c4a089..1a2fbb5fd 100644 --- a/docs/source/known-issues.rst +++ b/docs/source/known-issues.rst @@ -13,6 +13,8 @@ Therefore, the header `Kokkos_Core.hpp` is protected against these macros, meani Even though definitions inside `Kokkos_Core.hpp` are protected against the macros, code outside is not. Thus, it is on the user to deal with the macros being defined, either by defining `-DNOMINMAX` or `/DNOMINMAX` in the compile line (preferred) or by putting `()` around names that contain `min` or `max`. +.. _kokkos-known-issues-cuda: + CUDA ==== @@ -151,16 +153,21 @@ Mathematical functions } -.. _Compatibility: ./ProgrammingGuide/Compatibility.html - -.. |Compatibility| replace:: Kokkos compatibility guidelines - The using-directive ``using namespace Kokkos;`` is highly discouraged (see -|Compatibility|_) and will cause compilation errors in presence of unqualified +:doc:`Kokkos compatibility guidelines <./ProgrammingGuide/Compatibility>`) and will cause compilation errors in presence of unqualified calls to mathematical functions. Instead, prefer explicit qualification ``Kokkos::sqrt`` or an using-declaration ``using Kokkos::sqrt;`` at local scope. +.. _kokkos-known-issues-fortran: + +Fortran +======= + +- In a mixed C++/Fortran code, CMake will use the C++ linker by default. If you override this behavior and use Fortran as the link language, the link may break because Kokkos adds linker flags expecting the linker to be C++. Prior to CMake 3.18, Kokkos has no way of detecting in downstream projects that the linker was changed to Fortran. From CMake 3.18, Kokkos can use generator expressions to avoid adding flags when the linker is not C++. Note: Kokkos will not add any linker flags in this Fortran case. The user will be entirely on their own to add the appropriate linker flags. See `known build issues `_. + +.. 
_kokkos-known-issues-mathematical-constants: + Mathematical constants ====================== diff --git a/docs/source/testing-and-issue-tracking/Kokkos-Project-Planning.md b/docs/source/testing-and-issue-tracking/Kokkos-Project-Planning.md index 2f8aed671..ead2abfd5 100644 --- a/docs/source/testing-and-issue-tracking/Kokkos-Project-Planning.md +++ b/docs/source/testing-and-issue-tracking/Kokkos-Project-Planning.md @@ -148,7 +148,7 @@ It serves multiple purposes: Prioritization of items is recorded in the [Kokkos project plan](https://github.com/orgs/kokkos/projects/1) -Meeting notes are kept in a private repository: [internal repository](https://github.com/kokkos/internal-documents) +Meeting notes are kept in a public repository: [public repository](https://github.com/kokkos/development) Further issue prioritization happens at the developer meeting discussed below. @@ -164,7 +164,7 @@ If something needs to be referenceable longer term, then it needs to be discusse Private information may be hosted on the [internal repository](https://github.com/kokkos/internal-documents) but do not post NDA data on there. Kokkos developer meeting held once a week on Wednesdays 2pm ET / 12 pm MT / 18:00 UTC on Zoom. -The agenda is posted on the internal repository ahead of time (it can be found under the [`meeting-notes/`](https://github.com/kokkos/internal-documents/tree/master/meeting-notes/2023) directory). +The agenda is posted on the internal repository ahead of time (it can be found under the [`meeting-notes/`](https://github.com/kokkos/development/tree/main/meeting_notes) directory). Developers are allowed to edit the agenda and add topics or issues that they would like to be discussed at the meeting. 
## Release Process diff --git a/docs/source/testing-and-issue-tracking/Testing-Processes.md b/docs/source/testing-and-issue-tracking/Testing-Processes.md index 73b6bb6f1..3c0ed14b2 100644 --- a/docs/source/testing-and-issue-tracking/Testing-Processes.md +++ b/docs/source/testing-and-issue-tracking/Testing-Processes.md @@ -28,7 +28,7 @@ the clang-format style specified in the repository. Test configurations are defined in the `kokkos/.jenkins`, and `kokkos/.github/workflows/*` files and determine the official primary software stack support. -The tested compiler versions are also listed [here](https://kokkos.github.io/kokkos-core-wiki/requirements.html). +The tested compiler versions are also listed [here](../get-started/requirements). These test configurations (sparsely) cover the cross product of hardware platforms (e.g. NVIDIA. Intel, and AMD), compilers (e.g. GCC, Clang, NVC++), C++ standards (17-23), Kokkos backends (e.g. Cuda, OpenMP, and HIP) and Kokkos configuration options (e.g. Debug, Relocatable Device Code). diff --git a/docs/source/usecases/MDRangePolicy.md b/docs/source/usecases/MDRangePolicy.md index 31693c49a..b8bd9a665 100644 --- a/docs/source/usecases/MDRangePolicy.md +++ b/docs/source/usecases/MDRangePolicy.md @@ -137,5 +137,4 @@ The API reference for the [`MDRangePolicy`](../API/core/policies/MDRangePolicy) The use case that this example is based on comes from the Intrepid2 package of Trilinos. For more examples, check out code in Trilinos in files at: `Trilinos/packages/intrepid2/src/Shared/Intrepid2_ArrayToolsDef*.hpp`. 
This link provides some overview of the Intrepid package: - [documentation link](https://trilinos.org/packages/intrepid/) - + [documentation link](https://trilinos.github.io/docs/intrepid2/index.html) diff --git a/docs/source/usecases/Moving_from_EnableUVM_to_SharedSpace.rst b/docs/source/usecases/Moving_from_EnableUVM_to_SharedSpace.rst index 14bba10c7..7222c32de 100644 --- a/docs/source/usecases/Moving_from_EnableUVM_to_SharedSpace.rst +++ b/docs/source/usecases/Moving_from_EnableUVM_to_SharedSpace.rst @@ -4,17 +4,11 @@ Moving code from requiring ``Kokkos_ENABLE_CUDA_UVM`` to using ``SharedSpace`` .. role:: cpp(code) :language: cpp -.. _SharedSpace: ../API/core/memory_spaces.html#kokkos-sharedspace -.. |SharedSpace| replace:: ``SharedSpace`` +.. |ExecutionSpace| replace:: :ref:`ExecutionSpace ` -.. _ExecutionSpace: ../API/core/execution_spaces.html#kokkos-executionspaceconcept -.. |ExecutionSpace| replace:: ``ExecutionSpace`` +.. |SharedHostPinnedSpace| replace:: :ref:`SharedHostPinnedSpace ` -.. _SharedHostPinnedSpace: ../API/core/memory_spaces.html#kokkos-sharedhostpinnedspace -.. |SharedHostPinnedSpace| replace:: ``SharedHostPinnedSpace`` - -.. _KokkosSharedSpace: ../API/core/memory_spaces.html#kokkos-sharedspace -.. |KokkosSharedSpace| replace:: ``Kokkos::SharedSpace`` +.. |KokkosSharedSpace| replace:: :ref:`SharedSpace ` With Kokkos 4.0 ``Kokkos_ENABLE_CUDA_UVM`` is deprecated and can only be used with ``Kokkos_ENABLE_DEPRECATED_CODE_4``. The main reason for the deprecation was, that using the option changed the ``memory_space`` of the ``Cuda`` ``ExecutionSpace``. This lead to several problems. For example: The driver is allowed to move chunks of this memory to the device or host depending on the access at any time without notice. The accesses in ``parallel_for``, ``parallel_reduce``, or ``parallel_scan`` do not occur in any guaranteed order and furthermore depend on other kernels running on the same GPU. This makes debugging tedious. 
Especially, if the memory an allocation resides in is not apparent but dependent on the options when running ``cmake``. @@ -22,14 +16,14 @@ The accesses in ``parallel_for``, ``parallel_reduce``, or ``parallel_scan`` do n The alternative --------------- -We introduced a new alias named |SharedSpace|_ in Kokkos 4.0. This always points to memory that is accessible by every |ExecutionSpace|_ and is migrated without user interaction to the accessing ``ExecutionSpace`` on demand. After migration the memory is accessed locally. +We introduced a new alias named |KokkosSharedSpace| in Kokkos 4.0. This always points to memory that is accessible by every |ExecutionSpace| and is migrated without user interaction to the accessing ``ExecutionSpace`` on demand. After migration the memory is accessed locally. Using the alias e.g. in ``Views`` is expressive and thus easier to read. Furthermore, it is portable to every backend that can automatically migrate memory between ``ExecutionSpaces``. -Furthermore, we introduced the alias |SharedHostPinnedSpace|_ which points to memory that is accessible by all enabled ``ExecutionSpaces`` but always resides in the memory of the host. +Furthermore, we introduced the alias |SharedHostPinnedSpace| which points to memory that is accessible by all enabled ``ExecutionSpaces`` but always resides in the memory of the host. The transition -------------- -Basically it comes down to spelling |KokkosSharedSpace|_ as a template argument in all allocations. +Basically it comes down to spelling ``Kokkos::SharedSpace`` as a template argument in all allocations. Below is an example of a transition: * Code requiring ``Kokkos_ENABLE_CUDA_UVM`` at configure time (until 4.0)