diff --git a/docs/source/API/core/Execution-Policies.rst b/docs/source/API/core/Execution-Policies.rst
index 37620170e..2973eaf3f 100644
--- a/docs/source/API/core/Execution-Policies.rst
+++ b/docs/source/API/core/Execution-Policies.rst
@@ -23,6 +23,9 @@ Top Level Execution Policies
     * * `TeamPolicy `__
       * Assigns to each iterate in a contiguous range a team of threads
+    * * `SinglePolicy `__
+      * Scalar execution policy for executing a single instance of a kernel.
+
 
 Nested Execution Policies
 ============================
 
@@ -101,6 +104,7 @@ Execution Policies generally accept compile time arguments via template paramete
     ./policies/MDRangePolicy
     ./policies/NestedPolicies
     ./policies/RangePolicy
+    ./policies/SinglePolicy
     ./policies/TeamHandleConcept
     ./policies/TeamPolicy
     ./policies/TeamThreadMDRange
diff --git a/docs/source/API/core/ParallelDispatch.rst b/docs/source/API/core/ParallelDispatch.rst
index f0cdf6df2..98b98034f 100644
--- a/docs/source/API/core/ParallelDispatch.rst
+++ b/docs/source/API/core/ParallelDispatch.rst
@@ -21,6 +21,20 @@ Parallel execution patterns for composing algorithms.
     * - `fence `__
       - Fences execution spaces
 
+Special Dispatch
+----------------
+
+Restrict execution to certain resources.
+
+.. list-table::
+    :widths: 25 75
+    :header-rows: 1
+
+    * - Function
+      - Description
+    * - `single `__
+      - Executes a functor or lambda exactly once
+
 Tags for Team Policy Calculations
 ---------------------------------
 
@@ -46,6 +60,7 @@ The following parallel pattern tags are used to call the correct overload for te
     ./parallel-dispatch/parallel_for
     ./parallel-dispatch/parallel_reduce
     ./parallel-dispatch/parallel_scan
+    ./parallel-dispatch/single
     ./parallel-dispatch/fence
     ./parallel-dispatch/ParallelForTag
     ./parallel-dispatch/ParallelReduceTag
diff --git a/docs/source/API/core/parallel-dispatch/single.rst b/docs/source/API/core/parallel-dispatch/single.rst
new file mode 100644
index 000000000..fd2a40f6f
--- /dev/null
+++ b/docs/source/API/core/parallel-dispatch/single.rst
@@ -0,0 +1,120 @@
+``single``
+==========
+
+.. _SinglePolicy: ../policies/SinglePolicy.html
+
+.. |SinglePolicy| replace:: ``SinglePolicy``
+
+.. role:: cpp(code)
+    :language: cpp
+
+Header File: ````
+
+Usage
+-----
+
+.. code-block:: cpp
+
+    Kokkos::single(name, policy, functor);
+    Kokkos::single(name, policy, functor, output...);
+    Kokkos::single(policy, functor);
+    Kokkos::single(policy, functor, output...);
+    Kokkos::single(functor);
+    Kokkos::single(name, functor);
+
+Executes the functor on the restricted resource defined by the policy.
+
+Interface
+---------
+
+.. code-block:: cpp
+
+    template
+    Kokkos::single(const std::string& name,
+                   const ExecPolicy& policy,
+                   const FunctorType& functor);
+
+.. code-block:: cpp
+
+    template
+    Kokkos::single(const ExecPolicy& policy,
+                   const FunctorType& functor,
+                   ValueType& val);
+
+.. code-block:: cpp
+
+    template
+    Kokkos::single(const ExecPolicy& policy, const FunctorType& functor);
+
+.. code-block:: cpp
+
+    template
+    Kokkos::single(const std::string& name, const FunctorType& functor);
+
+.. code-block:: cpp
+
+    template
+    Kokkos::single(const FunctorType& functor);
+
+
+Parameters:
+~~~~~~~~~~~
+
+* ``ExecPolicy``: defines execution properties. Valid policies are:
+
+  - :cpp:func:`SinglePolicy` restricts execution to a single thread in the execution space.
+  - :cpp:func:`PerTeam` restricts execution to a single vector lane in the calling team.
+  - :cpp:func:`PerThread` restricts execution to a single vector lane in the calling thread.
+
+* ``name`` is only usable with |SinglePolicy|_.
+
+* ``val`` is a reference to the output variable. The functor's ``operator()`` receives a reference to the reduction value.
+
+Examples
+--------
+
+Using ``Kokkos::single`` in Hierarchical Parallelism
+
+.. code-block:: cpp
+
+    #include 
+
+    int main(int argc, char* argv[]) {
+      Kokkos::initialize(argc, argv);
+
+      using team_policy = Kokkos::TeamPolicy;
+      using team_handle = team_policy::member_type;
+
+      Kokkos::parallel_for(team_policy(100, Kokkos::AUTO()),
+        KOKKOS_LAMBDA(const team_handle& team) {
+          int val;
+          Kokkos::single(Kokkos::PerTeam(team), [&]() {
+            val = team.league_rank();
+          });
+      });
+
+      Kokkos::finalize();
+      return 0;
+    }
+
+
+Using ``Kokkos::single`` to execute a functor once and produce a single output
+
+.. code-block:: cpp
+
+    #include 
+
+    int main(int argc, char* argv[]) {
+      Kokkos::initialize(argc, argv);
+
+      float value;
+      Kokkos::single("label", Kokkos::SinglePolicy(),
+        KOKKOS_LAMBDA(float& val) {
+          val = 1.0;
+        }, value);
+
+      Kokkos::finalize();
+      return 0;
+    }
+
+
diff --git a/docs/source/API/core/policies/SinglePolicy.rst b/docs/source/API/core/policies/SinglePolicy.rst
new file mode 100644
index 000000000..53a22dcfc
--- /dev/null
+++ b/docs/source/API/core/policies/SinglePolicy.rst
@@ -0,0 +1,156 @@
+``SinglePolicy``
+================
+
+.. _single: ../parallel-dispatch/single.html
+
+.. |single| replace:: ``Kokkos::single``
+
+.. role:: cpp(code)
+    :language: cpp
+
+Header File: ````
+
+Usage
+-----
+
+.. code-block:: cpp
+
+    Kokkos::SinglePolicy<...>()
+    Kokkos::SinglePolicy<...>(exec_space)
+
+    // CTAD Constructor
+    Kokkos::SinglePolicy()
+
+``SinglePolicy`` defines an execution policy that executes a functor or lambda exactly once.
+It must be used with |single|_ to perform a single operation or to produce one
+or more scalar outputs.
+
+Synopsis
+--------
+
+.. code-block:: cpp
+
+    template
+    struct Kokkos::SinglePolicy {
+      using execution_policy = SinglePolicy;
+
+      // Inherited from PolicyTraits
+      using execution_space = PolicyTraits::execution_space;
+      using work_tag = PolicyTraits::work_tag;
+
+      using base_class = RangePolicy, Args...>;
+
+      // Constructors
+      SinglePolicy(const SinglePolicy&) = default;
+      SinglePolicy(SinglePolicy&&) = default;
+
+      SinglePolicy();
+
+      SinglePolicy(const execution_space& exec_space);
+
+      // Returns the execution space instance provided to the constructor
+      KOKKOS_FUNCTION const execution_space & space() const;
+    };
+
+Parameters
+----------
+
+General Template Arguments
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Valid template arguments for ``SinglePolicy`` are described `here <../Execution-Policies.html#common-arguments-for-all-execution-policies>`_.
+
+Public Class Members
+--------------------
+
+Constructors
+~~~~~~~~~~~~
+
+.. cpp:function:: SinglePolicy()
+
+    Default constructor. Uses the default execution space.
+
+.. cpp:function:: SinglePolicy(const ExecutionSpace& exec_space)
+
+    Constructor that takes an ``ExecutionSpace`` instance.
+
+Examples
+--------
+
+Using ``SinglePolicy`` for a single operation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Execute a functor once with zero arguments. The functor can also be called with a WorkTag.
+
+.. code-block:: cpp
+
+    struct Functor {
+      Kokkos::View v;
+      KOKKOS_FUNCTION void operator()() const { v() *= 3; }
+      KOKKOS_FUNCTION void operator()(const TimesTwoTag) const { v() *= 2; }
+    };
+
+    Kokkos::View v("v");
+    Functor f{v};
+
+    // Default execution space
+    Kokkos::single("label", Kokkos::SinglePolicy(), f);
+
+    // With an ExecutionSpace instance
+    Kokkos::DefaultExecutionSpace exec_space;
+    Kokkos::single("label", Kokkos::SinglePolicy(exec_space), f);
+
+    // With both a WorkTag and an ExecutionSpace
+    Kokkos::single("label",
+                   Kokkos::SinglePolicy(), f);
+
+
+Using ``SinglePolicy`` to produce scalar output
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Produce a single scalar output by passing the output variable to |single|_ after the functor.
+The functor's ``operator()`` receives a reference to that variable. The functor can also be called with a WorkTag.
+
+.. code-block:: cpp
+
+    struct ReductionFunctor {
+      KOKKOS_FUNCTION void operator()(int& res) const { res = 5; }
+      KOKKOS_FUNCTION void operator()(const TenTag, int& res) const { res = 10; }
+    };
+
+    int val;
+    ReductionFunctor f;
+
+    // Default execution space
+    Kokkos::single("label", Kokkos::SinglePolicy(), f, val);
+    // val == 5
+
+    // With a WorkTag and an ExecutionSpace
+    Kokkos::single("label",
+                   Kokkos::SinglePolicy(), f, val);
+    // val == 10
+
+    // With a lambda
+    Kokkos::single("label",
+                   Kokkos::SinglePolicy(),
+                   KOKKOS_LAMBDA(int& ret) { ret = 5; }, val);
+    // val == 5
+
+Using ``SinglePolicy`` to produce multiple outputs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The functor's ``operator()`` can receive multiple output arguments, each by reference.
+
+.. code-block:: cpp
+
+    int val1, val2;
+
+    // With a lambda producing 2 outputs
+    Kokkos::single("label", Kokkos::SinglePolicy(),
+                   KOKKOS_LAMBDA(int& s1, int& s2) {
+                     s1 = 1;
+                     s2 = 2;
+                   }, val1, val2);
+    // val1 == 1
+    // val2 == 2
+
diff --git a/docs/source/ProgrammingGuide/HierarchicalParallelism.md b/docs/source/ProgrammingGuide/HierarchicalParallelism.md
index fe72ebe7f..6b794d89f 100644
--- a/docs/source/ProgrammingGuide/HierarchicalParallelism.md
+++ b/docs/source/ProgrammingGuide/HierarchicalParallelism.md
@@ -316,10 +316,11 @@ As the name indicates the vector-level must be vectorizable. The parallel patter
 
 As stated above, a kernel is a parallel region with respect to threads (and vector lanes) within a team. This means that global memory accesses outside of the respective nested levels potentially have to be protected against repetitive execution. A common example is the case where a team performs some calculation but only one result per team has to be written back to global memory.
 
-Kokkos provides the `Kokkos::single(Policy,Lambda)` function for this case. It currently accepts two policies:
+Kokkos provides the [`Kokkos::single(Policy,Lambda)`](../API/core/parallel-dispatch/single.rst) function for this case. It currently accepts three policies:
 
 * `Kokkos::PerTeam` restricts execution of the lambda's body to once per team
 * `Kokkos::PerThread` restricts execution of the lambda's body to once per thread (that is, to only one vector lane in a thread)
+* `Kokkos::SinglePolicy` is not nested and restricts execution to a single thread in the execution space
 
 The `single` function takes a lambda as its second argument. That lambda takes zero arguments or one argument by reference. If it takes no argument, its body must perform side effects in order to have an effect. If it takes one argument, the final value of that argument is broadcast to every executor on the level: i.e. every vector lane of the thread, or every thread (and vector lane) of the team. It must always be correct for the lambda to capture variables by value (`[=]`, not `[&]`). Thus, if the lambda captures by reference, it must _not_ modify variables that it has captured by reference.
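
The broadcast rule in the last paragraph can be modelled in plain C++ without Kokkos. The sketch below is only an illustration of the semantics, not the library's implementation: `kTeamSize`, `model_single_per_team` and `broadcast_rank` are hypothetical names that do not exist in Kokkos, and a fixed-size array stands in for the members of a team. It shows the body running once, its one-argument result being copied to every member, and the lambda capturing by value.

```cpp
#include <array>
#include <cstddef>

// Hypothetical team width for this model; in Kokkos the team size comes from the policy.
constexpr std::size_t kTeamSize = 4;

// Hypothetical stand-in for a one-argument single at team level:
// runs `body` exactly once, then gives every team member a copy of the result.
template <class Body>
std::array<int, kTeamSize> model_single_per_team(const Body& body, int& calls) {
  int result = 0;
  body(result);  // executed once, by a single member
  ++calls;       // count executions so the "once" property can be checked
  std::array<int, kTeamSize> per_member{};
  per_member.fill(result);  // broadcast the final value to the whole team
  return per_member;
}

// Every member ends up seeing the league rank times ten.
inline std::array<int, kTeamSize> broadcast_rank(int league_rank, int& calls) {
  // Capture by value ([=]); the only write goes through the by-reference argument.
  return model_single_per_team([=](int& v) { v = league_rank * 10; }, calls);
}
```

Calling `broadcast_rank(3, calls)` runs the body a single time and leaves the same value in every slot of the returned array, which mirrors why a lambda that writes only through its argument remains correct when captured by value.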