4 changes: 4 additions & 0 deletions docs/source/API/core/Execution-Policies.rst
@@ -23,6 +23,9 @@ Top Level Execution Policies
* * `TeamPolicy <policies/TeamPolicy.html>`__
* Assigns to each iterate in a contiguous range a team of threads

* * `SinglePolicy <policies/SinglePolicy.html>`__
* Scalar execution policy for executing a single instance of a kernel.

Nested Execution Policies
============================

@@ -101,6 +104,7 @@ Execution Policies generally accept compile time arguments via template paramete
./policies/MDRangePolicy
./policies/NestedPolicies
./policies/RangePolicy
./policies/SinglePolicy
./policies/TeamHandleConcept
./policies/TeamPolicy
./policies/TeamThreadMDRange
15 changes: 15 additions & 0 deletions docs/source/API/core/ParallelDispatch.rst
@@ -21,6 +21,20 @@ Parallel execution patterns for composing algorithms.
* - `fence <parallel-dispatch/fence.html>`__
- Fences execution spaces

Special Dispatch
----------------

Restrict execution to certain resources.

.. list-table::
   :widths: 25 75
   :header-rows: 1

   * - Function
     - Description
   * - `single <parallel-dispatch/single.html>`__
     - Executes a functor or lambda exactly once

Tags for Team Policy Calculations
---------------------------------

@@ -46,6 +60,7 @@ The following parallel pattern tags are used to call the correct overload for te
./parallel-dispatch/parallel_for
./parallel-dispatch/parallel_reduce
./parallel-dispatch/parallel_scan
./parallel-dispatch/single
./parallel-dispatch/fence
./parallel-dispatch/ParallelForTag
./parallel-dispatch/ParallelReduceTag
120 changes: 120 additions & 0 deletions docs/source/API/core/parallel-dispatch/single.rst
@@ -0,0 +1,120 @@
``single``
==========

.. _SinglePolicy: ../policies/SinglePolicy.html

.. |SinglePolicy| replace:: ``SinglePolicy``

.. role:: cpp(code)
   :language: cpp

Header File: ``<Kokkos_Core.hpp>``

Usage
-----

.. code-block:: cpp

    Kokkos::single(name, policy, functor);
    Kokkos::single(name, policy, functor, output...);
    Kokkos::single(policy, functor);
    Kokkos::single(policy, functor, output...);
    Kokkos::single(functor);
    Kokkos::single(name, functor);

Executes the functor on the restricted resource defined by the policy.

Interface
---------

.. code-block:: cpp

    template <class ExecPolicy, class FunctorType>
    Kokkos::single(const std::string& name,
                   const ExecPolicy& policy,
                   const FunctorType& functor);

.. code-block:: cpp

    template <class ExecPolicy, class FunctorType, class ValueType>
    Kokkos::single(const ExecPolicy& policy,
                   const FunctorType& functor,
                   ValueType& val);

.. code-block:: cpp

    template <class ExecPolicy, class FunctorType>
    Kokkos::single(const ExecPolicy& policy, const FunctorType& functor);

.. code-block:: cpp

    template <class FunctorType>
    Kokkos::single(const std::string& name, const FunctorType& functor);

.. code-block:: cpp

    template <class FunctorType>
    Kokkos::single(const FunctorType& functor);


Parameters:
~~~~~~~~~~~

* ``ExecPolicy``: defines execution properties. Valid policies are:

  - :cpp:func:`SinglePolicy` restricts execution to a single thread in the execution space.
  - :cpp:func:`PerTeam` restricts execution to a single vector lane in the calling team.
  - :cpp:func:`PerThread` restricts execution to a single vector lane in the calling thread.

* ``name`` is only usable with |SinglePolicy|_.

* ``val`` is a reference to the output variable. The functor's ``operator()`` receives a reference to the reduction value.

Examples
--------

Using ``Kokkos::single`` in Hierarchical Parallelism

.. code-block:: cpp

    #include <Kokkos_Core.hpp>

    int main(int argc, char* argv[]) {
      Kokkos::initialize(argc, argv);

      using team_policy = Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>;
      using team_handle = team_policy::member_type;

      Kokkos::parallel_for(team_policy(100, Kokkos::AUTO()),
        KOKKOS_LAMBDA(const team_handle& team) {
          int val;
          // Only one thread of the team executes the body
          Kokkos::single(Kokkos::PerTeam(team), [&]() {
            val = team.league_rank();
          });
        });

      Kokkos::finalize();
      return 0;
    }


Using ``Kokkos::single`` to execute a functor once and produce a single output

.. code-block:: cpp

    #include <Kokkos_Core.hpp>

    int main(int argc, char* argv[]) {
      Kokkos::initialize(argc, argv);

      float value;
      Kokkos::single("label", Kokkos::SinglePolicy(),
        KOKKOS_LAMBDA(float& val) {
          val = 1.0f;
        }, value);

      Kokkos::finalize();
      return 0;
    }


156 changes: 156 additions & 0 deletions docs/source/API/core/policies/SinglePolicy.rst
@@ -0,0 +1,156 @@
``SinglePolicy``
================

.. _single: ../parallel-dispatch/single.html

.. |single| replace:: ``Kokkos::single``

.. role:: cpp(code)
   :language: cpp

Header File: ``<Kokkos_Core.hpp>``

Usage
-----

.. code-block:: cpp

    Kokkos::SinglePolicy<...>()
    Kokkos::SinglePolicy<...>(exec_space)

    // CTAD Constructor
    Kokkos::SinglePolicy()

``SinglePolicy`` defines an execution policy that executes a functor or lambda exactly once.
It must be used with |single|_, either to perform a single operation or to produce one
or more scalar outputs.

Synopsis
--------

.. code-block:: cpp

    template<class ... Args>
    struct Kokkos::SinglePolicy {
      using execution_policy = SinglePolicy;

      // Inherited from PolicyTraits<Args...>
      using execution_space = PolicyTraits<Args...>::execution_space;
      using work_tag        = PolicyTraits<Args...>::work_tag;

      using base_class = RangePolicy<Kokkos::LaunchBounds<1>, Args...>;

      // Constructors
      SinglePolicy(const SinglePolicy&) = default;
      SinglePolicy(SinglePolicy&&)      = default;

      SinglePolicy();

      SinglePolicy(const execution_space& exec_space);

      // return ExecSpace instance provided to the constructor
      KOKKOS_FUNCTION const execution_space & space() const;
    };

Parameters
----------

General Template Arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~

Valid template arguments for ``SinglePolicy`` are described `here <../Execution-Policies.html#common-arguments-for-all-execution-policies>`_.

Public Class Members
--------------------

Constructors
~~~~~~~~~~~~

.. cpp:function:: SinglePolicy()

   Default constructor. Uses the default execution space.

.. cpp:function:: SinglePolicy(const ExecutionSpace& exec_space)

   Constructor that takes an ``ExecutionSpace`` instance.

Examples
--------

Using ``SinglePolicy`` for a single operation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Execute a functor once with zero arguments. The functor can also be called with a WorkTag.

.. code-block:: cpp

    // Empty tag type used to select the tagged operator()
    struct TimesTwoTag {};

    struct Functor {
      Kokkos::View<double> v;
      KOKKOS_FUNCTION void operator()() const { v() *= 3; }
      KOKKOS_FUNCTION void operator()(const TimesTwoTag) const { v() *= 2; }
    };

    Kokkos::View<double> v("v");
    Functor f{v};

    // Default execution space
    Kokkos::single("label", Kokkos::SinglePolicy(), f);

    // With an ExecutionSpace instance
    Kokkos::DefaultExecutionSpace exec_space;
    Kokkos::single("label", Kokkos::SinglePolicy(exec_space), f);

    // With both a WorkTag and an ExecutionSpace type
    Kokkos::single("label",
      Kokkos::SinglePolicy<TimesTwoTag, Kokkos::DefaultExecutionSpace>(), f);


Using ``SinglePolicy`` to produce scalar output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Produce a single scalar output by passing a reference to the output variable as an argument to the functor's ``operator()``.
The functor can also be called with a WorkTag. The functor's ``operator()`` receives a reference to the reduction value.

.. code-block:: cpp

    // Empty tag type used to select the tagged operator()
    struct TenTag {};

    struct ReductionFunctor {
      KOKKOS_FUNCTION void operator()(int& res) const { res = 5; }
      KOKKOS_FUNCTION void operator()(const TenTag, int& res) const { res = 10; }
    };

    int val;
    ReductionFunctor f;

    // Default execution space
    Kokkos::single("label", Kokkos::SinglePolicy(), f, val);
    // val == 5

    // With a WorkTag and an ExecutionSpace type
    Kokkos::single("label",
      Kokkos::SinglePolicy<Kokkos::DefaultExecutionSpace, TenTag>(), f, val);
    // val == 10

    // With a lambda
    Kokkos::single("label",
      Kokkos::SinglePolicy<Kokkos::DefaultExecutionSpace>(),
      KOKKOS_LAMBDA(int& ret) { ret = 5; }, val);
    // val == 5

Using ``SinglePolicy`` to produce multiple outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The functor's ``operator()`` can receive multiple reduction arguments.

.. code-block:: cpp

    int val1, val2;

    // With a lambda producing 2 outputs
    Kokkos::single("label", Kokkos::SinglePolicy(),
      KOKKOS_LAMBDA(int& s1, int& s2) {
        s1 = 1;
        s2 = 2;
      }, val1, val2);
    // val1 == 1
    // val2 == 2

3 changes: 2 additions & 1 deletion docs/source/ProgrammingGuide/HierarchicalParallelism.md
@@ -316,10 +316,11 @@ As the name indicates the vector-level must be vectorizable. The parallel patter

As stated above, a kernel is a parallel region with respect to threads (and vector lanes) within a team. This means that global memory accesses outside of the respective nested levels potentially have to be protected against repetitive execution. A common example is the case where a team performs some calculation but only one result per team has to be written back to global memory.

Kokkos provides the [`Kokkos::single(Policy,Lambda)`](../API/core/parallel-dispatch/single.rst) function for this case. It currently accepts three policies:

* `Kokkos::PerTeam` restricts execution of the lambda's body to once per team
* `Kokkos::PerThread` restricts execution of the lambda's body to once per thread (that is, to only one vector lane in a thread)
* `Kokkos::SinglePolicy` is not nested; it restricts execution to a single thread in the execution space

The `single` function takes a lambda as its second argument. That lambda takes zero arguments or one argument by reference. If it takes no argument, its body must perform side effects in order to have an effect. If it takes one argument, the final value of that argument is broadcast to every executor on the level: i.e. every vector lane of the thread, or every thread (and vector lane) of the team. It must always be correct for the lambda to capture variables by value (`[=]`, not `[&]`). Thus, if the lambda captures by reference, it must _not_ modify variables that it has captured by reference.
