diff --git a/docs/how-to/performance_guidelines.rst b/docs/how-to/performance_guidelines.rst
index 33dbbb4af4..efe532e6b6 100644
--- a/docs/how-to/performance_guidelines.rst
+++ b/docs/how-to/performance_guidelines.rst
@@ -262,7 +262,9 @@ For example, when the control condition depends on ``threadIdx`` or ``warpSize``
 warp doesn't diverge. The compiler might optimize loops, short ifs, or switch
 blocks using branch predication, which prevents warp divergence. With branch
 predication, instructions associated with a false predicate are scheduled but
-not executed, which avoids unnecessary operations.
+not executed, which avoids unnecessary operations. For control conditions where
+one outcome is significantly more likely than the other, use `__builtin_expect `_
+or ``[[likely]]`` to indicate the likely condition result.
 
 Avoiding divergent warps
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
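
The added sentence recommends ``__builtin_expect`` or ``[[likely]]`` for conditions with one dominant outcome. The following is a minimal host-side C++ sketch of both forms, not code from the patch. The function names and the assumption that negative samples are rare are hypothetical, chosen only for illustration. It assumes a GCC- or Clang-compatible compiler (for ``__builtin_expect``) and C++20 (for ``[[likely]]``). The same annotations are assumed to carry over to conditions inside HIP kernels.

```cpp
#include <vector>

// Hypothetical helper: sums the non-negative samples.
// Assumption for this sketch: negative samples are rare.
inline long sum_non_negative(const std::vector<int>& samples) {
    long total = 0;
    for (int v : samples) {
        // __builtin_expect(expr, expected) evaluates to expr and hints that
        // expr usually equals `expected`. Here: "v < 0 is usually false".
        if (__builtin_expect(v < 0, 0)) {
            continue;  // rare path: skip the negative sample
        }
        total += v;  // hot path
    }
    return total;
}

// Same logic written with the C++20 [[likely]] attribute.
inline long sum_non_negative_attr(const std::vector<int>& samples) {
    long total = 0;
    for (int v : samples) {
        if (v >= 0) [[likely]] {
            total += v;  // hot path
        }
    }
    return total;
}
```

Both hints only influence how the compiler arranges and optimizes the branch. They never change the result, so a wrong hint costs performance but not correctness.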