Release briefing 4.6 #125
Merged
+928
−1
072463b
create empty slides for the release briefing 4/6
JBludau fd494c4
Update default target
masterleinad bfbff99
Add DualView changes
masterleinad 686d0c7
added slides for graph
JBludau 7d31172
added date of release briefing
JBludau 05afa88
Remove changes related to skipping for host-accessible memory spaces
masterleinad 95408b0
Add slide about kokkos_check
tpadioleau 2e08cc0
General enhancements: add inclusive_scan an kokkos tools overhead
tretre91 f5a6848
Update general enhancements
tretre91 b0ca7b3
Update Content/ReleaseBriefings/4_6/Section_Organizational.tex
JBludau 2fe15d8
Update Content/ReleaseBriefings/4_6/Section_Organizational.tex
JBludau 0b6816e
Update Content/ReleaseBriefings/release-46.tex
JBludau 2708b3b
Update Content/ReleaseBriefings/4_6/Section_Organizational.tex
JBludau 1b218ec
Update Content/ReleaseBriefings/4_6/Section_Organizational.tex
JBludau 8d740c6
Update Content/ReleaseBriefings/4_6/Section_NewFeatures.tex
JBludau 3ad305b
Update Content/ReleaseBriefings/4_6/Section_NewFeatures.tex
JBludau 7dc1c33
Update Content/ReleaseBriefings/4_6/Section_NewFeatures.tex
JBludau 0c30e82
Add HIP Multi-GPU slides
tcclevenger ba8f0b3
incorporated Romin's comments
JBludau 00a69c9
Update Content/ReleaseBriefings/release-46.tex
JBludau 2a081bd
move dualView slide to deprecation section
JBludau a486258
changed to ornlid
JBludau cb0b62f
add breaking changes/deprecations
nmm0 4ca7485
add 4.6 release briefing bugfix slides
nmm0 61e0e07
Add slides to general enhancements section
ldh4 ebfa2bd
Remove comments
ldh4 169386d
Add Build System updates
diehlpk 638b123
Add backend updates
diehlpk a42c9cf
adjusted style to match rest of the slides
JBludau 5a257f2
Apply suggestions to general enhancements
tretre91 426ebd2
removed section header slides
JBludau fbbb074
Update Content/ReleaseBriefings/4_6/Section_BugFixes.tex
JBludau f2146e4
Update Content/ReleaseBriefings/4_6/Section_BreakingChanges.tex
JBludau fe9f55b
Update Content/ReleaseBriefings/4_6/Section_BreakingChanges.tex
JBludau 811701c
Update Content/ReleaseBriefings/4_6/Section_BreakingChanges.tex
JBludau e51238d
Update Content/ReleaseBriefings/4_6/Section_BreakingChanges.tex
JBludau ea9c4e5
Update Content/ReleaseBriefings/4_6/Section_BugFixes.tex
JBludau 44b6879
Update Content/ReleaseBriefings/4_6/Section_BugFixes.tex
JBludau 14927ea
Update Content/ReleaseBriefings/4_6/Section_BugFixes.tex
JBludau 8d3e202
Update Content/ReleaseBriefings/4_6/Section_NewFeatures.tex
JBludau 7c182f7
Reword HIP multi-gpu bullet point
tcclevenger 1dbcacf
Update Content/ReleaseBriefings/4_6/Section_NewFeatures.tex
JBludau 9e0573b
Update Content/ReleaseBriefings/4_6/Section_GeneralEnhancements.tex
tpadioleau 965e021
Add performance numbers for inclusive scan
tretre91 3be527b
Update Content/ReleaseBriefings/4_6/Section_NewFeatures.tex
JBludau ac34214
Update Content/ReleaseBriefings/4_6/Section_GeneralEnhancements.tex
JBludau 9a044fd
Update Content/ReleaseBriefings/4_6/Section_BackendUpdates.tex
JBludau 503ab7e
Update Content/ReleaseBriefings/4_6/Section_BackendUpdates.tex
JBludau 01e3d19
Update Content/ReleaseBriefings/4_6/Section_BackendUpdates.tex
JBludau 898724c
Update Content/ReleaseBriefings/4_6/Section_BackendUpdates.tex
JBludau 59573be
Update Content/ReleaseBriefings/4_6/Section_BreakingChanges.tex
JBludau 8130980
Update Content/ReleaseBriefings/4_6/Section_BreakingChanges.tex
JBludau 53d065e
Update Content/ReleaseBriefings/4_6/Section_BugFixes.tex
JBludau 0f26f7d
Update Content/ReleaseBriefings/4_6/Section_BugFixes.tex
JBludau c6aadf7
Update Content/ReleaseBriefings/4_6/Section_BugFixes.tex
JBludau c78ca02
Update Content/ReleaseBriefings/4_6/Section_BreakingChanges.tex
JBludau c38530d
remove hyperrefs to prs
JBludau 2d37476
spell out signature of functor in then node
JBludau 7896362
remove impl call from slides
JBludau bc29a03
add hint about no guarantees for print format
JBludau a8b181b
add missing escape
JBludau 73385b2
make some space on print_config slide
JBludau c210bf9
Update Content/ReleaseBriefings/release-46.tex
JBludau c455d4c
Add a slide for spack and MI300A
cedricchevalier19 5ef6d7c
Update Content/ReleaseBriefings/release-46.tex
JBludau bc48a52
add KUG program to slides
JBludau befcc77
added bof and tea time slide
JBludau f2125d0
add scan perf results
JBludau 2fafc0a
added perf data for tooling launch overhead
JBludau c923ca3
shortened print_config output
JBludau 5439ed0
add perf results for search
JBludau ecb73e4
Update Content/ReleaseBriefings/4_6/Section_NewFeatures.tex
JBludau bc84851
Change wording in multi-GPU slide
tcclevenger 7c76215
Some changes to interoptibility of graphs
tcclevenger 284e0e3
Some changes to graph.then
tcclevenger c0d281d
put a suit on christian's plots
JBludau 6cef403
switch order of bullet points so H100 relevant changes are close to t…
JBludau 683b165
use default colors in the plots
JBludau a66044d
add a hint that the speedup is algorithm and hardware dependent
JBludau 7b97e82
mark makefles as deprecated
JBludau 23cb5ec
Fix error in code example
tcclevenger
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| %========================================================================== | ||
|
|
||
| \begin{frame}[fragile] | ||
|
|
||
| {\Huge Backend Updates} | ||
|
|
||
| \vspace{10pt} | ||
|
|
||
| \end{frame} | ||
|
|
||
|
|
||
| %========================================================================== | ||
|
|
||
| % Examples | ||
|
|
||
| % note: always keep the [fragile] for your frames! | ||
|
|
||
| \begin{frame}[fragile]{CUDA, SYCL and Serial} | ||
| \begin{itemize} | ||
| \item CUDA: Improved performance for \texttt{Kokkos::parallel\_reduce} on H100 and newer by increasing launch bounds | ||
| \item SYCL: Improved sorting performance for non-contiguous views with \texttt{RandomAccessIterator} | ||
|
|
||
| \item Serial: Reduce fences when using \texttt{Kokkos\_ENABLE\_ATOMICS\_BYPASS} | ||
|
|
||
| \end{itemize} | ||
| \end{frame} | ||
|
|
||
| \begin{frame}[fragile]{HIP} | ||
| \begin{itemize} | ||
| \item Change block size deduction to prefer smaller blocks/teams if possible | ||
| \item Allocate memory with stream ordered semantics (\emph{i.e.}\ use \texttt{hipMallocAsync}) as default | ||
|
|
||
| \item Fix a segfault when a virtual function called inside a kernel requires too many registers | ||
| \end{itemize} | ||
| \end{frame} | ||
|
|
||
| %========================================================================== | ||
|
|
||
|
|
||
| %========================================================================== | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| %========================================================================== | ||
|
|
||
| \begin{frame}[fragile] | ||
|
|
||
| {\Huge Deprecations and other breaking changes} | ||
|
|
||
| \vspace{10pt} | ||
|
|
||
| \end{frame} | ||
|
|
||
|
|
||
| \begin{frame}[fragile]{Intel Classic Compiler} | ||
|
|
||
| \begin{itemize} | ||
| \item Intel deprecated the Intel Classic compiler around 2022 and removed it from the oneAPI 2024.0 release | ||
|
|
||
| \item To focus on newer compilers and reduce the maintenance burden, we have \textbf{removed} support for Intel Classic (the oneAPI compiler \texttt{icpx} is still supported) | ||
| \end{itemize} | ||
| \end{frame} | ||
|
|
||
|
|
||
| \begin{frame}[fragile]{DualView changes} | ||
| \textbf{Deprecate} direct access to \texttt{d\_view} and \texttt{h\_view} | ||
| \begin{itemize} | ||
| \item Modifying the allocations in \texttt{d\_view} and \texttt{h\_view} directly is dangerous, especially if \texttt{modify} and \texttt{sync} are skipped | ||
| \item Use \texttt{view\_host()} and \texttt{view\_device()} instead | ||
| \item These two functions return by value when deprecated code is enabled and by const reference otherwise. This might have performance implications if they are called extensively, e.g., in loop bounds. | ||
| \end{itemize} | ||
| \end{frame} | ||
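A possible companion slide for the DualView deprecation, written in the deck's own `code` environment (taken from the commented template frames in these files). It is only a sketch: the function name `fill_on_host` and the `keywords` list are illustrative assumptions, and the snippet assumes it is called between `Kokkos::initialize()` and `Kokkos::finalize()`. It shows the accessor functions replacing direct member access, with the host view fetched once outside the loop so the return-by-value case does not cost a copy per iteration.

```latex
\begin{frame}[fragile]{DualView changes: migration sketch}
  \begin{code}[keywords={auto,for,int,void}]
  #include <Kokkos_Core.hpp>
  #include <Kokkos_DualView.hpp>

  // Illustrative helper, not part of Kokkos.
  void fill_on_host(int n) {
    Kokkos::DualView<double*> dv("dv", n);

    // Before (deprecated): dv.h_view(i) = ...;
    // After: fetch the host view once, outside the loop.
    auto h = dv.view_host();
    for (int i = 0; i < n; ++i) h(i) = 1.0 * i;

    dv.modify_host();  // mark the host copy as modified
    dv.sync_device();  // copy to the device if needed
    auto d = dv.view_device();  // use d in kernels
  }
  \end{code}
\end{frame}
```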
|
|
||
|
|
||
| \begin{frame}[fragile]{Experimental SIMD changes} | ||
| \begin{itemize} | ||
| \item \textbf{Deprecated} \texttt{native\_simd} and \texttt{native\_simd\_mask} to align with the C++26 standard | ||
| \item \textbf{Removed} obtaining a reference from \texttt{*simd*::operator[]} to align with the C++26 standard | ||
| \item \textbf{Changed} \texttt{Kokkos::Experimental::*simd*::operator==} and \texttt{operator!=} to return SIMD masks instead of \texttt{bool} | ||
|
|
||
| \begin{itemize} | ||
| \item To get the old behavior, use \texttt{all\_of(a == b)} | ||
| \end{itemize} | ||
| \end{itemize} | ||
| \end{frame} | ||
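The comparison change could be illustrated with a slide along the following lines. This is a hedged sketch, not text from the PR: the alias `simd_t`, the helper `same`, and the choice of the scalar ABI are assumptions made to keep the example portable, and the spelling `simd<T, Abi>` is assumed to match the experimental interface as of 4.6.

```latex
\begin{frame}[fragile]{Experimental SIMD changes: comparison sketch}
  \begin{code}[keywords={using,auto,bool,namespace,return}]
  #include <Kokkos_SIMD.hpp>

  namespace KE = Kokkos::Experimental;
  // Scalar ABI chosen only for portability of the sketch.
  using simd_t = KE::simd<double, KE::simd_abi::scalar>;

  // Illustrative helper, not part of Kokkos.
  bool same(simd_t const& a, simd_t const& b) {
    auto mask = (a == b);     // a SIMD mask, one entry per lane
    return KE::all_of(mask);  // true only if every lane matches
  }
  \end{code}
\end{frame}
```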
|
|
||
| \begin{frame}[fragile]{Additional Deprecations and Removals} | ||
| \begin{itemize} | ||
| \item The Makefile build system is \textbf{deprecated} (already discussed) | ||
| \item StaticCrsGraph is \textbf{moved} to Kokkos Kernels and \textbf{deprecated} in Core | ||
| \begin{itemize} | ||
| \item See \url{https://github.com/kokkos/kokkos-kernels/pull/2419} | ||
| \item Symbol is in Kernels under \texttt{KokkosSparse::StaticCrsGraph} | ||
| \end{itemize} | ||
| \end{itemize} | ||
| \end{frame} | ||
| %========================================================================== | ||
|
|
||
| % Examples | ||
|
|
||
| % note: always keep the [fragile] for your frames! | ||
|
|
||
| %\begin{frame}[fragile]{Example list} | ||
| % \begin{itemize} | ||
| % \item Item 1 | ||
| % \item Item 2 with some \texttt{code} | ||
| % \begin{itemize} | ||
| % \item Sub-item 2.1 | ||
| % \item Sub-item 2.2 | ||
| % \end{itemize} | ||
| % \end{itemize} | ||
| %\end{frame} | ||
|
|
||
| %\begin{frame}[fragile]{Example code} | ||
| % \begin{code}[keywords={std}] | ||
| % #include <iostream> | ||
| % | ||
| % int main() { | ||
| % std::cout << "hello world\n"; | ||
| % } | ||
| % \end{code} | ||
| %\end{frame} | ||
|
|
||
| %\begin{frame}[fragile]{Example table} | ||
| % \begin{center} | ||
| % \begin{tabular}{l|l} | ||
| % a & b \\\hline | ||
| % c & d | ||
| % \end{tabular} | ||
| % \end{center} | ||
| %\end{frame} | ||
|
|
||
| %========================================================================== | ||
|
|
||
|
|
||
| %========================================================================== | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,105 @@ | ||
| %========================================================================== | ||
|
|
||
|
|
||
| % Fix performance bug affecting atomic_fetch_{add,sub,min,max,and,or,xor} on integral types long and unsigned long with HIP #7816 | ||
| % Performance bug in RangePolicy: construct error message if and only if the precondition is violated #7809 | ||
| % Fix execution of ranges with more than 2B elements #7797 | ||
| % Fix clean target when embedding Kokkos in another project #7557 | ||
| % Build system: hint to ARCH_NATIVE if ARMv9 Grace arch is not explicitly supported by the compiler #7862 | ||
| % Fix Zen3 flag for NVHPC #7558 | ||
| % Use right arch for MI300A in makefiles #7786 | ||
| % graph: nodes must be stored by the graph #7619 | ||
| % Make sure lock arrays are on device before launching a graph #7685 | ||
| % Cuda: fix incorrect iteration in MDRangePolicy of rank > 4 for high iteration counts #7724 | ||
| % Cuda: ignore gcc assembler options in nvcc-wrapper #7492 | ||
|
|
||
| % simd: fix a bug in scalar min/max #7813 | ||
| % simd: fix a bug in non-masked reductions #7845 | ||
| % Fix compiling BasicView on MSVC #7751 | ||
|
|
||
|
|
||
| \begin{frame}[fragile] | ||
|
|
||
| {\Huge Bug Fixes} | ||
|
|
||
| \vspace{10pt} | ||
|
|
||
| \end{frame} | ||
|
|
||
| \begin{frame}[fragile]{General bug fixes} | ||
| \begin{itemize} | ||
| \item Fix execution of ranges with more than 2 billion elements | ||
| \item Graph: | ||
| \begin{itemize} | ||
| \item Fix graph node lifetime issues | ||
| \item Fix lock-based atomics failure when launching CUDA and HIP graphs | ||
|
|
||
| \end{itemize} | ||
| \item CUDA backend: Fix incorrect iteration in MDRangePolicy of rank $> 4$ for high iteration counts | ||
| \item SIMD: | ||
| \begin{itemize} | ||
| \item Fix a bug in scalar min/max | ||
| \item Fix a bug in non-masked reductions | ||
| \end{itemize} | ||
| \item View: Fix MSVC compilation | ||
| \end{itemize} | ||
| \end{frame} | ||
|
|
||
| \begin{frame}[fragile]{Build system fixes} | ||
| \begin{itemize} | ||
| \item Fix clean target when embedding Kokkos in another project | ||
|
|
||
| \item Stop generation if the ARMv9 Grace arch is not explicitly supported by the compiler when \texttt{Kokkos\_ARCH\_ARMV9\_GRACE} is specified | ||
| \begin{itemize} | ||
| \item You can still try to configure with \texttt{ARCH\_NATIVE} | ||
| \end{itemize} | ||
| \item Fix Zen3 flag for NVHPC | ||
| \item Use the correct arch for MI300A in makefiles | ||
| \item CUDA: Ignore gcc assembler options in nvcc-wrapper | ||
|
|
||
| \end{itemize} | ||
| \end{frame} | ||
|
|
||
| \begin{frame}[fragile]{Performance bugfixes} | ||
| \begin{itemize} | ||
| \item Fix a performance bug affecting \texttt{atomic\_fetch\_\{add,sub,min,max,and,or,xor\}} on the integral types \texttt{long} and \texttt{unsigned long} with HIP | ||
| \item Fix the performance of \texttt{RangePolicy}: the error message was constructed even when the precondition was not violated; it is now built only on violation | ||
| \end{itemize} | ||
| \end{frame} | ||
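The `RangePolicy` item describes a general pattern that a short slide could make concrete. The sketch below is plain C++ and is not the Kokkos implementation: the helper `check_bounds` and the precondition `begin <= end` are hypothetical stand-ins chosen for illustration. The point is only the ordering: test the condition first, and build the message string on the failure path.

```latex
\begin{frame}[fragile]{Performance bugfixes: lazy error message sketch}
  \begin{code}[keywords={std,if,void,long,throw}]
  #include <stdexcept>
  #include <string>

  // Hypothetical helper, not the Kokkos implementation.
  void check_bounds(long begin, long end) {
    // Slow pattern: build the std::string, then test.
    // Fixed pattern: test first, build only on failure.
    if (end < begin) {
      std::string msg =
          "invalid range: begin=" + std::to_string(begin) +
          " end=" + std::to_string(end);
      throw std::runtime_error(msg);
    }
  }
  \end{code}
\end{frame}
```

With this ordering, the common case where the precondition holds performs no string allocation or formatting.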
| %========================================================================== | ||
|
|
||
| % Examples | ||
|
|
||
| % note: always keep the [fragile] for your frames! | ||
|
|
||
| %\begin{frame}[fragile]{Example list} | ||
| % \begin{itemize} | ||
| % \item Item 1 | ||
| % \item Item 2 with some \texttt{code} | ||
| % \begin{itemize} | ||
| % \item Sub-item 2.1 | ||
| % \item Sub-item 2.2 | ||
| % \end{itemize} | ||
| % \end{itemize} | ||
| %\end{frame} | ||
|
|
||
| %\begin{frame}[fragile]{Example code} | ||
| % \begin{code}[keywords={std}] | ||
| % #include <iostream> | ||
| % | ||
| % int main() { | ||
| % std::cout << "hello world\n"; | ||
| % } | ||
| % \end{code} | ||
| %\end{frame} | ||
|
|
||
| %\begin{frame}[fragile]{Example table} | ||
| % \begin{center} | ||
| % \begin{tabular}{l|l} | ||
| % a & b \\\hline | ||
| % c & d | ||
| % \end{tabular} | ||
| % \end{center} | ||
| %\end{frame} | ||
|
|
||
| %========================================================================== | ||
|
|
||
|
|
||
| %========================================================================== | ||
29 changes: 29 additions & 0 deletions
Content/ReleaseBriefings/4_6/Section_BuildSystemUpdates.tex
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| %========================================================================== | ||
|
|
||
| \begin{frame}[fragile] | ||
|
|
||
| {\Huge Build System Updates} | ||
|
|
||
| \vspace{10pt} | ||
|
|
||
| \end{frame} | ||
|
|
||
| %========================================================================== | ||
|
|
||
| % Examples | ||
|
|
||
| % note: always keep the [fragile] for your frames! | ||
|
|
||
| \begin{frame}[fragile]{New build system features} | ||
| \begin{itemize} | ||
| \item Add support for Zen 4 AMD microarchitecture (\texttt{Kokkos\_ARCH\_ZEN4}) | ||
| \item Enable NVIDIA Grace architecture with NVHPC (\texttt{Kokkos\_ARCH\_ARMV9\_GRACE}) | ||
| \item Support static library builds via \texttt{CMAKE\_CUDA\_RUNTIME\_LIBRARY=static} when using CUDA as CMake language | ||
| \end{itemize} | ||
|
|
||
| \end{frame} | ||
|
|
||
| %========================================================================== | ||
|
|
||
|
|
||
| %========================================================================== |