Quinoa uses Charm's collide library to do mesh-to-mesh solution transfers for overset meshes. That library does not implement checkpointing, so any checkpoint-restarted run involving mesh-to-mesh transfers (i.e. overset meshes) fails in quinoa. Early attempts to fix this are on the migratable-collidev701 branch (https://github.com/adityakpandare/charm/tree/migratable-collidev701). With the Charm changes on that branch, the restarted run deadlocks.
Here are the steps to reproduce the deadlock and obtain a stack trace:
- Build Charm's 'migratable-collidev701' branch using the following command:
./buildold LIBS mpi-linux-x86_64 --enable-randomized-msgq --with-prio-type=int --enable-error-checking
- Build quinoa-tpl using the following cmake config:
cmake -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpif90 -DCMAKE_BUILD_TYPE=Debug -DCHARM_ROOT=<path-where-charm-migratable-collidev701-installed> ../
- Build quinoa's 'overset_migratablecollide' branch (https://github.com/quinoacomputing/quinoa/tree/overset_migratablecollide) with the following cmake config:
cmake -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_BUILD_TYPE=Debug -DCHARM_ROOT=<path-where-charm-migratable-collidev701-installed> -DTPL_DIR=<tpl-installdir-path> ../src/
- Run quinoa (regular, non-restarted run):
../quinoa/build/debug-overset-restart/Main/inciter -c control_file.lua -v
- Next, attempt to restart the above run as:
../quinoa/build/debug-overset-restart/Main/inciter -c restart_control_file.lua +restart ./restart/ -v
The restarted run deadlocks at the first point where the collide library is invoked.
Fixing this issue will require changes to Charm's collide library.
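Because the restarted run hangs rather than exiting, it can help to run it under a time limit while reproducing. The following is only a minimal sketch, assuming GNU coreutils `timeout` is available; the names `run_with_limit` and `TIMEOUT_SECS` are hypothetical helpers introduced here and are not part of quinoa or Charm.

```shell
#!/usr/bin/env bash
# Sketch: run a command under a time limit so that a deadlocked run is
# reported instead of hanging forever.
# TIMEOUT_SECS and run_with_limit are hypothetical names (assumptions).

TIMEOUT_SECS="${TIMEOUT_SECS:-600}"

run_with_limit() {
  # GNU coreutils `timeout` exits with status 124 when the limit is reached.
  timeout "${TIMEOUT_SECS}" "$@"
  local status=$?
  if [ "${status}" -eq 124 ]; then
    echo "TIMEOUT after ${TIMEOUT_SECS}s (possible deadlock): $*" >&2
  fi
  return "${status}"
}

# Usage with the restart command from the steps above (commented out so
# that sourcing this file has no side effects):
# run_with_limit ../quinoa/build/debug-overset-restart/Main/inciter \
#   -c restart_control_file.lua +restart ./restart/ -v
```

An exit status of 124 then indicates that the run did not finish within the limit, which is consistent with (but does not prove) the deadlock described above.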