NEMO v5.0-beta - AGRIF parallel sisters

Hi,

I tried using parallel sisters in NEMOv5.0-beta with an AGRIF configuration that has two nests of very different sizes. The model starts without key_agrif_psisters, but not when the key is activated. The job fails with segmentation faults even before any ocean.output files are created:

Caught signal 11 (Segmentation fault: address not mapped to object at address 0x4)
==== backtrace (tid:   8803) ====
 0 0x000000000001fa93 ucs_debug_print_backtrace()  /dev/shm/swmanage/UCX/1.8.1/system-system/ucx-1.8.1/src/ucs/debug/debug.c:653
 1 0x0000000000012cf0 __funlockfile()  :0
 2 0x000000000062b136 sub_loop_agrif_get_proc_info_()  ???:0
 3 0x000000000062b0e1 agrif_get_proc_info_()  ???:0
 4 0x00000000005c71d8 agrif_mpp_mp_agrif_init_proclist_()  ???:0
 5 0x00000000005c73d4 agrif_mpp_mp_agrif_mpi_init_()  ???:0
 6 0x00000000008cace4 lib_mpp_mp_sub_loop_mpp_start_()  ???:0
 7 0x00000000008caa77 lib_mpp_mp_mpp_start_()  ???:0
 8 0x0000000000425804 nemogcm_mp_sub_loop_nemo_init_()  ???:0
 9 0x0000000000427541 nemogcm_mp_nemo_init_()  ???:0
10 0x0000000000424344 nemogcm_mp_sub_loop_nemo_gcm_()  ???:0
11 0x00000000004242ad nemogcm_mp_nemo_gcm_()  ???:0
12 0x00000000004240a4 MAIN__()  ???:0
13 0x0000000000424062 main()  ???:0
14 0x000000000003ad85 __libc_start_main()  ???:0
15 0x0000000000423f6e _start()  ???:0
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
nemo               0000000001978EDA  Unknown               Unknown  Unknown
libpthread-2.28.s  0000152F51542CF0  Unknown               Unknown  Unknown
nemo               000000000062B136  sub_loop_agrif_ge        2059  agrif_user.f90
nemo               000000000062B0E1  agrif_get_proc_in        2019  agrif_user.f90
nemo               00000000005C71D8  agrif_mpp_mp_agri         158  modmpp.f90
nemo               00000000005C73D4  agrif_mpp_mp_agri         101  modmpp.f90
nemo               00000000008CACE4  lib_mpp_mp_sub_lo         497  lib_mpp.f90
nemo               00000000008CAA77  lib_mpp_mp_mpp_st         449  lib_mpp.f90
nemo               0000000000425804  nemogcm_mp_sub_lo         468  nemogcm.f90
nemo               0000000000427541  nemogcm_mp_nemo_i         342  nemogcm.f90
nemo               0000000000424344  nemogcm_mp_sub_lo         254  nemogcm.f90
nemo               00000000004242AD  nemogcm_mp_nemo_g         189  nemogcm.f90
nemo               00000000004240A4  MAIN__                     45  nemo.f90
nemo               0000000000424062  Unknown               Unknown  Unknown
libc-2.28.so       0000152F50E23D85  __libc_start_main     Unknown  Unknown
nemo               0000000000423F6E  Unknown               Unknown  Unknown

Is there anything else (besides the cpp key) I need to define/change to activate parallel sisters?

I already tried using different numbers of CPUs. I get the same result, even when choosing a setup that should match the grid sizes, based on the lists of suggestions provided when ln_listonly is activated.

Any ideas?

Thanks!
Franziska

Hi Franziska,

You might get some more useful information if you switch the AGRIF debug logicals in ext/AGRIF/AGRIF_FILES/modtypes.F90 to .true. There is one specifically for parallel_sisters. I’ve not (yet) used it in NEMOv5, but I found them very helpful when setting up the IMMERSE configuration.
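
If it helps, something along these lines should list the candidate switches. This is only a sketch: it assumes the switches are declared as logicals with "debug" somewhere on the declaration line, and the helper name list_debug_flags is made up for illustration, so adjust the pattern if it turns up nothing.

```shell
# Hypothetical helper: print (with line numbers) every line of a Fortran
# source file that mentions both "logical" and "debug", case-insensitively.
list_debug_flags() {
    grep -n -i 'logical' "$1" | grep -i 'debug'
}

# Run from the NEMO root directory; the path is the one mentioned above.
f=ext/AGRIF/AGRIF_FILES/modtypes.F90
if [ -f "$f" ]; then
    list_debug_flags "$f"
fi
```

After changing any of them, the AGRIF sources need to be recompiled for the change to take effect.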

Adam

Thanks Adam!
Unfortunately, the debugging switches don’t provide any further information.
I still get the same result, also when using the AGRIF_DEMO configuration (reduced to two single nests).
Franziska