Hi,
I tried using parallel sisters in NEMOv5.0-beta with an AGRIF configuration that has two nests of very different sizes. Without key_agrif_psisters the model starts, but with key_agrif_psisters activated it does not. The job fails with segmentation faults before any ocean.output files are created:
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x4)
==== backtrace (tid: 8803) ====
0 0x000000000001fa93 ucs_debug_print_backtrace() /dev/shm/swmanage/UCX/1.8.1/system-system/ucx-1.8.1/src/ucs/debug/debug.c:653
1 0x0000000000012cf0 __funlockfile() :0
2 0x000000000062b136 sub_loop_agrif_get_proc_info_() ???:0
3 0x000000000062b0e1 agrif_get_proc_info_() ???:0
4 0x00000000005c71d8 agrif_mpp_mp_agrif_init_proclist_() ???:0
5 0x00000000005c73d4 agrif_mpp_mp_agrif_mpi_init_() ???:0
6 0x00000000008cace4 lib_mpp_mp_sub_loop_mpp_start_() ???:0
7 0x00000000008caa77 lib_mpp_mp_mpp_start_() ???:0
8 0x0000000000425804 nemogcm_mp_sub_loop_nemo_init_() ???:0
9 0x0000000000427541 nemogcm_mp_nemo_init_() ???:0
10 0x0000000000424344 nemogcm_mp_sub_loop_nemo_gcm_() ???:0
11 0x00000000004242ad nemogcm_mp_nemo_gcm_() ???:0
12 0x00000000004240a4 MAIN__() ???:0
13 0x0000000000424062 main() ???:0
14 0x000000000003ad85 __libc_start_main() ???:0
15 0x0000000000423f6e _start() ???:0
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
nemo 0000000001978EDA Unknown Unknown Unknown
libpthread-2.28.s 0000152F51542CF0 Unknown Unknown Unknown
nemo 000000000062B136 sub_loop_agrif_ge 2059 agrif_user.f90
nemo 000000000062B0E1 agrif_get_proc_in 2019 agrif_user.f90
nemo 00000000005C71D8 agrif_mpp_mp_agri 158 modmpp.f90
nemo 00000000005C73D4 agrif_mpp_mp_agri 101 modmpp.f90
nemo 00000000008CACE4 lib_mpp_mp_sub_lo 497 lib_mpp.f90
nemo 00000000008CAA77 lib_mpp_mp_mpp_st 449 lib_mpp.f90
nemo 0000000000425804 nemogcm_mp_sub_lo 468 nemogcm.f90
nemo 0000000000427541 nemogcm_mp_nemo_i 342 nemogcm.f90
nemo 0000000000424344 nemogcm_mp_sub_lo 254 nemogcm.f90
nemo 00000000004242AD nemogcm_mp_nemo_g 189 nemogcm.f90
nemo 00000000004240A4 MAIN__ 45 nemo.f90
nemo 0000000000424062 Unknown Unknown Unknown
libc-2.28.so 0000152F50E23D85 __libc_start_main Unknown Unknown
nemo 0000000000423F6E Unknown Unknown Unknown
Is there anything else (besides the cpp key) I need to define/change to activate parallel sisters?
I already tried different numbers of CPUs. I get the same result, even when choosing a setup that should match the grid sizes, based on the suggestions listed when ln_listonly is activated.
Any ideas?
Thanks!
Franziska