[v4.2.x] NEMO-OASIS-WRF fails with infinite SSH when increasing the number of MPI tasks used by NEMO

Hi all,

I’m trying to use NEMO v4.2.0 in coupled mode with WRF v4.3.3 through the oasis3-mct_4.0 coupler.
The coupled model runs without errors when I use, for example, 120 MPI tasks for WRF and 20 or 40 tasks for NEMO. However, when I increase the number of tasks used by NEMO (above 40), I get an output.abort file with infinite SSH.
What could be the possible cause of this problem?

I would appreciate any suggestions!
Thanks in advance,
John

It is very difficult to give you any input with so little information. I would start by recompiling NEMO with debugging options, for example:
%FCFLAGS -i4 -r8 -g -O0 -debug all -traceback -fp-model strict -ftrapuv -check all,noarg_temp_created -fpe-all=0 -ftz -init=arrays,snan,huge
With these options you should see where in the code this infinite value appears.
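
In practice these flags go into the %FCFLAGS line of the arch file used to compile NEMO, and the configuration then has to be rebuilt from scratch. A minimal sketch of such an arch-file excerpt (the file name and compiler wrapper below are placeholders, not something from your setup):

# hypothetical excerpt from arch/arch-X64_MYMACHINE.fcm
%FC        mpiifort -c -cpp
%FCFLAGS   -i4 -r8 -g -O0 -debug all -traceback -fp-model strict -ftrapuv -check all,noarg_temp_created -fpe-all=0 -ftz -init=arrays,snan,huge
%FFLAGS    %FCFLAGS

After editing the arch file, rebuild with makenemo so the debug flags are actually picked up.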

Thank you very much for your reply.

I tried to run NEMO with the debugging options and I get the following error. It seems strange that the error only appears when increasing the number of processors above 40 in coupled mode. In standalone mode, with the same BDY forcing, there is no problem.

Thanks,
John

forrtl: severe (408): fort: (2): Subscript #2 of the array PDTA_READ has value 13 which is greater than the upper bound of 12

Image              PC                Routine            Line        Source
nemo.exe           00000000043A6400  Unknown               Unknown  Unknown
nemo.exe           0000000001D25B52  fldread_mp_fld_ma         541  fldread.f90
nemo.exe           0000000001D248F0  fldread_mp_fld_ma         503  fldread.f90
nemo.exe           0000000001D1ECEA  fldread_mp_fld_ge         389  fldread.f90
nemo.exe           0000000001D1BB9B  fldread_mp_fld_up         313  fldread.f90
nemo.exe           0000000001D132AF  fldread_mp_fld_re         216  fldread.f90
nemo.exe           00000000011228A4  bdydta_mp_bdy_dta         197  bdydta.f90
nemo.exe           00000000005E2F5D  stpmlf_mp_stp_mlf         150  stpmlf.f90
nemo.exe           000000000045C341  nemogcm_mp_nemo_g         134  nemogcm.f90
nemo.exe           000000000045C049  MAIN__                     18  nemo.f90
nemo.exe           00000000044268B2  Unknown               Unknown  Unknown
libc-2.12.so       00002AC222D40D20  __libc_start_main     Unknown  Unknown
nemo.exe           000000000045BEA9  Unknown               Unknown  Unknown

Just a thought, but might this be a result of trying to exchange missing tiles, i.e. land subdomains (subdomains with no ocean points)? At 40 MPI tasks it is possible you have none of these, but above that you will begin to get an increasing number of them.
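
One way to check this is to look at the domain decomposition NEMO reports. A minimal &nammpp sketch for NEMO 4.2 (the values below are illustrative, not taken from your run):

&nammpp
   ln_listonly  = .false.  ! if .true., only list the best domain decompositions (with land-only subdomains removed) and stop
   ln_nnogather = .true.   ! activate code to avoid mpi_allgather at the north fold
   jpni         = 0        ! number of subdomains along i (set automatically if < 1)
   jpnj         = 0        ! number of subdomains along j (set automatically if < 1)
   nn_hls       = 1        ! halo width
/

Running once with ln_listonly = .true. should print, in ocean.output, how many land-only subdomains are eliminated for a given task count, which would tell you whether this starts happening above 40 tasks.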

The traceback shows there is a problem somewhere (possibly related to the land MPI subdomains mentioned in the previous comment) in the BDY routines, which creates an error when reading the BDY data.
The point I don’t get is the link with the coupled mode. You say that in standalone mode you have no error, but do you use the exact same number of MPI processes for NEMO in coupled and standalone mode? To me, you should have the same problem in coupled and standalone mode…

I found what was wrong! In the namelist_cfg (&namsbc_cpl), the received stress had a 'cartesian' vector reference instead of 'spherical'.
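
For anyone who hits the same thing: the relevant field is the third column (the vector reference) of sn_rcv_tau in &namsbc_cpl. A rough sketch of the corrected entry (the other columns here are just reference-namelist-style defaults, not my actual values):

&namsbc_cpl
   sn_rcv_tau = 'oce only' , 'no' , 'spherical' , 'eastward-northward' , 'T'
/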

The traceback above, related to the BDY routines, exists in both standalone and coupled mode with the same number of MPI processes and debugging options. When I set ln_bdy = .false. in namelist_cfg, I got an error in the routine geo2ocean.f90, and that is how I found my mistake. In any case, I will have to look into this BDY error.

Thanks for the support! 🙂