Running a low-resolution ocean model built with the Intel compiler using debug compilation options, I get the following crash:
[gadi-cpu-clx-1929:327681:0:327681] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid: 327681) ====
0 0x0000000000012d10 funlockfile() :0
1 0x00000000007fda7e lbcnfd_mp_lbc_nfd_nogather_2d() /scratch/dp9/iib548/NEMO/304_4.1.4.dbg/preprocess-ocean/src/nemo/src/OCE/LBC/lbcnfd.F90:1133
2 0x00000000006abddc lbclnk_mp_mpp_nfd_2d_ptr() /scratch/dp9/iib548/NEMO/304_4.1.4.dbg/preprocess-ocean/src/nemo/src/OCE/LBC/lbclnk.F90:2758
3 0x00000000005e47eb lbclnk_mp_mpp_lnk_2d_ptr_() /scratch/dp9/iib548/NEMO/304_4.1.4.dbg/preprocess-ocean/src/nemo/src/OCE/LBC/lbclnk.F90:867
4 0x00000000005ad54a lbclnk_mp_lbc_lnk_2d_multi_() /scratch/dp9/iib548/NEMO/304_4.1.4.dbg/preprocess-ocean/src/nemo/src/OCE/LBC/lbclnk.F90:152
5 0x000000000316f756 geo2ocean_mp_angle_() /scratch/dp9/iib548/NEMO/304_4.1.4.dbg/preprocess-ocean/src/nemo/src/OCE/SBC/geo2ocean.F90:299
6 0x000000000314dec2 geo2ocean_mp_rot_rep_() /scratch/dp9/iib548/NEMO/304_4.1.4.dbg/preprocess-ocean/src/nemo/src/OCE/SBC/geo2ocean.F90:96
. . .
which points to the following source lines:
1129 IF ( .NOT. l_fast_exchanges ) THEN
1130 DO jl = 1, ipl; DO jk = 1, ipk
1131 DO ji = 1, endloop
1132 iju = jpiglo - ji - nimpp - nfiimpp(isendto(1),jpnj) + 3
1133 ptab(ji,nlcj-1) = psgn * ptab2(iju,ijpjp1,jk,jl)
1134 END DO
1135 END DO; END DO
1136 ENDIF
in lbcnfd.F90, where the ptab2(iju,ijpjp1,jk,jl) term is undefined at some grid points on some PEs.
Tracing this uninitialized value back through the following call sequence
angle => lbc_lnk_multi (lbc_lnk_2d_multi) => load_ptr_2d
=> lbc_lnk_ptr (mpp_lnk_2d_ptr) => mpp_nfd_2d_ptr => lbc_nfd_nogather_2d (routine of the crash problem, file lbcnfd.F90)
I found that the problem is caused by the following setting
fs_2=2
used in the angle routine (file geo2ocean.F90) where an allocatable array gcosf is declared as
gcosf(jpi,jpj)
and not all elements of this array are set by the corresponding loop
DO jj = 2, jpjm1
DO ji = fs_2, jpi ! vector opt.
due to
#if defined key_vectopt_loop
#  define fs_2 1
#  define fs_jpim1 jpi
#else
#  define fs_2 2
#  define fs_jpim1 jpim1
#endif
settings from vectopt_loop_substitute.h90, although the whole array, including the elements this loop never sets (ji = 1, jj = 1 and jj = jpj when fs_2 is 2), is used elsewhere in the sources.
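To make the coverage gap concrete, here is a minimal standalone sketch, not NEMO code. The sizes jpi = 6 and jpj = 5, the sentinel value and the constant written in the loop body are all hypothetical stand-ins. The array is filled with a sentinel before the loop, so any element still holding the sentinel afterwards was never written by the loop.

```fortran
program gcosf_coverage
   ! Standalone illustration only: the sizes below are hypothetical, the real
   ! jpi/jpj come from the NEMO domain decomposition.
   implicit none
   integer, parameter :: jpi = 6, jpj = 5
   integer, parameter :: jpjm1 = jpj - 1
   integer, parameter :: fs_2 = 2            ! value when key_vectopt_loop is not defined
   real, parameter    :: sentinel = -999.0   ! hypothetical marker for "never written"
   real, allocatable  :: gcosf(:,:)
   integer :: ji, jj

   allocate( gcosf(jpi,jpj) )
   gcosf(:,:) = sentinel                     ! give every element a defined value first

   ! Same loop bounds as the loop quoted from the angle routine.
   do jj = 2, jpjm1
      do ji = fs_2, jpi
         gcosf(ji,jj) = 1.0                  ! stand-in for the real angle computation
      end do
   end do

   print '(a,i0,a,i0)', 'elements never written: ', count( gcosf == sentinel ), &
      &                 ' of ', jpi*jpj

   ! Map of written (T) and untouched (F) elements, top row is jj = jpj.
   do jj = jpj, 1, -1
      print '(*(l2))', gcosf(:,jj) /= sentinel
   end do
end program gcosf_coverage
```

Giving gcosf a defined value before the loop, as the sketch does, would stop the floating-point trap from firing on the untouched elements, but whether that is the right fix inside NEMO is an assumption and not something this sketch settles.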
Note that the build procedure I was given for optimising the NEMO sources does not define the key_vectopt_loop pre-processor key.
Out of curiosity, I checked what happens when key_vectopt_loop is defined in the build.
Unfortunately, this change causes a different crash:
forrtl: severe (408): fort: (2): Subscript #1 of the array TMASK has value 54 which is greater than the upper bound of 53
Image PC Routine Line Source
nemo-si3.exe 00000000042C360F Unknown Unknown Unknown
nemo-si3.exe 00000000017B062F dommsk_mp_dom_msk 212 dommsk.F90
nemo-si3.exe 0000000001724B5F domain_mp_dom_ini 149 domain.F90
nemo-si3.exe 000000000041EC2C nemogcm_mp_nemo_i 364 nemogcm.F90
nemo-si3.exe 000000000041BE99 nemogcm_mp_nemo_g 134 nemogcm.F90
nemo-si3.exe 000000000041BDA1 MAIN__ 18 nemo.f90
which points to the following source lines:
210 DO jj = 1, jpjm1
211 DO ji = 1, jpi ! vector loop
212 umask(ji,jj,jk) = tmask(ji,jj ,jk) * tmask(ji+1,jj ,jk)
213 vmask(ji,jj,jk) = tmask(ji,jj ,jk) * tmask(ji ,jj+1,jk)
214 END DO
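The bounds violation can be reproduced on paper with a small standalone sketch, again not NEMO code and with hypothetical sizes. When the inner loop runs to jpi, the read of tmask(ji+1,...) reaches first subscript jpi + 1, which matches the runtime message if jpi is 53 on that PE. Stopping the loop at jpim1 keeps every read inside the array.

```fortran
program umask_bounds
   ! Standalone illustration only: sizes are hypothetical and the arrays are 2-D
   ! here, whereas the NEMO masks carry a third (jk) dimension.
   implicit none
   integer, parameter :: jpi = 6, jpj = 5
   integer, parameter :: jpim1 = jpi - 1, jpjm1 = jpj - 1
   real    :: tmask(jpi,jpj), umask(jpi,jpj)
   integer :: ji, jj, upper

   tmask(:,:) = 1.0
   umask(:,:) = 0.0

   ! Upper bound of the inner loop as it appears in the listing above.
   upper = jpi
   print '(a,i0,a,i0)', 'largest first subscript read: ', upper + 1, &
      &                 ', array upper bound: ', jpi

   ! Reading tmask(ji+1,jj) stays in bounds only if ji stops at jpim1.
   do jj = 1, jpjm1
      do ji = 1, jpim1
         umask(ji,jj) = tmask(ji,jj) * tmask(ji+1,jj)
      end do
   end do
   print '(a,i0)', 'elements of umask set: ', count( umask > 0.0 )
end program umask_bounds
```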
Note that the same uninitialized-value problem also occurs when running a high-resolution model.
Could you please advise on the nature of the problem and how it can be fixed?
Thank you.