Abort due to unrealistic or NaN value for SSH, velocity or salinity

Dear all,

I have a question. Usually when you get a crash with Nemo, it tells you that the velocity is crazy or the salinity below zero somewhere. But since the test is done at ever time step, which is also every baroclinic time step if you use time splitting, then it is not done at every barotropic time step. So sometimes your model crashes because the final values of the barotropic mode are NaNs, which makes everything is just NaN in the abort files, and the location of the crash does not make any sense. So there is no clue of why it crashed, and where.
Before I start coding something to modify this, I was just wondering if someone had written a fix for this.

Thanks in advance,

Robinson

Salut Robinson, I did, a long time ago… basically adding a debug option to check on the minimum value of ht, hv and hu on all tiles at each barotropic step and stopping if one of them goes negative…

Bonjour Frédéric,

Thanks, I will try…

Bonne journée,

Dear Ribinson,

Hello.
I also got stuck in the same error.
In my case, the salinity values of the crashed area are NANs.
When I encountered the instability error, I decreased the time interval of model.
The model sometimes worked, but now it doesn’t work…
I don’t know how to solve this problem.
Any suggestions and advices would be appreciated.
Please let me know if you have any comments.

Thank you in advance.

Best regards,

Hwa Jin Choi

Hei Jin,

But I guess the SSH is also NaN and basically everything right ?
Which area are you working on ? Try to increasing the frequency of atmospheric forcing (nn_fsbc), like if it is 4 then put it to 2 or even 1.

Hope this helps,

Robinson

I would try to see of the problem is really coming from the time-splitting by using the explicit formulation.

   ln_dynspg_exp  =  .true.   ! explicit free surface
   ln_dynspg_ts   = .false.   ! split-explicit free surface

Of course, the explicit formulation requires that you divide your time-step (rn_Dt) by the number of subtime-steps (usually ~50) you where using with the time-splitting formulation…

Hello Robinson,

Thank you for your advices.
You’re right. The SSH has also strange values.
I 'm running the model in global domain.
As you advise, I changed the frequency of atmospheric forcing (nn_fsbc), like 3 or 2.
But, it still doesn’t work. In similar time step, it show the same error message I attached below.
If you have any comments, please let me know :slight_smile:
Thank you very much.

  ===>>> : E R R O R

          ===========

 STOP
   stp_ctl: |ssh| > 20 m  or  |U| > 10 m/s  or  S <= 0  or  S >= 100  or  NaN encounter in the tests

 kt 350 |ssh| max   0.000     at i j   174 106    MPI rank 52
 kt 350 |U|   max   0.000     at i j k 174 106  1 MPI rank 52
 kt 350 Sal   min         NaN at i j k 163 106  1 MPI rank 52
 kt 350 Sal   max         NaN at i j k 163 106  1 MPI rank 52

        ===> output of last computed fields in output.abort* files


 huge E-R-R-O-R : immediate stop

Best regards,

Hwa-Jin Choi

Hei,

Did you try to reduce the model time step ?

Hello smasson,

Thanks to your comments.
In my namelist_cfg, surface pressure gradient was set as follows.

!-----------------------------------------------------------------------
&namdyn_spg    !   surface pressure gradient                            (default: NO selection)
!-----------------------------------------------------------------------
   ln_dynspg_ts  = .true.  !  split-explicit free surface

As you advise, I changed the formulations.
I checked that there was slightly different errors than before.
If you have any advice to add, please let me know.
I appreciate your help. :grinning:

ocean.output
 ~~~~~~~~~~~~~~  
 stp_ctl : time-stepping control
 ~~~~~~~
      file   : time.step open ok
      unit   =           24
      status = REPLACE
      form   = FORMATTED
      access = SEQUENTIAL

                     iom_close ~~~ close file: ./restarts/chl_clim_00008640_restart.nc ok
   fld_read: var calving kt =        2 ( 120.0938 days), Y/M/D = 2012/05/01, record:   0001 (days    0.0000 <->  365.0000)
   fld_read: var chlor_a kt =        2 ( 120.0938 days), Y/M/D = 2012/05/01, records b/a:   0001/  0001 (days  105.0000/ 135.5000)

  ==>>>   Look for "E R R O R" messages in all existing *ocean.output* files

  ==>>>   Look for "E R R O R" messages in all existing *ocean.output* files
ocean.output_0051
 ~~~~~~~~~~~~~~  
  ===>>> : E R R O R

          ===========

 STOP
   stp_ctl: |ssh| > 20 m  or  |U| > 10 m/s  or  S <= 0  or  S >= 100  or  NaN encounter in the tests

 kt 3 |ssh| max   64.62     at i j   150 106    MPI rank 51
 kt 3 |U|   max  0.8098     at i j k 150 106 21 MPI rank 51
 kt 3 Sal   min   4.924     at i j k 148 126  1 MPI rank 51
 kt 3 Sal   max   38.51     at i j k 158 106 26 MPI rank 51

        ===> output of last computed fields in output.abort* files


 huge E-R-R-O-R : immediate stop

Best regards,

Hwa-Jin Choi

Hello Robinson,

Yes. It shows the same error on the same day. :smiling_face_with_tear:
Thanks

Hwa-Jin Choi

At least now you have more details on the error…

  1. the error starts with extreme values on ssh. → did you tried to further reduce the model time-step rn_Dt? Which configuration are you using? global/regional? with/without tides? with/without sea-ice?
  2. you have the location of the error: (150, 106) where is it on your grid? Do you have something specific around this point? like a strait, strange coastline, strange forcing, sea-ice border or a runoff? It could be a runoff error has I see you also have very low SSS (4.9 psu) in this area. Which options did you use for the runoff (if you have some)?

Hello smasson,

First, I tried reducing the model time-step (rn_rdt) to 600s, but it didn’t work.
I have used OCE ICE NST and OCE TOP ICE NST configurations, global domain and no tidal forcing (ln_tide = .false.).

Second, the region where the error occurred are different as ln_dynspg_ts is turned on or off.
I have run 3 different experiments (when ln_dynspg_ts set .true.), all three experiments seem to have errors in the same area ( lon:15E-80E, lat:40N-59N, I checked this grid point in output.abort_00??.nc. Is this correct…?). This area includes the Black sea and the Caspian sea. Is there a problem here? I attached the namelist related to runoff below.

Actually, I’m just interested in equatorial region. So… if you have any ideas how to pass this error, please let me know…
Thank you very much. :smile:

Hwa-Jin Choi

runoff parameters
 ln_rnf      = .true.    !  runoffs  
!-----------------------------------------------------------------------
&namsbc_rnf    !   runoffs                                              (ln_rnf =T)
!-----------------------------------------------------------------------
   ln_rnf_mouth = .true.    !  specific treatment at rivers mouths
      rn_hrnf     =  15.e0    !  depth over which enhanced vertical mixing is used    (ln_rnf_mouth=T)
      rn_avt_rnf  =   1.e-3   !  value of the additional vertical mixing coef. [m2/s] (ln_rnf_mouth=T)
   rn_rfact    =   1.e0    !  multiplicative factor for runoff
   cn_dir = './'  !  root directory for the location of the runoff files
   !___________!_____________!___________________!___________!_____________!_________!___________!__________!__________!_______________!
   !           !  file name  ! frequency (hours) ! variable  ! time interp. !  clim  ! 'yearly'/ ! weights  ! rotation ! land/sea mask !
   !           !             !  (if <0  months)  !   name    !   (logical)  !  (T/F) ! 'monthly' ! filename ! pairing  ! filename      !
   sn_rnf      = 'runoff_core_monthly',    -1.   , 'sorunoff',   .true.     , .true. , 'yearly'  , ''       , ''       , ''
   sn_cnf      = 'runoff_core_monthly',     0.   , 'socoefr0',   .false.    , .true. , 'yearly'  , ''       , ''       , ''
   sn_s_rnf    = 'runoffs'            ,    24.   , 'rosaline',   .true.     , .true. , 'yearly'  , ''       , ''       , ''
   sn_t_rnf    = 'runoffs'            ,    24.   , 'rotemper',   .true.     , .true. , 'yearly'  , ''       , ''       , ''
   sn_dep_rnf  = 'runoffs'            ,     0.   , 'rodepth' ,   .false.    , .true. , 'yearly'  , ''       , ''       , ''
/

OCE ICE NST and OCE TOP ICE NST is not the configuration, it is defining which source code you are using when compiling the code.
The configuration is defined by your namelist*cfg files and the input files (typically ORCA_R2_zps_domcfg.nc)
If you are using a global configuration, I guess it is ORCA2 with an original time step of

   rn_Dt       = 5400.     !  time step for the dynamics and tracer

with ln_dynspg_exp = .true. you must divite this time step by about 30.
To make sure, I would try

   rn_Dt       = 100.     !  time step for the dynamics and tracer

When ln_dynspg_ts set to .true., the model error did already propagate to the whole domain during the sub-time steps done in dynspg_ts (basically you will have NaN everywhere). The error information you get if this case are not useful. That’s why you have to track the error with ln_dynspg_exp set to .true..

Hello smasson,

Thank you so much for letting me know in detail.
I have used ORCA2_ICE_PISCES configuration.
As you advise, I set ln_dynspg_exp to .true. and rn_rdt to 100.
An error occurred at the same time as before, I think it’s related iceberg issue…
I tested changing the iceberg restart file to another file, but the same error occurs at the same point.

At line 242 of file
/media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/icbdyn.f90
Fortran runtime error: Index '-2147483648' of dimension 1 of array 'mi1' below lower bound of 1

Error termination. Backtrace:
At line 242 of file /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/icbdyn.f90
Fortran runtime error: Index '-2147483648' of dimension 1 of array 'mi1' below lower bound of 1

Error termination. Backtrace:
At line 242 of file /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/icbdyn.f90
Fortran runtime error: Index '-2147483648' of dimension 1 of array 'mi1' below lower bound of 1

Error termination. Backtrace:
#0  0x153d6d12dd21 in ???
#1  0x153d6d12e869 in ???
#2  0x153d6d12eee6 in ???
#0  0x154af2237d21 in ???
#1  0x154af2238869 in ???
#2  0x154af2238ee6 in ???
#3  0x555ae6e8864c in icb_ground
	at /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/icbdyn.f90:242
#4  0x555ae6e88cc7 in __icbdyn_MOD_icb_dyn
	at /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/icbdyn.f90:115
#3  0x55fe02e2064c in icb_ground
	at /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/icbdyn.f90:242
#4  0x55fe02e20cc7 in __icbdyn_MOD_icb_dyn
	at /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/icbdyn.f90:115
#5  0x555ae64b8c5f in __icbstp_MOD_icb_stp
	at /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/icbstp.f90:148
#5  0x55fe02450c5f in __icbstp_MOD_icb_stp
	at /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/icbstp.f90:148
#0  0x149800982d21 in ???
#6  0x555ae5b09842 in __sbcmod_MOD_sbc
	at /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/sbcmod.f90:484
#6  0x55fe01aa1842 in __sbcmod_MOD_sbc
	at /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/sbcmod.f90:484
#7  0x55fe01b21c2c in __step_MOD_stp
	at /media/cmlws/Data1/hjc/NEMO/r4.0.6/cfgs/TEST5/BLD/ppsrc/nemo/step.f90:138
icbdyn.f90
SUBROUTINE icb_ground( pi, pi0, pu,   &
      &                   pj, pj0, pv, ld_bounced )
      !!----------------------------------------------------------------------
      !!                  ***  ROUTINE icb_ground  ***
      !!
      !! ** Purpose :   iceberg grounding.
      !!
      !! ** Method  : - adjust velocity and then put iceberg back to start position
      !!                NB two possibilities available one of which is hard-coded here
      !!----------------------------------------------------------------------
      REAL(wp), INTENT(inout) ::   pi , pj      ! current iceberg position
      REAL(wp), INTENT(in   ) ::   pi0, pj0     ! previous iceberg position
      REAL(wp), INTENT(inout) ::   pu  , pv     ! current iceberg velocities
      LOGICAL , INTENT(  out) ::   ld_bounced   ! bounced indicator
      !
      INTEGER  ::   ii, ii0
      INTEGER  ::   ij, ij0
      INTEGER  ::   ibounce_method
      !!----------------------------------------------------------------------
      !
      ld_bounced = .FALSE.
      !
      ii0 = INT( pi0+0.5 )   ;   ij0 = INT( pj0+0.5 )       ! initial gridpoint position (T-cell)
      ii  = INT( pi +0.5 )   ;   ij  = INT( pj +0.5 )       ! current     -         -
      !
      IF( ii == ii0  .AND.  ij == ij0  )   RETURN           ! berg remains in the same cell
      !
      ! map into current processor
      ii0 = mi1( ii0 )
      ij0 = mj1( ij0 )
      ii  = mi1( ii  )   !! <<<<<<<<<<<<<<<<<<<<<<<   line 242
      ij  = mj1( ij  )
      !
      IF(  tmask(ii,ij,1)  /=   0._wp  )   RETURN           ! berg reach a new t-cell, but an ocean one
      !
      ! From here, berg have reach land: treat grounding/bouncing
      ! -------------------------------
      ld_bounced = .TRUE.

I think the current iceberg position (pi) has NaN value.
I found the part where icb_ground was used in the code below, is it because of the time step (zdt)?

icb_dyn
   SUBROUTINE icb_dyn( kt )
      !!----------------------------------------------------------------------
      !!                  ***  ROUTINE icb_dyn  ***
      !!
      !! ** Purpose :   iceberg evolution.
      !!
      !! ** Method  : - See Martin & Adcroft, Ocean Modelling 34, 2010
      !!----------------------------------------------------------------------
      INTEGER, INTENT(in) ::   kt   !
      !
      LOGICAL  ::   ll_bounced
      REAL(wp) ::   zuvel1 , zvvel1 , zu1, zv1, zax1, zay1, zxi1 , zyj1
      REAL(wp) ::   zuvel2 , zvvel2 , zu2, zv2, zax2, zay2, zxi2 , zyj2
      REAL(wp) ::   zuvel3 , zvvel3 , zu3, zv3, zax3, zay3, zxi3 , zyj3
      REAL(wp) ::   zuvel4 , zvvel4 , zu4, zv4, zax4, zay4, zxi4 , zyj4
      REAL(wp) ::   zuvel_n, zvvel_n, zxi_n   , zyj_n
      REAL(wp) ::   zdt, zdt_2, zdt_6, ze1, ze2
      TYPE(iceberg), POINTER ::   berg
      TYPE(point)  , POINTER ::   pt
      !!----------------------------------------------------------------------
      !
      ! 4th order Runge-Kutta to solve:   d/dt X = V,  d/dt V = A
      !                    with I.C.'s:   X=X1 and V=V1
      !
      !                                    ; A1=A(X1,V1)
      !  X2 = X1+dt/2*V1 ; V2 = V1+dt/2*A1 ; A2=A(X2,V2)
      !  X3 = X1+dt/2*V2 ; V3 = V1+dt/2*A2 ; A3=A(X3,V3)
      !  X4 = X1+  dt*V3 ; V4 = V1+  dt*A3 ; A4=A(X4,V4)
      !
      !  Xn = X1+dt*(V1+2*V2+2*V3+V4)/6
      !  Vn = V1+dt*(A1+2*A2+2*A3+A4)/6

      ! time steps
      zdt   = berg_dt
      zdt_2 = zdt * 0.5_wp
      zdt_6 = zdt / 6._wp

      berg => first_berg                    ! start from the first berg
      !
      DO WHILE ( ASSOCIATED(berg) )          !==  loop over all bergs  ==!
         !
         pt => berg%current_point

         ll_bounced = .FALSE.
   
     ! STEP 1 !
         ! ====== !
         zxi1 = pt%xi   ;   zuvel1 = pt%uvel     !**   X1 in (i,j)  ;  V1 in m/s
         zyj1 = pt%yj   ;   zvvel1 = pt%vvel


         !                                         !**   A1 = A(X1,V1)
         CALL icb_accel( kt, berg , zxi1, ze1, zuvel1, zuvel1, zax1,     &
            &                   zyj1, ze2, zvvel1, zvvel1, zay1, zdt_2, 0.5_wp )
         !
         zu1 = zuvel1 / ze1                           !**   V1 in d(i,j)/dt
         zv1 = zvvel1 / ze2

         ! STEP 2 !
         ! ====== !
         !                                         !**   X2 = X1+dt/2*V1   ;   V2 = V1+dt/2*A1
         ! position using di/dt & djdt   !   V2  in m/s
         zxi2 = zxi1 + zdt_2 * zu1          ;   zuvel2 = zuvel1 + zdt_2 * zax1
         zyj2 = zyj1 + zdt_2 * zv1          ;   zvvel2 = zvvel1 + zdt_2 * zay1
         !
         CALL icb_ground( zxi2, zxi1, zu1,   &
            &             zyj2, zyj1, zv1, ll_bounced )

         !                                         !**   A2 = A(X2,V2)
         CALL icb_accel( kt, berg , zxi2, ze1, zuvel2, zuvel1, zax2,    &
            &                   zyj2, ze2, zvvel2, zvvel1, zay2, zdt_2, 0.5_wp )
         !
         zu2 = zuvel2 / ze1                           !**   V2 in d(i,j)/dt
         zv2 = zvvel2 / ze2
         !
         ! STEP 3 !
         ! ====== !
         !                                         !**  X3 = X1+dt/2*V2  ;   V3 = V1+dt/2*A2; A3=A(X3)
         zxi3  = zxi1  + zdt_2 * zu2   ;   zuvel3 = zuvel1 + zdt_2 * zax2
         zyj3  = zyj1  + zdt_2 * zv2   ;   zvvel3 = zvvel1 + zdt_2 * zay2
         !
         CALL icb_ground( zxi3, zxi1, zu2,   &
            &             zyj3, zyj1, zv2, ll_bounced )

Thank you

Hwa-Jin Choi