Segmentation fault on *nemogcm.f90* and *nemo.f90*

Dear all,

Hello, I am a beginner and I ran into a problem when running a test case.
I’m trying to set up a configuration based on GYRE_PISCES (TOP wasn’t compiled for speed).
The run stops with the error below, which I think is related to the initialization of NEMO.
Does anyone know how to fix it?

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f7efb9e3d21 in ???
#1  0x7f7efb9e2ef5 in ???
#2  0x7f7efb68820f in ???
#3  0x7f7efbcf732b in ???
#4  0x55c4fcacb1ff in ???
#5  0x55c4fc6c0736 in ???
#6  0x55c4fc88830e in ???
#7  0x55c4fc48a7ac in ???
#8  0x55c4fbe2fa0c in __nemogcm_MOD_nemo_init
	at /media/cmlws/Data1/[user]/NEMO/r4.0.6/cfgs/GYRE_testing/BLD/ppsrc/nemo/nemogcm.f90:269
#9  0x55c4fbe329fd in __nemogcm_MOD_nemo_gcm
	at /media/cmlws/Data1/[user]/NEMO/r4.0.6/cfgs/GYRE_testing/BLD/ppsrc/nemo/nemogcm.f90:165
#10  0x55c4fbe2df61 in nemo
	at /media/cmlws/Data1/[user]/NEMO/r4.0.6/cfgs/GYRE_testing/WORK/nemo.f90:18
#11  0x55c4fbe2df9a in main
	at /media/cmlws/Data1/[user]/NEMO/r4.0.6/cfgs/GYRE_testing/WORK/nemo.f90:11

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 1671418 RUNNING AT cmlws
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

For information, here is my arch file:

%NCDF_HOME           /home/[user]/anaconda3
%HDF5_HOME           /home/[user]/anaconda3
%XIOS_HOME           /media/cmlws/Data1/[user]/xios-2.5
%OASIS_HOME          /not/defined

%NCDF_INC            -I%NCDF_HOME/include -I%HDF5_HOME/include
%NCDF_LIB            -L%NCDF_HOME/lib -lnetcdff -lnetcdf -lstdc++
%XIOS_INC            -I%XIOS_HOME/inc
%XIOS_LIB            -L%XIOS_HOME/lib -lxios -L/usr/lib/gcc/x86_64-linux-gnu/9 -lstdc++

%OASIS_INC           -I%OASIS_HOME/build/lib/mct -I%OASIS_HOME/build/lib/psmile.MPI1
%OASIS_LIB           -L%OASIS_HOME/lib -lpsmile.MPI1 -lmct -lmpeu -lscrip

%CPP                 /usr/bin/cpp-9
%CPPFLAGS            -P -traditional

%FC                  /usr/bin/mpif90
%FCFLAGS             -fdefault-real-8 -funroll-all-loops -cpp -fcray-pointer -ffree-line-length-none -g -O0 -fbacktrace
%FFLAGS              %FCFLAGS
#%LD                  /usr/bin/mpif90 -Wl,-rpath=$HOME/INSTALL/lib:/usr/lib
%LD                  /usr/bin/mpif90
%LDFLAGS             -L/usr/lib/x86_64-linux-gnu
%FPPFLAGS            -P -C -traditional
%AR                  ar
%ARFLAGS             -rs
%MK                  make
%USER_INC            %XIOS_INC %OASIS_INC %NCDF_INC
%USER_LIB            %XIOS_LIB %OASIS_LIB %NCDF_LIB

%CC                  cc
%CFLAGS              -O0 -fbacktrace

Thank you in advance.

Best regards,

Hwa-Jin Choi

Hei Jin,

What do you mean by “TOP was not compiled for speed”? Can you run GYRE without PISCES?

Cheers,

Robinson

Did you have a look at ocean.output? Do you see the E R R O R flag?
What is on line 269 of …/r4.0.6/cfgs/GYRE_testing/BLD/ppsrc/nemo/nemogcm.f90?
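Both checks can be run from the shell. This is a minimal sketch: `find_errors` and `show_lines` are hypothetical helper names (not NEMO tools), and the paths in the usage comments are placeholders for your own tree.

```shell
# find_errors DIR: list every ocean.output under DIR and report any
# "E R R O R" banner found in it.
find_errors() {
    find "$1" -name ocean.output 2>/dev/null | while read -r f; do
        echo "found: $f"
        grep -n 'E R R O R' "$f" || echo "  no E R R O R banner in $f"
    done
}

# show_lines FILE FIRST LAST: print lines FIRST..LAST of FILE with numbers.
show_lines() {
    awk -v first="$2" -v last="$3" \
        'NR >= first && NR <= last { printf "%d: %s\n", NR, $0 }' "$1"
}

# Example usage (placeholder paths, adjust to your installation):
#   find_errors /path/to/cfgs/GYRE_testing
#   show_lines  /path/to/cfgs/GYRE_testing/BLD/ppsrc/nemo/nemogcm.f90 265 270
```

If `find_errors` prints nothing at all, no ocean.output was written under that directory, which would itself suggest the crash happens very early.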

Sébastien

Robinson,

I’m sorry. I stand corrected.
In /cfgs/GYRE_PISCES/cpp_GYRE_PISCES.fcm, I replaced key_top with key_nosignedzero, following this page:
https://nemo-related.readthedocs.io/en/latest/compilation_notes/nemo40.html
Thank you for your interest.

Best regards,

Hwa-Jin

@smasson

Thank you for your advice.
I can’t find the file; I guess it wasn’t created.
Can you tell me where ocean.output should be located?

Line 269 of nemogcm.f90 is the IF( lk_oasis ) THEN line in this routine:

SUBROUTINE nemo_init
      !!----------------------------------------------------------------------
      !!                     ***  ROUTINE nemo_init  ***
      !!
      !! ** Purpose :   initialization of the NEMO GCM
      !!----------------------------------------------------------------------
      INTEGER ::   ios, ilocal_comm   ! local integers
      !!
      NAMELIST/namctl/ ln_ctl   , sn_cfctl, nn_print, nn_ictls, nn_ictle,   &
         &             nn_isplt , nn_jsplt, nn_jctls, nn_jctle,             &
         &             ln_timing, ln_diacfl
      NAMELIST/namcfg/ ln_read_cfg, cn_domcfg, ln_closea, ln_write_cfg, cn_domcfg_out, ln_use_jattr
      !!----------------------------------------------------------------------
      !
      cxios_context = 'nemo'
      !
      !                             !-------------------------------------------------!
      !                             !     set communicator & select the local rank    !
      !                             !  must be done as soon as possible to get narea  !
      !                             !-------------------------------------------------!
      !
      IF( Agrif_Root() ) THEN
         IF( lk_oasis ) THEN
            CALL cpl_init( "oceanx", ilocal_comm )                               ! nemo local communicator given by oasis
            CALL xios_initialize( "not used"       , local_comm =ilocal_comm )   ! send nemo communicator to xios
         ELSE
            CALL xios_initialize( "for_xios_mpi_id", return_comm=ilocal_comm )   ! nemo local communicator given by xios
         ENDIF
      ENDIF

Is there any way to fix this problem?
Thank you again for your time.

Best regards,

Hwa-Jin

Very strange…
The error appears at the very beginning of the simulation, when almost nothing has been done yet. It looks like the code was not properly compiled.
lk_oasis should be defined and initialized to .false. in …/r4.0.6/cfgs/GYRE_testing/BLD/ppsrc/nemo/sbc_oce.f90. Do you see this line:

LOGICAL , PUBLIC ::   lk_oasis = .FALSE.
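
A quick way to look for that declaration is a case-insensitive grep on the preprocessed file. This is only a sketch; `show_oasis_flag` is a hypothetical helper name and the path in the usage comment is a placeholder.

```shell
# show_oasis_flag FILE: print every line of FILE that initializes lk_oasis,
# prefixed with its line number.
show_oasis_flag() {
    grep -n -i 'lk_oasis *=' "$1"
}

# Example usage (placeholder path, adjust to your installation):
#   show_oasis_flag /path/to/cfgs/GYRE_testing/BLD/ppsrc/nemo/sbc_oce.f90
```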

Smasson

Thanks to your comment, I rechecked the compiled code.
I was suspicious of the XIOS installation, so I reinstalled XIOS and the segmentation fault went away.
I think XIOS and NEMO had not been built with consistent compilers.
Thanks for your help :grinning:

Hwa-Jin

Greetings, Jin
I ran into the same segmentation fault when running an ORCA2_ICE_PISCES configuration; the problem is described here:
https://nemo-ocean.discourse.group/t/segmentation-fault-due-to-new-configuration-run-nemo/342/8
Could you suggest any steps, based on your experience?

Hello Phil,

I think your code was not properly compiled.
As I mentioned above, after I reinstalled XIOS the error went away.
When the code stops before it really starts running, I check that XIOS is installed properly and that the library paths used by NEMO are set correctly. The NetCDF and HDF5 libraries also need to be built against the same MPI implementation (and version) that both NEMO and XIOS were compiled and linked with.
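
One way to see which libraries an executable actually picks up is to filter the output of `ldd`. The sketch below assumes a dynamically linked `./nemo` on Linux; `io_libs` and `lib_dirs` are hypothetical helper names. Libraries linked statically will not show up here.

```shell
# io_libs: keep only the MPI, NetCDF and HDF5 entries from `ldd` output
# read on standard input.
io_libs() {
    grep -E -i 'mpi|netcdf|hdf5'
}

# lib_dirs: from the same input, print the distinct directories those
# libraries are loaded from. Several unrelated prefixes are worth a look.
lib_dirs() {
    io_libs | awk '$2 == "=>" && $3 ~ "^/" { sub("/[^/]*$", "", $3); print $3 }' | sort -u
}

# Example usage:
#   ldd ./nemo | io_libs
#   ldd ./nemo | lib_dirs
# The compiler behind the MPI wrapper can be shown with
#   mpif90 -show      (MPICH)
#   mpif90 --showme   (Open MPI)
```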

Hope this helps,

Hwa Jin Choi

Thank you for your answer. Hopefully all the pieces of this NEMO segmentation fault puzzle will fall into place soon.
By the way, does reinstalling XIOS require a full NEMO reinstallation (recompilation), or is it a separate process that does not affect the rest of the setup?

Hello Phil

It is recommended to recompile XIOS first and then recompile NEMO.
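
As a dry run, that order could look like the sketch below, which only prints the commands. The directories and arch names are placeholders, `print_rebuild_steps` is a hypothetical helper, and the `--full` option of make_xios and the `clean` argument of makenemo are assumptions to confirm against `./make_xios --help` and `./makenemo -h` for your versions.

```shell
# print_rebuild_steps XIOS_ARCH NEMO_CFG NEMO_REF NEMO_ARCH
# Print the rebuild order; nothing is compiled here.
# All four arguments and both /path/to/... directories are placeholders.
print_rebuild_steps() {
    cat <<EOF
cd /path/to/xios && ./make_xios --arch $1 --full
cd /path/to/nemo && ./makenemo -n '$2' clean
cd /path/to/nemo && ./makenemo -n '$2' -r '$3' -m '$4'
EOF
}

print_rebuild_steps MY_XIOS_ARCH MY_NEW_TEST ORCA2_ICE_PISCES gfortran_test
```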

Hwa Jin Choi

Greetings all.
I successfully ran the ORCA2_SAS_ICE configuration (following the ARC36 config manual).
However, I get a segmentation fault every time I run ./nemo from a new configuration directory.
What I’ve done:

  1. Created a new configuration with

$ ./makenemo -n 'MY_NEW_TEST' -r 'ORCA2_ICE_PISCES' -m 'gfortran_test'

     I chose ORCA2_ICE_PISCES because I need ocean and ice dynamics together.

  2. Tried compiling with different CPP keys: key_oce, key_mpi2, etc.

  3. Copied all the necessary files into the EXP00 folder (domain_cfg.nc, forcing, initial conditions, etc.).

  4. Ran ./nemo with the command

$ mpirun -n '26' ./nemo

  5. Got a segmentation fault.

The screenshots are attached for better understanding. I am a newbie in NEMO modelling and would appreciate your help.
Could you suggest ways to track down the error? Is it a CPP keys issue, or a gfortran compiler issue? Or is there some special way to compile ICE_PISCES so that nemo runs successfully?
Kind regards!

Hei,

I usually get this kind of error when I lack memory.
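
One related thing that is cheap to check is the stack-size limit of the shell that launches the run, since Fortran codes can place large temporary arrays on the stack. That the stack is the limiting resource here is only an assumption. A minimal sketch (`report_stack` is a hypothetical helper name):

```shell
# report_stack: print the current stack-size limit of this shell and,
# if it is finite, the command commonly used to lift it.
report_stack() {
    limit=$(ulimit -s)
    if [ "$limit" = "unlimited" ]; then
        echo "stack: unlimited"
    else
        echo "stack: ${limit} kB (to lift it in this shell: ulimit -s unlimited)"
    fi
}

report_stack
# Total and free RAM can be inspected on Linux with:  free -g
```

Raising the limit can fail if the hard limit set by the system is lower, and it only affects processes started from that same shell.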

Hope this helps,

Robinson

Thank you for the reply. Unfortunately, I used the same 26 processors in the previous experiment (ORCA_SAS_ICE), and everything ran smoothly; I got some output at least. It was only when I built the ocean+ice configuration (ORCA_ICE_PISCES) that I hit this issue.
I am starting to think it might be a structural problem, for example in the namelists. Should namelist_cfg and namelist_ref contain the same entries (I mean all inputs)?

namelist_cfg basically overrides namelist_ref, so they are not meant to be the same: say 95% of the inputs are taken from ref, and the remaining 5% are modified by cfg.
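
To see what a given namelist_cfg actually changes, it can help to list its groups and print one group from each file side by side. This is a sketch with hypothetical helper names (`list_groups`, `show_group`), assuming the usual layout where a group starts with `&name` and ends with a line containing only `/`.

```shell
# list_groups FILE: print the name of every namelist group (&name) in FILE.
list_groups() {
    sed -n 's/^[[:space:]]*&\([A-Za-z_][A-Za-z0-9_]*\).*/\1/p' "$1"
}

# show_group FILE GROUP: print the block from "&GROUP" to the closing "/".
show_group() {
    awk -v g="$2" '
        tolower($1) == ("&" tolower(g)) { inside = 1 }
        inside                          { print }
        inside && $1 == "/"             { inside = 0 }
    ' "$1"
}

# Example usage, run from the experiment directory:
#   list_groups namelist_cfg
#   show_group  namelist_ref namrun
#   show_group  namelist_cfg namrun
```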

It might be that SAS_ICE uses a lot less memory than ICE_PISCES. Can you try to run ORCA_ICE without PISCES to check?

May I include my namelist_cfg file to make my question clear?

Sure, or we can chat on Zoom if you like and you can explain to me what you want to do; that could be easier…

Sure, that will be more productive. I appreciate your help. Here is my email:
civilseagull@gmail.com. Send me a message there and I’ll share a Zoom link when you have time.

The problem still exists, even after I recompiled ORCA2_ICE_PISCES. The test case of this reference configuration is also crashing with a segmentation fault.
Possibly it’s a problem related to the MPI settings. I would be grateful for any tools for MPI diagnostics, or for some steps to examine the ./nemo run.
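
A first MPI sanity check is whether the launcher and the compiler wrapper found on the PATH come from the same installation. The sketch below uses "same directory" as a rough heuristic only; `same_dir` and `check_mpi_pair` are hypothetical helper names.

```shell
# same_dir A B: succeed if the two paths live in the same directory.
same_dir() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# check_mpi_pair: compare the mpirun and mpif90 found on the PATH.
check_mpi_pair() {
    launcher=$(command -v mpirun) || { echo "mpirun not found on PATH"; return 0; }
    wrapper=$(command -v mpif90)  || { echo "mpif90 not found on PATH"; return 0; }
    if same_dir "$launcher" "$wrapper"; then
        echo "OK: $launcher and $wrapper share a directory"
    else
        echo "CHECK: $launcher and $wrapper come from different directories"
    fi
}

check_mpi_pair
# A single-process run can also help separate MPI issues from the rest:
#   mpirun -n 1 ./nemo
```

If the arch file hard-codes a wrapper path, compare the launcher against that path rather than against whatever `mpif90` is first on the PATH.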

Did you look at lines 158 and 252 of …/BLD/ppsrc/nemo/nemogcm.f90, as suggested by your error message? What is on those lines?