MPI error while using the DOMAINcfg tool

I have been trying for a while to create domain files for an eORCA05.L75 configuration. When I run the DOMAINcfg tool, I get the following error:

forrtl: severe (41): insufficient virtual memory
Image              PC                Routine            Line        Source
make_domain_cfg.e  00000000006F9C0B  Unknown               Unknown  Unknown
make_domain_cfg.e  00000000006D6823  Unknown               Unknown  Unknown
make_domain_cfg.e  0000000000659222  dombat_mp_dom_bat         349  dombat.f90
make_domain_cfg.e  0000000000483FDE  domzgr_mp_zgr_bat         729  domzgr.f90
make_domain_cfg.e  0000000000447BED  domzgr_mp_dom_zgr         199  domzgr.f90
make_domain_cfg.e  0000000000421BB5  domain_mp_dom_ini          93  domain.f90
make_domain_cfg.e  0000000000410BAF  nemogcm_mp_nemo_i         296  nemogcm.f90
make_domain_cfg.e  000000000040E796  nemogcm_mp_nemo_g         108  nemogcm.f90
make_domain_cfg.e  000000000040E768  MAIN__                     28  make_domain_cfg.f90
make_domain_cfg.e  000000000040E722  Unknown               Unknown  Unknown
libc-2.28.so       0000149444039CA3  __libc_start_main     Unknown  Unknown
make_domain_cfg.e  000000000040E62E  Unknown               Unknown  Unknown

I have tried many different combinations of nodes, memory per CPU, etc., and have launched the job with both srun (e.g., srun --mpi=pmi2 -n 40 ./make_domain_cfg.exe) and mpirun (e.g., mpirun -np 40 ./make_domain_cfg.exe). I am also setting ulimit -s unlimited.
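For reference, this is roughly the kind of SLURM batch script I have been submitting (the partition name and the node/memory counts are just examples; I have varied them between attempts):

#!/bin/bash
#SBATCH --job-name=domaincfg
#SBATCH --nodes=1                 # also tried 2 and 4 nodes
#SBATCH --ntasks=40
#SBATCH --mem-per-cpu=4G          # varied between runs
#SBATCH --time=01:00:00
#SBATCH --partition=compute       # placeholder, system-dependent

ulimit -s unlimited

srun --mpi=pmi2 -n 40 ./make_domain_cfg.exe
# or, alternatively:
# mpirun -np 40 ./make_domain_cfg.exe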

The tail of ocean.output shows:

                     iom_nf90_open ~~~ open existing file: bathy_meter.nc in READ mode
                    ---> bathy_meter.nc OK
           read nav_lon (rec:      1) in bathy_meter.nc ok
           read nav_lat (rec:      1) in bathy_meter.nc ok
           read Bathymetry (rec:      1) in bathy_meter.nc ok
                     iom_close ~~~ close file: bathy_meter.nc ok
 Interpolation of high resolution bathymetry on child grid
 Median average ...

Where could I be going wrong?

Did you try running on more than one node? For example, if you have 64 cores per node, you could start an interactive job with

salloc -n 256 -t 60 ...

and then

srun --mpi=pmi2 -n 256 ...

That would give you 4 full nodes (assuming 64 cores per node), which should definitely be enough for eORCA05.L75.

Did you try

ulimit -s unlimited

before running ./make_domain_cfg.exe?
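Putting those two suggestions together, the full interactive sequence would look something like this (adjust the task count to match the cores available on your nodes; any further salloc options are system-dependent):

salloc -n 256 -t 60
ulimit -s unlimited                 # set in the shell that launches srun
srun --mpi=pmi2 -n 256 ./make_domain_cfg.exe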

Hi Navajyoth and Sebastian,

I’ve run into the same problem when making eORCA1.L75 from ETOPO2 data (0.03 deg).

The issue is related to the interpolation method:

   nn_interp   =    2      ! = 0 - arithmetic mean
                           ! = 1 - median
                           ! = 2 - bilinear interp,  type of interpolation (nn_bathy =2)

nn_interp = 2 works, but I’m getting strange results near the date line. nn_interp = 0 and nn_interp = 1 both give me “insufficient virtual memory”.

I’ve set ulimit -s unlimited and split the job over 4 compute nodes (384 GB each), but I still get “insufficient memory”. SLURM tells me that the job actually uses only around 2 GB of memory, so I’m having a hard time understanding why there is “insufficient memory”.

I could of course use nn_interp = 2 and keep going, but when going from 0.03 deg bathymetry to eORCA1 I would prefer to use either the mean or the median rather than bilinear interpolation.

This is with NEMO 4.2.2, using Intel 2021.9.0 compilers and MPI.

/J