Hi, everyone. I’m a beginner. I want to use NEMO-4.2.2 as a workload to test my machine, which has 80 physical cores and 160 logical cores.
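(For reference, I am reading those core counts from lscpu; a quick check like the line below shows sockets, cores per socket, and threads per core.)

lscpu | grep -E 'Socket|Core|Thread'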
I am using OpenMPI with the GYRE_PISCES configuration for the test, and I ran the command:
mpirun --use-hwthread-cpus -n 160 ./nemo
However, my terminal displays an error:
Abort(123) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 123) - process 0
Abort(123) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 123) - process 0
Abort(123) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 123) - process 0

prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

Process name: [prterun-black-1592854@1,10]
Exit code: 123
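In case the launch options matter: I have only tried the --use-hwthread-cpus run above. My assumption is that a run pinned to the 80 physical cores only would look roughly like the line below, but I have not tested it yet.

mpirun --bind-to core -n 80 ./nemo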
I only modified some parameters in namelist_cfg and namelist_ref. The relevant parts of namelist_cfg, namelist_ref, and ocean.output are as follows.
The parameters I modified in namelist_cfg are nn_GYRE, jpiglo, jpjglo, and rn_Dt:
!-----------------------------------------------------------------------
&namusr_def ! GYRE user defined namelist
!-----------------------------------------------------------------------
nn_GYRE = 4 ! GYRE resolution [1/degrees]
ln_bench = .true. ! ! =T benchmark with gyre: the gridsize is kept constant
jpiglo = 1440
jpjglo = 720
jpkglo = 31 ! number of model levels
/
!-----------------------------------------------------------------------
&namdom ! time and space domain
!-----------------------------------------------------------------------
ln_linssh = .true. ! =T linear free surface ==>> model level are fixed in time
!
rn_Dt = 1200. ! time step for the dynamics
/
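For comparison, if I remember correctly the &namusr_def block distributed with GYRE_PISCES only contains something like the sketch below (so the exact defaults may differ from 4.2.2, and I may have added jpiglo and jpjglo where they do not belong):

!-----------------------------------------------------------------------
&namusr_def    !   GYRE user defined namelist
!-----------------------------------------------------------------------
   nn_GYRE     =     1     !  GYRE resolution [1/degrees]
   ln_bench    = .false.   !  ! =T benchmark with gyre: the gridsize is kept constant
   jpkglo      =    31     !  number of model levels
/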
The parameters I modified in namelist_ref are jpni and jpnj:
!-----------------------------------------------------------------------
&nammpp ! Massively Parallel Processing
!-----------------------------------------------------------------------
ln_listonly = .false. ! do nothing else than listing the best domain decompositions (with land domains suppression)
! ! if T: the largest number of cores tested is defined by max(mppsize, jpni*jpnj)
ln_nnogather = .true. ! activate code to avoid mpi_allgather use at the northfold
jpni = 10 ! number of processors following i (set automatically if < 1), see also ln_listonly = T
jpnj = 16 ! number of processors following j (set automatically if < 1), see also ln_listonly = T
nn_hls = 1 ! halo width (applies to both rows and columns)
nn_comm = 1 ! comm choice
/
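For what it is worth, my reading of the comments on ln_listonly above is that a block like the sketch below would make NEMO only print the candidate domain decompositions for the launched number of ranks and then stop; this is just my interpretation of those comments, and I have not tried it yet.

!-----------------------------------------------------------------------
&nammpp        !   Massively Parallel Processing
!-----------------------------------------------------------------------
   ln_listonly = .true.   ! only list the best domain decompositions, then do nothing else
   jpni        =   -1     ! < 1 => set automatically (per the comment above)
   jpnj        =   -1     ! < 1 => set automatically
/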
And the ocean.output is as follows:
AAAAAAAA
par_kind : wp = Working precision = dp = double-precision
===>>> : E R R O R
        ===========
misspelled variable in namelist namusr_def in configuration namelist iostat = 5010

usr_def_nam : read the user defined namelist (namusr_def) in namelist_cfg

Namelist namusr_def : GYRE case
   GYRE used as Benchmark (=T)                     ln_bench = T
   inverse resolution & implied domain size        nn_GYRE  = 4
      Ni0glo = 30*nn_GYRE                          Ni0glo   = 122
      Nj0glo = 20*nn_GYRE                          Nj0glo   = 82
   number of model levels                          jpkglo   = 0

Namelist nammpp
   processor grid extent in i                              jpni         = 10
   processor grid extent in j                              jpnj         = 16
   avoid use of mpi_allgather at the north fold            ln_nnogather = T
   halo width (applies to both rows and columns)           nn_hls       = 1
   choice of communication method                          nn_comm      = 1

mpp_init:

The chosen domain decomposition 10 x 16 with 159 land subdomains
   - uses a total of 1 mpi process
   - has mpi subdomains with a maximum size of (jpi = 15, jpj = 8, jpi*jpj = 120)
The best domain decompostion 1 x 1 with 0 land subdomains
   - uses a total of 1 mpi process
   - has mpi subdomains with a maximum size of (jpi = 124, jpj = 84, jpi*jpj = 10416)

===>>> : E R R O R
        ===========
With this specified domain decomposition: jpni = 10 jpnj = 16
   we can eliminate only 0 land mpi subdomains
   therefore the number of ocean mpi subdomains ( 160) exceed the number of MPI processes: 1
==>>> There is the list of best domain decompositions you should use:

For your information:
   list of the best partitions including land supression
   -----------------------------------------------------
nb_cores oce: 1, land domains excluded: 0 ( 0.0%), largest oce domain: 10416 ( 124 x 84 )
Can someone tell me what these parameters mean? As I understand it, jpni * jpnj = 10 * 16 = 160 should match the 160 MPI ranks I launched, so I do not understand why ocean.output reports only 1 MPI process and 159 land subdomains. How should I modify the parameters?
Thank you very much.