Hello,
Due to a significant difference in execution time between two of our HPCs, I am looking at how long each individual time step takes (from the `timing.output` file) as a function of the output frequency (hourly, daily, monthly, or stations only, which are daily) on each HPC (Bi and Tetralith). It turns out that activating hourly outputs (`ssh_inst`, in this case) makes the execution time blow up (about x5).
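For context, the hourly output is enabled through an XIOS file definition along these lines. This is only a sketch: the group/file ids and attributes here are illustrative, not copied from our actual `file_def_nemo-oce.xml`.

```xml
<!-- Sketch of an hourly file group (ids and attributes are illustrative) -->
<file_group id="1h" output_freq="1h" enabled=".TRUE.">
  <file id="file_ssh" name_suffix="_ssh">
    <!-- instantaneous sea-surface height, written every hour -->
    <field field_ref="ssh" name="ssh_inst" operation="instant" />
  </file>
</file_group>
```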
NEMO 4.2 runs on 96 CPUs and XIOS on 16. On Bi, I run the experiment with `mpirun -n 96 ./nemo.exe : -n 16 ./xios_server`; on Tetralith, with `srun --mpi=pmi2 --multi-prog cpu_mapping`, where the CPU mapping splits the XIOS cores equally over the ends of the available nodes. On Bi it’s compiled with Intel 2018, on Tetralith with Intel 2023. The CPUs are different, but a normal time step takes about 0.28 s on Bi and 0.24 s on Tetralith.
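For reference, the `--multi-prog` file uses Slurm’s standard `rank-range  executable` format. A minimal sketch (the contiguous rank layout here is illustrative; our actual `cpu_mapping` spreads the XIOS ranks over the ends of the nodes):

```text
# cpu_mapping -- Slurm --multi-prog configuration (sketch; actual rank layout differs)
# ranks 0-95 run NEMO
0-95    ./nemo.exe
# ranks 96-111 run the XIOS servers
96-111  ./xios_server
```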
Here is what I have:

- y axis: execution time (s)
- x axis: experiment progress (in simulated days)

The time step is 180 s.
As you can see, in the daily case there is a spike in execution time once a day, but only on Tetralith. It’s even worse in the hourly case, where I get spikes every hour, plus smaller intermediate spikes every 20 minutes or so (~1 or 2 s per time step). All these spikes combined are what makes the run time so bad when hourly outputs are involved.
Does anyone know what could be the source of these spikes? They don’t really make sense to me. The whole point of using XIOS in detached mode is that NEMO sends the data at every time step through the `iom_put()` call and does not have to stop while XIOS is writing.
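The one caveat I am aware of is that, as far as I understand, the client can still block if its transfer buffers towards the servers fill up faster than the servers drain them. In case it helps the discussion, these are the buffer-related variables I mean, set in the `xios` context of `iodef.xml` (a sketch with illustrative values, not our actual settings):

```xml
<!-- Sketch of the xios context in iodef.xml; values are illustrative -->
<context id="xios">
  <variable_definition>
    <!-- detached mode: dedicated XIOS server processes -->
    <variable id="using_server" type="bool">true</variable>
    <!-- buffer tuning: "performance" favours larger buffers over lower memory use -->
    <variable id="optimal_buffer_size" type="string">performance</variable>
    <variable id="buffer_size_factor" type="double">1.0</variable>
  </variable_definition>
</context>
```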
Moreover, how can it be HPC-dependent? I only showed the Tetralith results here, since its hardware is similar to Bi’s, but I get the same problem on our brand-new HPC.
I’m discussing this with our HPC support in parallel, but I wondered whether someone in the NEMO community has any insight into this behaviour.