I run PISCES with NEMO v4.0.6 and XIOS 2.5. I have already prepared the dynamical fields using the previous version, NEMO 3.6. The configuration has 797 procs (jpnj = 32 x 52), and I kept this domain decomposition in the PISCES run as well. Over several job submissions the model “freezes” before the first time step: the job isn’t killed, but the model stops writing to ocean.output before it reads the dynamical fields.
However, when I use the optimal decomposition suggested in ocean.output (55 x 31 = 793), this problem doesn’t seem to occur.
Is there any way to fix this problem or will I have to follow the new domain decomposition?
Note that this problem sometimes doesn’t occur when I change nn_itend, but that seems to be random.
jpnj should not be a problem as long as you run the model on the appropriate number of cores (i.e.
The strange behaviour of the model in your tests suggests that you may have a problem with the amount of memory you are trying to allocate.
- Can you try running the model on 797 procs (jpnj = 32 x 52) without XIOS (i.e. without
- How do you distribute the XIOS processes among the NEMO processes? Did you try to spread the XIOS processes among the NEMO processes? For example, put 1 or 2 XIOS processes on each node.
I put 19 NEMO processes and 1 XIOS process on each node (each node has 20 processes), and 2 XIOS processes on the last node. The administrator of the HPC system told me that he didn’t find any memory issue during the run, but I don’t know if there is any other way to check for the problem.
So does that mean you are using 42 XIOS servers? You don’t mention which configuration you are running but if it is one of the eORCA grids then you may have latitude bands in Antarctica for which some of your XIOS servers have no sea points. This is another possible cause.
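For anyone wanting to double-check the server count for a layout like the one described above, here is a minimal sketch in Python. It assumes that every full node holds 19 NEMO processes plus 1 XIOS server, and that whatever slots remain free on the last node are filled with XIOS servers. The function name `xios_layout` is hypothetical and not part of NEMO or XIOS.

```python
def xios_layout(n_nemo, procs_per_node, xios_per_node=1):
    """Return (number of nodes, number of XIOS servers) for a mixed layout.

    Assumption: each full node holds (procs_per_node - xios_per_node) NEMO
    processes plus xios_per_node XIOS servers, and the free slots on the
    last, partly filled node are all given to XIOS servers.
    """
    nemo_per_node = procs_per_node - xios_per_node
    full_nodes, leftover_nemo = divmod(n_nemo, nemo_per_node)
    n_nodes = full_nodes
    n_xios = full_nodes * xios_per_node
    if leftover_nemo:
        # The last node takes the remaining NEMO processes; the rest is XIOS.
        n_nodes += 1
        n_xios += procs_per_node - leftover_nemo
    return n_nodes, n_xios


# Values quoted in this thread: 797 NEMO processes, 20 processes per node.
n_nodes, n_xios = xios_layout(797, 20)
print(f"{n_nodes} nodes, {n_xios} XIOS servers, {797 + n_xios} processes in total")
```

Under these assumptions the 797 NEMO processes fill 41 nodes and leave 18 for the last one, so the last node carries 2 XIOS servers. The exact server count is best confirmed against the job script.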
Did you solve this issue?
I am encountering similar troubles: the model freezes for no obvious reason at the end of the initialization phase.
I have not completely solved the problem, but I managed to work around it. I chose an appropriate domain decomposition according to the note:
Due to the different domain decompositions between XIOS and NEMO, if the total number of cores is larger than the number of grid points in the j direction then the model run will fail
I also spread the XIOS processes among the NEMO processes. When the freeze happened again, I changed the value of nn_itend and the model ran, although I don’t know why that helps!
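The condition in the quoted note can be checked before submitting a job. The sketch below is only an illustration in Python: the function name `exceeds_j_points` and the grid size `JPJGLO = 1000` are hypothetical placeholders, so substitute the number of j grid points of your own configuration. Whether the "total number of cores" should include the XIOS servers is not stated in the note, so that is left to the caller.

```python
def exceeds_j_points(total_cores, n_j_points):
    """True when the core count is larger than the number of grid points
    in the j direction, which is the failure condition stated in the note."""
    return total_cores > n_j_points


JPJGLO = 1000  # hypothetical number of j grid points, NOT taken from this thread

# 793 and 797 are the core counts mentioned in this thread; 1200 is made up
# to show a case that trips the condition for the placeholder grid size.
for cores in (793, 797, 1200):
    status = "too many cores" if exceeds_j_points(cores, JPJGLO) else "ok"
    print(f"{cores} cores vs {JPJGLO} j points: {status}")
```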
Do you use xios in attached or detached (server) mode? Thanks for your reply, Anne