Coupled OpenIFS+NEMO4.2+XIOS+OASIS freezes

Good evening

I have started to upgrade from NEMO 3.6 to NEMO 4.2 in my coupled model. This also means upgrading XIOS from 2.5 to the latest trunk.

Unfortunately the new model freezes during the “definition” phase of OASIS, i.e. the first time step. The model has 4 executables: “oifs” (OpenIFS), “oceanx” (NEMO), “xios.x” (XIOS) and “rnfma” (runoff scheme). I have added a print statement before “OASIS_ENDDEF” in OpenIFS, NEMO and the runoff mapper, and I know for sure that all three reach that point.
XIOS however, does not seem to reach OASIS_ENDDEF, and so the whole coupled model freezes.

After adding a lot of print statements to client.cpp and server.cpp in XIOS, I have figured out that the hang occurs when oasis_init_comp is called within the function “CServer::Initialize”. No error message is raised.

One might assume it’s a bug in XIOS, but surely other groups have used NEMO 4.x with new XIOS for coupled runs?
So I’m thinking it’s either my settings (iodef.xml etc) or compilers/MPI.

Has anyone else managed to run NEMO 4.x in coupled mode with XIOS in server mode?
If so, would it be possible to get some advice on which XIOS revision works and what settings to use in iodef.xml?
Any help here is much appreciated.

I’m using Intel compilers and Intel MPI 2019.5, and I’m using basically the same iodef.xml as I was for NEMO 3.6.

Full version names are: OpenIFS 43r3v2, NEMO 4.2.0, XIOS trunk (a few days old) and OASIS-MCT5.0.

Many thanks
Joakim Kjellsson, GEOMAR

I would first try to run the coupled model without XIOS (remove key_xios when compiling NEMO) to establish whether or not the problem comes from XIOS.
If you have any issues with the output files when you don’t use XIOS, set nn_write = -1 in namelist_cfg so that NEMO does not write any output.

Good evening, Sebastian

Thanks very much for your reply. I followed this suggestion and re-compiled both OpenIFS and NEMO without XIOS, and this version runs. It passes OASIS_ENDDEF, then OASIS generates the remapping files, and I start getting some netcdf output from each NEMO task.

I then took the very same configuration, re-compiled it with XIOS, and it froze in the same way again.
So I’m confident that the problem comes from XIOS.

The traceback from XIOS points to OASIS on line 254 in mod_oasis_auxiliary_routines.F90 which says:
    CALL mpi_intercomm_create(mpi_comm_local, 0, mpi_comm_global, &
                              mpi_root_global(il), tag, new_comm, ierr)

I think this is when XIOS sets up communication with NEMO. Could there be some setting for XIOS or NEMO that I’m missing?


Dear Sebastian

I’m happy to report that the problem has been solved after getting some input from Julien Derouillat at IPSL.

Apparently, the required call sequence around “CALL oasis_enddef” and “CALL xios_context_initialize” has changed since XIOS trunk revision 1587.
Both NEMO 4.2 and OpenIFS are written as:

CALL oasis_enddef

CALL xios_context_initialize

but since XIOS trunk version 1587, one needs to do

CALL xios_oasis_enddef
CALL oasis_enddef

CALL xios_context_initialize

and also set “call_oasis_enddef” to “true” in iodef.xml.
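
For illustration, the iodef.xml setting might look like the fragment below. This is only a sketch, not copied from a working configuration: it assumes that call_oasis_enddef is a boolean variable placed in the "parameters" variable group of the "xios" context, next to using_oasis, as in the iodef.xml files shipped with NEMO. The using_server and using_oasis values shown are likewise assumptions for a coupled run with XIOS in server mode.

```xml
<context id="xios">
  <variable_definition>
    <variable_group id="parameters">
      <!-- Assumed values for a coupled run with a detached XIOS server -->
      <variable id="using_server" type="bool">true</variable>
      <variable id="using_oasis" type="bool">true</variable>
      <!-- The setting described above; its placement in this group is an assumption -->
      <variable id="call_oasis_enddef" type="bool">true</variable>
    </variable_group>
  </variable_definition>
</context>
```

Any other variables already present in the existing "parameters" group would stay as they are; only the call_oasis_enddef line is new.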

I made these changes and that solved the deadlock. Now OpenIFS and NEMO can couple via OASIS and also do I/O via XIOS.
Let me know if I should share these changes with the community.

Best wishes

PS. This is all explained here: