AGRIF child abnormal behaviours at the boundaries

Hello everyone,

I have set up a new AGRIF nest over our NORDIC domain, but I am seeing some very odd behaviour at the boundary with the parent that I can neither explain nor fix.
Before I detail the problem, here is what I use and want to run:
NEMO 4.2 branch (commit b6ff7755)

  • One way nesting
  • No depth interpolation
  • Parent domain is unmodified for volume conservation (one way nesting)
  • Child is initialized from parent

Child domain made with the DOMAINcfg tool with the configuration:

  • 1 to 5 nest; one nest only
  • Its own high resolution bathymetry from external data
  • Same levels for parent and child
  • npt_connect = 1
  • npt_copy = 4
  • zps with partial steps

With these options, after the first time step, cells with zero temperature and salinity appear in the child inside the sponge layer, from layer 2 downwards, which crashes NEMO immediately. I made a figure that summarizes what we see. This case corresponds to the second row of the figure (the first row is the surface mask, for reference).

I tried several things to fix the problem:

  • Closing the boundary in the child domain to see what would happen: I no longer get zeros, but NaNs bleed in from the boundary instead (see figure, 3rd row).
  • Using the old NESTING tool to make the domain: same problem with the bleeding NaNs (plus a bunch of unrelated problems because the child does not end up where it is supposed to be).
  • Moving the domain coordinates a bit east and north: same problem.
  • Using an older version of NEMO 4.2: same problem.
  • Using vertical interpolation: no zeros anymore, but NaNs bleed in.
  • Filling part of the boundary with land, close to where the bleeding occurs, to prevent it: it doesn’t prevent anything.

The only thing that significantly changes the outcome is using the parent domain updated for volume conservation. It forces me to adjust the eastern boundary by adding land where there are supposed to be open boundaries (a mismatch between the parent and child coastlines?), but filling these points with land allows the model to run for longer. It finally crashes with a libc error at the child’s 7th time step, with NaN in the velocity fields, but that looks like a different problem.

I am really running out of ideas here. I think it is related to the domain, but I have no idea how to debug it or what could be wrong. I did follow the new documentation for nesting.
I guess continuing with the updated parent domain is the way forward, but it creates a bunch of problems that one-way nesting wouldn’t cause. We have a bunch of very narrow straits in this region, and both domains have to be manually adjusted to allow the flow. Two-way nesting makes it very hard to do the same. I would rather have one-way nesting work for now…

The weirdest part is that my initial test domain works perfectly fine. It is located further north, between Denmark, Norway and Sweden, and it was made with the old NESTING tool, but it does work. I started this new domain with the NESTING tool as well, and moved away from it mainly because of this problem, so I do not think it’s related to which tool I used in each case.

Does anyone have an idea on how to move forward?
Thanks

Hi,

I think you figured out by yourself that the bathymetry matching done in the DOMAINcfg tool assumes 2 way nesting, and therefore requires using the updated parent topography.
Did you disable the online bathymetry check done at initialization by AGRIF (it is enabled with ln_chk_bathy = .true. in the namelist)?
Since in 2 way nesting, the parent solution well inside the overlapping area does not matter much, why is it a problem for you (narrow straits, etc…)?

Hi,

Yes, I did deactivate that in my one-way coupling tests. With it activated, it just didn’t run. Which makes me wonder: why does AGRIF complain about the volume not matching in one-way nesting? It shouldn’t matter, as the child is not feeding back to the parent. Or does it need the bathymetry to be adjusted over the buffer zone? Could that explain the NaNs bleeding in?
I did not deactivate it in the two-way nesting, and it didn’t complain about the bathymetry (as expected).

In two-way nesting, the parent domain doesn’t matter, but in our configuration the child domain is still not high enough resolution to resolve some of the straits that we need to keep open to get a correct flow. This means that I need to modify the child domain to open these straits after the domain is generated. I feared that doing so would break the feedback from the child to the parent, and I wanted to avoid the problem.

In fact, with 1-way nesting, one still needs to have matching volumes near the boundary. ln_chk_bathy has indeed two parts: 1) check that cells near the child boundary (including in the sponge zone) agree with the parent; 2) check that averaged child volumes match the parent everywhere (2-way nesting only). The problem I see is that the DOMAINcfg tool relies on the parent bathymetry update WITHIN the sponge layer. It should not, in the sense that in 1-way nesting it is the parent bathymetry that should be provided in that zone, as is done for the boundary (ghost) points.

You can adjust your child bathymetry manually, and rerun the DOMAINcfg tool with this child bathymetry as the new input (that’s described in the online documentation). This will ensure that all the matching properties are fulfilled.

In the meantime (e.g. waiting for the DOMAINcfg tool to be specifically modified for 1-way nesting), I’m afraid you have to use the parent ‘updated’ topography.

I checked the code, and, apparently, my statement above is not true. I’m a bit puzzled by the fact that you still need to read the “updated” bathymetry.
Do you have identical minimum levels (or depths), i.e. rn_hmin, in your child/parent grids ?

Do you have identical minimum levels (or depths), i.e. rn_hmin, in your child/parent grids ?

Yes. The namelists are exactly the same in both cases (except that the child uses nn_bathy = 2).

You can adjust your child bathymetry manually, and rerun the DOMAINcfg tool with this child bathymetry as the new input (that’s described in the online documentation). This will ensure that all the matching properties are fulfilled.

I totally missed this possibility, and it took me a while to make sure I did it right, which I think I did now. But it’s still not working. The bathymetry is updated, NEMO doesn’t complain anymore when the volume is checked at start, but the way it looks is odd to me.
[Figures: child and parent domains]

As you can see, the child looks perfectly okay, but the parent contains a lot of land where I do have water in the child. It’s probably not a problem volume-wise, but I think it creates some problems during the automatic initialization of the child from the parent. Currently, the child runs 2 or 3 time steps, and then crashes with an unspecific libc error.

When I look at the 1_output.init.nc file, I can see very odd values in temperature, salinity and, more importantly, velocities.

Here are the temperature and salinity at the surface. I’ve circled the odd locations which, comparing the two domains, seem to correspond to land in the parent but not in the child.


It’s even worse at depth; here is T at the 7th layer:

And it’s just purely blowing up for velocities (here u, but v is the same), which may explain the libc error…

This isn’t a problem we have in the parent’s output.init, which looks fine. I do not think it comes from the restart file, either. I tried an edited restart file with a fixed value everywhere, in case the problem was that I hadn’t adapted the restart to the new parent, but it didn’t fix the NaN in the velocities. It did fix the weird T&S values, but I suspect that’s just because there is very little interpolation error possible with the same value everywhere.
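
To locate cells like the ones described above without relying on plots, one could scan each level of a field for wet points holding non-finite or implausible values. The following is only a minimal sketch: the function name and the bounds are hypothetical, the inputs are plain 2D lists as they might be extracted from 1_output.init.nc, and reading the file itself is left out.

```python
import math


def flag_suspect_cells(field, mask, vmin, vmax):
    """Return (j, i, value) for wet cells (mask == 1) whose value is
    non-finite or lies outside the closed interval [vmin, vmax]."""
    suspects = []
    for j, (field_row, mask_row) in enumerate(zip(field, mask)):
        for i, (value, wet) in enumerate(zip(field_row, mask_row)):
            if wet != 1:
                continue  # land point: whatever it holds is irrelevant
            if not math.isfinite(value) or not (vmin <= value <= vmax):
                suspects.append((j, i, value))
    return suspects


if __name__ == "__main__":
    nan = float("nan")
    # Toy salinity slice and matching mask (illustrative values only)
    salinity = [[35.0, 0.0, 34.8],
                [nan, 35.1, 0.0]]
    tmask = [[1, 1, 1],
             [1, 1, 0]]
    # The bounds are placeholders and must be chosen for the region
    for j, i, value in flag_suspect_cells(salinity, tmask, vmin=2.0, vmax=42.0):
        print(j, i, value)
```

Comparing the flagged indices with the parent mask would show whether they really sit over parent land, as suspected. In brackish regions a fixed lower bound on salinity would flag genuine values, so the bounds need care.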

Any idea how to fix that?

Ok, apparently the problem comes from the initialization from the parent (ln_init_chfrpar=.true.).
About masks differences: The parent bathymetry is updated over the overlapping area such that if the averaged child area is 98% of a parent cell, then the parent mask is 1. This could be different, but this explains why you find more unmasked cells over the child grid than over the parent.
From this, it is expected that some extrapolation of parent values is necessary to get child values, and this is where the problem seems to be. What is puzzling is that we fixed a similar issue (at c3fc000e) one year ago and your revision is more recent…
Double check you have something like CALL Agrif_Set_MaskMaxSearch(10) in your agrif_oce_interp.F90 file.
Meanwhile, you can test your setup by providing a restart file for the child grid (or by starting from a climatology).
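
The mask rule described above can be illustrated with a small sketch. It is not the DOMAINcfg code: the function name is hypothetical, cells are assumed to have equal area (so the wet fraction is just a count), and "98%" is read as "at least 98%", which is an assumption.

```python
def parent_mask_from_child(child_mask, ratio=5, threshold=0.98):
    """Block-average a 2D child mask (1 = ocean, 0 = land) over
    ratio x ratio blocks; a parent cell is ocean when the wet fraction
    of its child cells reaches the threshold."""
    nj, ni = len(child_mask), len(child_mask[0])
    if nj % ratio or ni % ratio:
        raise ValueError("child grid size must be a multiple of the ratio")
    parent = []
    for pj in range(nj // ratio):
        row = []
        for pi in range(ni // ratio):
            wet = sum(
                child_mask[pj * ratio + dj][pi * ratio + di]
                for dj in range(ratio)
                for di in range(ratio)
            )
            row.append(1 if wet / (ratio * ratio) >= threshold else 0)
        parent.append(row)
    return parent


if __name__ == "__main__":
    # 5 x 10 child mask covering two parent cells at a 1:5 ratio
    child = [[1] * 10 for _ in range(5)]
    child[2][7] = 0  # a single land point in the right-hand block
    print(parent_mask_from_child(child))  # prints [[1, 0]]
```

In this toy case the right-hand parent cell becomes land (24/25 = 0.96 is below the threshold) although 24 child cells above it are wet. Those are the child points that cannot be interpolated directly and need extrapolated parent values.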

Double check you have something like CALL Agrif_Set_MaskMaxSearch(10) in your agrif_oce_interp.F90 file.

Line 90? It’s there.

Meanwhile, you can test your setup by providing a restart file for the child grid (or by starting from a climatology).

I tried with ln_tsd_init = .true., providing initial data for both the parent and the child, and it worked. It also worked when interpolating from the parent…
I suspect something is going on with my restart from the parent when it is used with the modified bathymetry. Yet it’s a bit odd: as far as I can see, the new parent only has fewer ocean cells than the original, so I don’t see how it could go wrong.

But it could also be related to another problem: I cannot get the volumes of the parent and the child to match. The grid definition is the same in both cases, both bathymetries are made with the “read bathymetry file” option (my original file for the parent, a modified one for the child), with rn_hmin = -3 and zps in both cases, but activating the volume checks in 1_namelist_cfg leads to the “Averaged Bathymetry does not match parent volume” error.
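
One way to see where such a mismatch sits would be to compare, offline, the parent depth with the mean child depth over each parent cell. This is only a rough sketch under strong assumptions: equal-area child cells within a parent cell, column depths already extracted as 2D lists, and a hypothetical function name. The check in NEMO works on the model scale factors and may differ in detail.

```python
def volume_mismatches(child_depth, parent_depth, ratio=5, rel_tol=1e-6):
    """Return (pj, pi, parent_depth, mean_child_depth) for every parent
    cell whose depth differs from the mean depth of its child cells.
    With equal-area child cells, equal mean depth means equal volume."""
    mismatches = []
    for pj, parent_row in enumerate(parent_depth):
        for pi, hp in enumerate(parent_row):
            block = [
                child_depth[pj * ratio + dj][pi * ratio + di]
                for dj in range(ratio)
                for di in range(ratio)
            ]
            mean_child = sum(block) / len(block)
            # Relative tolerance, with a floor so land (depth 0) is handled
            if abs(mean_child - hp) > rel_tol * max(abs(hp), 1.0):
                mismatches.append((pj, pi, hp, mean_child))
    return mismatches


if __name__ == "__main__":
    # Toy case with a 1:2 ratio: one parent cell over four child cells
    child = [[10.0, 10.0],
             [10.0, 30.0]]
    print(volume_mismatches(child, [[15.0]], ratio=2))  # prints []
    print(volume_mismatches(child, [[10.0]], ratio=2))
```

Mapping the returned indices could show whether the mismatches cluster near the coast or near the deepest cells.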

What else can interfere with the volume conservation?

I thought you successfully passed this test in some of your previous experiments, no?

It did. But I reran the whole domain generation several times and it doesn’t any more. Or does it? I do not understand what is going on.

I had a look at NST/agrif_oce_update.F90 and the condition to run the Agrif_Check_parent_bat() code is:

IF (( .NOT.ln_agrif_2way ).OR.(.NOT.ln_chk_bathy) .OR.(Agrif_Root())) THEN
   RETURN
ENDIF

It shouldn’t matter whether the ln_chk_bathy flag is true or false: if two-way nesting (ln_agrif_2way) is activated, it should execute the code exactly the same way. But I tried several combinations of ln_chk_bathy & ln_agrif_2way and this is what I got:

  • ln_chk_bathy = .false. & ln_agrif_2way = .false.: works just fine
  • ln_chk_bathy = .true. & ln_agrif_2way = .false.: works just fine
  • ln_chk_bathy = .true. & ln_agrif_2way = .true.: bathymetry not matching error
  • ln_chk_bathy = .false. & ln_agrif_2way = .true.: works until the child feeds back to the parent, at which point I get interpolation errors leading to some cells reaching 500+ PSU (or 1500°C), and the parent aborts.
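
Reading the condition quoted above as plain boolean logic may help with the four combinations. The sketch below only transcribes that IF test into Python. The function names are hypothetical, and Agrif_Root() is assumed to be false on the child grid.

```python
from itertools import product


def parent_volume_check_runs(ln_agrif_2way, ln_chk_bathy, is_root):
    """Mirror of the quoted early-return test: the routine returns at once
    on the root grid, in one-way nesting, or when ln_chk_bathy is off."""
    skip = (not ln_agrif_2way) or (not ln_chk_bathy) or is_root
    return not skip


def truth_table():
    """List (ln_chk_bathy, ln_agrif_2way, check_runs) for a child grid."""
    rows = []
    for chk, two_way in product((False, True), repeat=2):
        runs = parent_volume_check_runs(two_way, chk, is_root=False)
        rows.append((chk, two_way, runs))
    return rows


if __name__ == "__main__":
    for chk, two_way, runs in truth_table():
        print(f"ln_chk_bathy={chk!s:5} ln_agrif_2way={two_way!s:5} -> check runs: {runs}")
```

Under this reading, the body of the check executes only when both flags are true. That matches the one combination in the list that reports the bathymetry mismatch, and would mean the other three never perform that comparison at all.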

The one-way nest works, so the volume is matching at least at the boundary of the child. If DOMAINcfg produced a proper volume-conserving bathymetry for these points, there is no reason to think that it’s not doing so properly elsewhere. But then why does setting both flags to true cause an issue?
And at the same time, the outlier values are always at the last ocean cell, which could indicate a problem in volume conservation due to the partial steps.

In the attached figures, you can see what is contained in the parent output.abort file. I didn’t select specifically the very high values I mentioned before, but it really looks like interpolation error to me.

[Figures: interp2, interp1]

PS. Yes, the coastline makes no sense in these figures. That’s my test modified child bathymetry to be able to visually see in the parent that the bathy has been updated.