[v4.2.x][v4.0.x] Mixed layer depth `mldr10_3` not initialized

Hi all,

I am using NEMO v4.0.4 and I came across an issue when trying to output the variable mldr10_3 (mixed layer depth diagnosed from a density difference of 0.03 kg.m-3 with respect to the density at 10 m depth). The model output is polluted by crazy values:

When bounding the value range, the field does not look that bad though. The unrealistic values are mostly (if not all) located over continents or shallow ocean regions; see the dark red points on this figure:

We found that changing the number of processors used to run NEMO is a way to work around the problem. For the figure posted above I was using 720 processors, but using only 480 processors fixes the issue.

What is puzzling is that 720 processors supposedly leads to the best MPI decomposition. I have this in the ocean.output:

==> you use the best mpi decomposition
Number of mpi processes:   720
Number of ocean subdomains =   720
Number of suppressed land subdomains =   200

Whereas with 480 processors it seems that I am wasting resources (from ocean.output):

   ===>>> : W A R N I N G

              =============

    The number of mpi processes:   480
    exceeds the maximum number of ocean subdomains = 475
    we suppressed  100 land subdomains
    BUT we had to keep     5 land subdomains that are useless...

      --- YOU ARE WASTING CPU... ---

I believe this indicates that 720 proc should be the best option…

However, when looking at the processor decomposition in the 720-proc case, we can see that for processor 0 the neighbouring processors to the west, north and south are not “allocated”:

resulting internal parameters :
     nproc  =                0
     nowe  =                -1         noea  =                1
     nono  =                -1         noso  =               -1

Which is not the case when using 480 proc:

resulting internal parameters :
     nproc  =                0
     nowe  =               478         noea  =                1
     nono  =                 7         noso  =               -1

We wonder if this could explain the problem we have with the mldr10_3 variable.

However, if that were the case, we would expect to see this problem in other variables too, but so far mldr10_3 is the only one we have found with this problem. I also tried outputting the variable mldr10_1, which is computed similarly to mldr10_3 (cf. routine diahth.F90), but for that one I have no problem.

Cheers,
Yohan

Hi Yohan,

This will need more investigation to determine whether there is really an issue here. Firstly, you don’t mention the configuration you are running. It looks to be ORCA1, but you need to make that clear.

diahth doesn’t mask its output, so it could be that these are all land points. Ferret is masking the unset points: those are points that were never assigned to a processor, so XIOS gives them the missval set in your context_nemo.xml, probably:

       <variable id="missval"   type="float" > 1.e20 </variable>

But you really want to mask the land points too. You can use a tmask field from a mesh_mask file to do that in Ferret (or change diahth to do it at source).
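
For the at-source option, a minimal sketch might look like the following. This is an illustration only, not the actual diahth.F90 change; the array name zrho10_3 follows the diahth.F90 excerpts quoted later in this thread, and the iom_put call is the usual NEMO/XIOS output interface:

```fortran
! Sketch only, not the actual diahth.F90 code: zero the diagnostic on land
! using the surface land-sea mask before handing it to XIOS.
zrho10_3(:,:) = zrho10_3(:,:) * tmask(:,:,1)   ! tmask(:,:,1) = 0 on land, 1 on ocean
CALL iom_put( 'mldr10_3', zrho10_3 )           ! send the masked field to XIOS
```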

The fact that processors don’t have to communicate on all sides is not a problem. The “best” decomposition (for any given number of MPI processes) is that which eliminates the most land. Land-only regions are never assigned so any processor adjacent to a land-only region does not need to communicate in the direction of its land-only neighbour. The message with 480 processors is simply saying that the best decomposition it can do with 480 processors has 5 land-only regions remaining. Therefore you are assigning (and being charged for) 5 cores that are not doing anything useful. You will achieve exactly the same setup if you only ask for 475 cores.

With 720 cores you’ve matched the best decomposition close to that number. 720 cores will run quicker
but may cost more than a 475-core run, depending on the details of your cluster.

If those rogue values persist after land-masking then it could be that they are well-mixed and shallow enough not to trigger the

IF( zztmp > zrho3 )   zrho0_3(ji,jj) = zzdep

condition (which is the only difference from the mldr0_1 field). In that case, the initialisation to:

                     zztmp = gdepw_n(ji,jj,mbkt(ji,jj)+1)
                     zrho0_3(ji,jj) = zztmp

must be suspect. But it isn’t clear how that is possible.
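
To make the structure being discussed concrete, here is a minimal sketch of that part of the calculation. It is an illustration only, not the actual diahth.F90 code; rhop, gdepw_n, mbkt, tmask and the loop bounds are the usual NEMO names, but the exact form in the source may differ:

```fortran
! Illustrative sketch: the field is first initialised to the local bottom depth,
! then overwritten during a bottom-to-top sweep wherever the density exceeds the
! reference value by more than the 0.03 kg m-3 criterion.  Columns where the
! criterion never triggers keep the initial value, so if the initialisation is
! skipped those points are left holding garbage.
DO jj = 1, jpj
   DO ji = 1, jpi
      zrho0_3(ji,jj) = gdepw_n(ji,jj,mbkt(ji,jj)+1)   ! default: depth of the sea floor
   END DO
END DO
DO jk = jpkm1, 2, -1                                  ! sweep from the bottom up
   DO jj = 1, jpj
      DO ji = 1, jpi
         zzdep = gdepw_n(ji,jj,jk) * tmask(ji,jj,1)
         zztmp = rhop(ji,jj,jk) - rhop(ji,jj,1)       ! density excess over the surface value
         IF( zztmp > zrho3 )   zrho0_3(ji,jj) = zzdep ! zrho3 = 0.03 kg m-3
      END DO
   END DO
END DO
```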

Hi Andrew,

Thank you for the explanation about the processor communication.

Indeed, I am using the ORCA1 configuration. Some of the unrealistic values are over land, others are not. See this figure which is the same as the first plot I posted but with land points masked:

So, even after masking land values, the problem is still there.

You are pointing to this condition:

IF( zztmp > zrho3 ) zrho0_3(ji,jj) = zzdep

I am thinking that if this explained the strange values in mldr0_3, the problem should be present independently of the number of processors I am using, no?

I agree, you shouldn’t be seeing any imprint of the decomposition since it’s a 1-D calculation carried out for all i,j points using fields that are set everywhere (if they weren’t, then your mldr0_1 field would have similar structures). How are you collating the global field? Is this one_file output from XIOS? Or are you stitching the files together in a post-processing step?

This is output from XIOS. Some extra information: in this plot I am showing a monthly mean value, but the same happens when I output mldr0_3 at the model time step.

Is it mldr0_3 you are plotting or mldr10_3? The longname printed by Ferret suggests it’s the latter.

Oops, really sorry about that. Since the beginning I have been talking about and plotting mldr10_3.
I am correcting the title and the issue accordingly.

In that case there is more to investigate since it isn’t set the same way as mldr0_1. In fact if the condition is triggered it is set to:

zzdep = gdepw_n(ji,jj,jk) * tmask(ji,jj,1)

so any crazy values over land must be from the initialisation of the array. You could make that:

zrho10_3(ji,jj) = zztmp * tmask(ji,jj,1)

in the initialisation section as a test…

Arrr, in fact this may be the issue; that initialisation section is only active if:

IF( iom_use( 'mlddzt' ) .OR. iom_use( 'mldr0_3' ) .OR. iom_use( 'mldr0_1' ) ) THEN

there seems to be an assumption that you aren’t requesting mldr10_3 without also requesting one or other of those three. That seems a dangerous assumption. Is that the case here?


I was indeed not requesting any of those three variables. I just tried adding mldr0_3 (the real one this time) to my file_def.xml, and now I get nice output for the variable mldr10_3:

I realize now that I have been mixing up the variables mldr10_3/mldr0_3 and mldr10_1/mldr0_1 in my tests, outputting mldr10_X but looking into the code for mldr0_X. I see now that mldr10_1 is not computed the same way as mldr10_3 (and not even in the same subroutine), as you mentioned.

Thanks a lot for helping me to see through this issue!

Great, glad that is working for you, but I will raise a ticket since the code shouldn’t rely on the user selecting the right combination of outputs.
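
For reference, one possible shape of such a fix (a sketch only; the actual change is in the changeset linked at the end of this thread and may well differ) would be to extend the guard so the initialisation also runs when mldr10_3 alone is requested:

```fortran
! Sketch only - the committed fix may differ.  The initialisation must run
! whenever any diagnostic that relies on it is requested, not just the
! mlddzt / mldr0_3 / mldr0_1 subset:
IF(  iom_use( 'mlddzt' ) .OR. iom_use( 'mldr0_3' ) .OR. iom_use( 'mldr0_1' ) .OR. &
   & iom_use( 'mldr10_3' )  ) THEN
```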

Ok, thank you.

And do you understand how this is linked to the number of processors I have been using?

Since the arrays weren’t being initialised, their contents would depend on whatever went on previously in that section of memory. That will be different for different processor layouts since the size and shape of the arrays will change. Hence, you got different results for different decompositions in any of the locations that weren’t being actively set by the density difference criterion.
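
A minimal standalone illustration of that behaviour (plain Fortran, nothing NEMO-specific; the program and variable names here are made up for the demo):

```fortran
! Reading an array that was never assigned is undefined behaviour: you typically
! see whatever was left in that memory, which changes with the array's size and
! shape, the compiler, and the previous contents of the stack/heap.
program uninit_demo
   implicit none
   real, allocatable :: work(:,:)
   allocate( work(100,50) )                            ! allocated but deliberately never initialised
   print *, 'corner values:', work(1,1), work(100,50)  ! undefined, layout-dependent values
   deallocate( work )
end program uninit_demo
```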


Fixed in r4.0-HEAD, branch_4.2 and main:
https://forge.ipsl.jussieu.fr/nemo/changeset/15768