Nemo hangs when running with updated impi/netcdf/hdf5 versions

Dear All,

My supercomputer has updated its OS and its impi/netcdf/hdf5 versions, and since then I cannot run NEMO. Compilation of XIOS2 and NEMO works fine, but running NEMO produces nothing: the model hangs just after printing the following error message:

dia_mlr_iom_init : IOM context setup for multiple-linear-regression

diamlr: configuration not found or incomplete (field group 'diamlr_fields'
        and/or file group 'diamlr_files' and/or field 'diamlr_time' missing);
        disabling output for multiple-linear-regression analysis.

Any clue? The very same code worked just fine before the software update.

Thanks in advance for your help,

Robinson

Dear Robinson,

I’m having the same problem with the most recent oneAPI update of the Intel compilers. I’ve managed to find a way to get one of my NEMO configurations to initialise diamlr, but with no fields or regressors. That gets past the issue. If it finds fields and regressors, it crashes with malloc-related errors.

The goal is to get output like this:

dia_detide_init : weight computation for daily detided model diagnostics

                  lk_diadetide =  T
         Tidal component #01                                    = M2  
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!
         Tidal component  n/a is not available!

dia_mlr_init : initialisation of IOM context management for
~~~~~~~~~~~~   multiple-linear-regression analysis

AAAAAAAA


AAAAAAAA


dia_mlr_iom_init : IOM context setup for multiple-linear-regression
      Tidal component #01                                    = Mf  
      Tidal component #02                                    = Mm  
      Tidal component #03                                    = Ssa 
      Tidal component #04                                    = Mtm 
      Tidal component #05                                    = Msf 
      Tidal component #06                                    = Msqm
      Tidal component #07                                    = Sa  
      Tidal component #08                                    = K1  
      Tidal component #09                                    = O1  
      Tidal component #10                                    = P1  
      Tidal component #11                                    = Q1  
      Tidal component #12                                    = J1  
      Tidal component #13                                    = S1  
      Tidal component #14                                    = M2  
      Tidal component #15                                    = S2  
      Tidal component #16                                    = N2  
      Tidal component #17                                    = K2  
      Tidal component #18                                    = nu2 
      Tidal component #19                                    = mu2 
      Tidal component #20                                    = 2N2 
      Tidal component #21                                    = L2  
      Tidal component #22                                    = T2  
      Tidal component #23                                    = eps2
      Tidal component #24                                    = lam2
      Tidal component #25                                    = R2  
      Tidal component #26                                    = M3  
      Tidal component #27                                    = MKS2
      Tidal component #28                                    = MN4 
      Tidal component #29                                    = MS4 
      Tidal component #30                                    = M4  
      Tidal component #31                                    = N4  
      Tidal component #32                                    = S4  
      Tidal component #33                                    = M6  
      Tidal component #34                                    = M8  
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!
      Tidal component n/a  is not available!

diamlr: 0 active regressors found
diamlr: 0 fields selected for analysis

I may need a better solution. I’ll let you know if I find one.

Best regards,

Nick

Hei Nick,

Thanks for your input. It looks like switching to iompi helps: the model manages to run a few time steps, but crashes with NaN in the barotropic mode soon after. That is weird, because the very same code ran without crashing before.
I am now sure the issue is linked to the Intel Fortran compiler, but I will let you know about my findings.

Cheers,

Robinson

Dear Robinson,

So I found a workaround. It’s a complicated recipe, so I’ll try to explain the basics.

Edit your context_nemo.xml file as if you were implementing diamlr as in the AMM12 example from the standard configurations (look in /nfs/applications/nemo-4.2.0/cfgs/AMM12/EXPREF). Adjust it to your particular circumstances: keep everything else as normal, but add the definitions of the diamlr variables. Then turn off all the diamlr and diadetide variables.
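Before rerunning, it can help to confirm that the three identifiers named in the diamlr message quoted earlier in the thread (field group 'diamlr_fields', file group 'diamlr_files', field 'diamlr_time') are actually present in the XML files that XIOS reads. The following is only a rough sketch: `check_diamlr_ids` is a hypothetical helper, it does a plain text search rather than parsing XML, it does not follow `src="..."` includes, and it will also count identifiers that sit inside XML comments.

```shell
#!/bin/sh
# Hypothetical helper: report whether each identifier named in the diamlr
# message appears as id="..." in at least one of the given XML files.
check_diamlr_ids() {
    if [ "$#" -eq 0 ]; then
        echo "usage: check_diamlr_ids FILE..." >&2
        return 2
    fi
    missing=0
    for id in diamlr_fields diamlr_files diamlr_time; do
        # grep -q succeeds if any of the files contains the pattern
        if grep -q "id=\"$id\"" "$@"; then
            echo "found:   $id"
        else
            echo "missing: $id"
            missing=1
        fi
    done
    return "$missing"
}

# Example call from a run directory (the file list is an assumption,
# pass whichever XML files your context actually includes):
# check_diamlr_ids context_nemo.xml field_def_nemo-oce.xml file_def_nemo-oce.xml
```

The function returns 0 only when all three identifiers are found, so it can be used as a quick guard at the top of a job script.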

Here is a worked example for an ORCA025 configuration. I have another working example for AMM15.

<!--
 ============================================================================================== 
    NEMO context
============================================================================================== 
-->
<context id="nemo">
    <!-- $id$ -->
    <variable_definition>
       <!-- Year/Month/Day of time origin for NetCDF files; defaults to 1800-01-01 -->
       <variable id="ref_year"  type="int"> 1900 </variable>
       <variable id="ref_month" type="int"> 01 </variable>
       <variable id="ref_day"   type="int"> 01 </variable>
       <variable id="rho0"      type="float" > 1026.0 </variable>
       <variable id="cpocean"   type="float" > 3991.86795711963 </variable>
       <variable id="convSpsu"  type="float" > 0.99530670233846  </variable>
       <variable id="rhoic"     type="float" > 917.0 </variable>
       <variable id="rhosn"     type="float" > 330.0 </variable>
       <variable id="missval"  type="float" > 1.e20 </variable>        
    </variable_definition>

<!-- Fields definition -->
    <field_definition src="./field_def_nemo-oce.xml"/>   <!--  NEMO ocean dynamics                     -->
    <field_definition src="./field_def_nemo-ice.xml"/>    <!--  NEMO sea-ice model      -->


<!-- Override field definitions for multiple-linear-regression analysis (diamlr) -->
    <field_definition level="1" prec="4" operation="average" enabled=".FALSE." default_value="1.e20" >
      <field_group id="diamlr_fields">
        <!-- Time -->
        <field id="diamlr_time" grid_ref="diamlr_grid_T_2D" prec="8" />
        <!-- Regressors for tidal harmonic analysis -->
        <field id="diamlr_r001" field_ref="diamlr_time" expr="sin( __TDE_M2_omega__ * diamlr_time )" enabled=".FALSE."  comment="harmonic:sin:M2" />
        <field id="diamlr_r002" field_ref="diamlr_time" expr="cos( __TDE_M2_omega__ * diamlr_time )" enabled=".FALSE."  comment="harmonic:cos:M2" />
        <field id="diamlr_r003" field_ref="diamlr_time" expr="sin( __TDE_K1_omega__ * diamlr_time )" enabled=".FALSE."  comment="harmonic:sin:K1" />
        <field id="diamlr_r004" field_ref="diamlr_time" expr="cos( __TDE_K1_omega__ * diamlr_time )" enabled=".FALSE."  comment="harmonic:cos:K1" />
        <field id="diamlr_r005" enabled=".FALSE." />
        <field id="diamlr_r006" enabled=".FALSE." />
        <field id="diamlr_r007" enabled=".FALSE." />
        <field id="diamlr_r008" enabled=".FALSE." />
        <field id="diamlr_r009" enabled=".FALSE." />
        <field id="diamlr_r010" enabled=".FALSE." />
        <field id="diamlr_r011" enabled=".FALSE." />
        <field id="diamlr_r012" enabled=".FALSE." />
        <field id="diamlr_r013" enabled=".FALSE." />
        <field id="diamlr_r014" enabled=".FALSE." />
        <field id="diamlr_r015" enabled=".FALSE." />
        <field id="diamlr_r016" enabled=".FALSE." />
        <field id="diamlr_r017" enabled=".FALSE." />
        <field id="diamlr_r018" enabled=".FALSE." />
        <field id="diamlr_r019" enabled=".FALSE." />
        <field id="diamlr_r020" enabled=".FALSE." />
        <field id="diamlr_r021" enabled=".FALSE." />
        <field id="diamlr_r022" enabled=".FALSE." />
        <field id="diamlr_r023" enabled=".FALSE." />
        <field id="diamlr_r024" enabled=".FALSE." />
        <field id="diamlr_r025" enabled=".FALSE." />
        <field id="diamlr_r026" enabled=".FALSE." />
        <field id="diamlr_r027" enabled=".FALSE." />
        <field id="diamlr_r028" enabled=".FALSE." />
        <field id="diamlr_r029" enabled=".FALSE." />
        <field id="diamlr_r030" enabled=".FALSE." />
        <field id="diamlr_r031" enabled=".FALSE." />
        <field id="diamlr_r032" enabled=".FALSE." />
        <field id="diamlr_r033" enabled=".FALSE." />
        <field id="diamlr_r034" enabled=".FALSE." />
        <field id="diamlr_r035" enabled=".FALSE." />
        <field id="diamlr_r036" enabled=".FALSE." />
        <field id="diamlr_r037" enabled=".FALSE." />
        <field id="diamlr_r038" enabled=".FALSE." />
        <field id="diamlr_r101" field_ref="diamlr_time" expr="diamlr_time^0.0"                       enabled=".FALSE."  comment="mean"            />
        <!-- Fields selected for regression analysis -->
        <field id="diamlr_f001" field_ref="ssh"  enabled=".FALSE." />
        <field id="diamlr_f002" field_ref="uoce" enabled=".FALSE." />
        <field id="diamlr_f003" field_ref="voce" enabled=".FALSE." />
        <field id="diamlr_f004" field_ref="toce" enabled=".FALSE." />
      </field_group>
    </field_definition>

<!-- Files definition -->
    <file_definition src="./file_def_nemo-oce.xml"/>     <!--  NEMO ocean dynamics                     -->

    <file_definition type="multiple_file" name="@expname@_@freq@_@startdate@_@enddate@" sync_freq="10d" min_digits="4">

<!-- Activation of intermediate output for multiple-linear-regression analysis (diamlr) -->
      <file_group id="diamlr_files" output_freq="1d"  output_level="10" enabled=".FALSE."/>

<!-- Activation and selection of daily detided model diagnostics (diadetide) -->
      <file_group id="diadetide_files" output_freq="1d" output_level="10" enabled=".FALSE.">
        <file id="file22" name_suffix="_M2detided_grid_T" description="M2-detided ocean T-grid variables">
          <field id="diadetide_ssh"  field_ref="diadetide_weight_grid_T_2D" operation="accumulate"> this * ssh </field>
        </file>
        <file id="file23" name_suffix="_M2detided_grid_U" description="M2-detided ocean U-grid variables">
          <field id="diadetide_uoce" field_ref="diadetide_weight_grid_U_3D" operation="accumulate"> this * uoce </field>
        </file>
        <file id="file24" name_suffix="_M2detided_grid_V" description="M2-detided ocean V-grid variables">
          <field id="diadetide_voce"  field_ref="diadetide_weight_grid_V_3D" operation="accumulate"> this * voce </field>
        </file>
      </file_group>

    </file_definition>

<!--
============================================================================================================
= grid definition = = DO NOT CHANGE =
============================================================================================================
    -->
    
    <axis_definition>
      <axis id="deptht" long_name="Vertical T levels" unit="m" positive="down" />
      <axis id="depthu" long_name="Vertical U levels" unit="m" positive="down" />
      <axis id="depthv" long_name="Vertical V levels" unit="m" positive="down" />
      <axis id="depthw" long_name="Vertical W levels" unit="m" positive="down" />
      <axis id="profsed" long_name="Vertical S levels" unit="cm" positive="down" />
      <axis id="nfloat" long_name="Float number"      unit="-"                 />
      <axis id="icbcla"  long_name="Iceberg class"      unit="1"               />
      <axis id="ncatice" long_name="Ice category"       unit="1"               />
      <axis id="iax_20C" long_name="20 degC isotherm"   unit="degC"            />
      <axis id="iax_28C" long_name="28 degC isotherm"   unit="degC"            />
      <!-- ABL vertical axis definition -->
      <axis id="ght_abl" long_name="ABL Vertical T levels" unit="m" positive="up"   />
      <axis id="ghw_abl" long_name="ABL Vertical W levels" unit="m" positive="up"   />
    </axis_definition>


<!-- Domain definition -->
    <domain_definition src="./domain_def_nemo.xml"/>

<!-- Grids definition -->
    <grid_definition   src="./grid_def_nemo.xml"/>
  

</context>

Nick

Doesn’t quite work with the example above. I get output.abort with infinities, so something has gone wrong.

Nick

Hei Nick,

Thanks for all the tips, but there is something I do not understand: I don’t even know what diamlr is, and I don’t need it. The very same code, input files and everything else worked like a charm before I switched to this new Intel MPI Fortran version. It really seems to me that this is where the problem is.
But I cannot pretend that switching to ompi is an option, because so far it makes the model crash with NaNs in the barotropic mode.

Hei Nick,

I think I have solved the issue. Here are my findings, to be taken with a pinch of salt, because I have not got the final results yet.
I switched to iompi, and I load only this module on my supercomputer:
module load netCDF-Fortran/4.6.0-iompi-2022a

Nothing else, no HDF5 loading, no path specification, it is all implicit on my computer.
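Since everything is picked up implicitly from the module, a quick look at what ended up on the PATH can be reassuring before recompiling. This is a minimal sketch: `report_tool` is a hypothetical helper, and the list of tools is an assumption about what such a module typically provides (`nc-config` and `nf-config` are the netCDF-C and netCDF-Fortran configuration scripts).

```shell
#!/bin/sh
# Hypothetical helper: print where a command resolves on the current PATH.
report_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "$1 -> $path"
    else
        echo "$1 -> not found on PATH"
        return 1
    fi
}

# Tool names are assumptions; adjust the list to your own toolchain.
for tool in mpif90 mpicc nc-config nf-config; do
    report_tool "$tool" || true
done
```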

For arch files: if you have a set of arch files for XIOS and NEMO that were designed for Intel Fortran, you just change the following:

mpiifort → mpif90
mpiicc → mpicc

And you recompile everything, XIOS and Nemo.
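The wrapper swap can be scripted so that the original Intel arch files stay untouched. The sketch below only performs the two substitutions listed above; `swap_mpi_wrappers` and the file names in the usage comment are hypothetical, and any other Intel-specific flags in your arch files would still need to be reviewed by hand.

```shell
#!/bin/sh
# Hypothetical helper: copy an arch file, replacing the Intel MPI compiler
# wrappers with the generic MPI wrapper names.
swap_mpi_wrappers() {
    src=$1
    dst=$2
    sed -e 's/mpiifort/mpif90/g' \
        -e 's/mpiicc/mpicc/g' \
        "$src" > "$dst"
}

# Example (file names are assumptions):
# swap_mpi_wrappers arch-mycluster_intel.fcm arch-mycluster_iompi.fcm
```

Writing to a new file rather than editing in place makes it easy to diff the two versions and to switch back.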

And then it works. At least for me, and for now.

Hope this helps,

Robinson

Dear Robinson,

Yes, that makes sense. We don’t use iompi on my system, but it’s nice to know it’s an option.

Nick

Hei,

I think there is a post entitled "[intel] Model hangs on initialization phase at xios_close_context_definiti" that seems to be related. It seems that Intel Fortran has a bug.

/robinson

For anyone reading this thread, the “output.abort” with infinities mentioned above comes from the atmospheric component of my coupled earth system model. The workaround works. A note to the developers: the diamlr system has been a major bugbear when running even with older versions of Intel MPI. I can’t expect this to be fixed in 4.2, but please be aware of potential issues for 5.

Hi Robinson,

Hope you are doing fine. Yes, this problem happens with NEMO 4.2, and it means that XIOS hangs in the initialisation procedure. You may need to look at the MPI fabrics available on your supercomputer. For example, setting the following environment variable may solve the problem:

export I_MPI_FABRICS=shm:ofi
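In a job script this might look like the fragment below. It is a sketch under assumptions: the launch line is hypothetical and left commented out, and `I_MPI_DEBUG` is added only on the assumption that you want Intel MPI to print its startup details, so you can check in the job log which fabric was actually selected.

```shell
#!/bin/sh
# Use shared memory within a node and OFI (libfabric) between nodes.
export I_MPI_FABRICS=shm:ofi

# Optional: ask Intel MPI for more verbose startup output.
export I_MPI_DEBUG=5

echo "I_MPI_FABRICS=$I_MPI_FABRICS"

# Hypothetical launch line; replace with your site's launcher and executable:
# mpirun -np 64 ./nemo
```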

Best

Saeed

Hej Saeed,

Thanks for the tip. I will try it, but for now I must say that I solved the problem by switching to iompi. Not only did that solve the issue, but NEMO also seems to run a lot faster.

Cheers,

Robinson