Dear all,
I am having trouble running a nested NEMO 3.6 simulation coupled with ECHAM-6.3.05p2 via OASIS3-MCT 4.0. For output we are using XIOS 2.5, revisions r1910 and r2497. We upgraded from OASIS3-MCT 3.6 to the newer OASIS and XIOS versions because of compiler problems with this configuration.
The model crashes at the very beginning of the simulation, at ts=1. ECHAM is waiting for input files from OASIS, so everything seems fine on that side. NEMO, however, aborts with an error message indicating memory corruption during memory allocation. The backtrace shows that it crashes in iom_p3d after the error "malloc(): memory corruption (fast)" appears.
Additionally, XIOS generates various error messages, but they are not consistent. In particular, the "Awaiting data of size" values look incorrect to me.
It seems to me that the problem lies with XIOS, but I welcome any suggestions. Thanks a lot in advance!
All the best, Tronje
XIOS error messages:
Error [void CGrid::inputField(const CArray<double,n>& field, CArray<double,1>& stored) const] : In file '/home/shktkeme/esm/models/foci-agrif_mops_oasismct4/xios/inc/grid.hpp', line 381 -> [ Awaiting data of size = 1, Received data size = 48438 ] The data array does not have the right size! Grid = grid_T_3D
Error [const CCalendar& CDate::getRelCalendar(void) const] : In file '/home/shktkeme/esm/models/foci-agrif_mops_oasismct4/xios/src/date.cpp', line 149 -> Invalid state: The date is not associated with any calendar.
Error [void CGrid::inputField(const CArray<double,n>& field, CArray<double,1>& stored) const] : In file '/home/shktkeme/esm/models/foci-agrif_mops_oasismct4/xios/inc/grid.hpp', line 381 -> [ Awaiting data of size = 1, Received data size = 1053 ] The data array does not have the right size! Grid = grid_T_2D
Error [void CGrid::inputField(const CArray<double,n>& field, CArray<double,1>& stored) const] : In file '/home/shktkeme/esm/models/foci-agrif_mops_oasismct4/xios/inc/grid.hpp', line 381 -> [ Awaiting data of size = 0, Received data size = 48438 ] The data array does not have the right size! Grid = grid_T_3D
log error messages:
line:2004089
642: *** Error in `/scratch/usr/shktkeme/esm-experiments/agrifmopstest/run_19000101-19000101/work/./oceanx': malloc(): memory corruption (fast): 0x0000000003bf0670 ***
642: ======= Backtrace: =========
642: /lib64/libc.so.6(+0x7f474)[0x2aaaae690474]
642: /lib64/libc.so.6(+0x82bb0)[0x2aaaae693bb0]
642: /lib64/libc.so.6(__libc_malloc+0x4c)[0x2aaaae69678c]
642: /sw/compiler/gcc/9.3.0/skl/lib64/libstdc++.so.6(_Znwm+0x15)[0x2aaaaad743f5]
642: /scratch/usr/shktkeme/esm-experiments/agrifmopstest/run_19000101-19000101/work/./oceanx[0x12467fc]
...
642: ======= Memory map: ========
642: 00400000-01cef000 r-xp 00000000 3d8:ad14e 1116896249929355080 /scratch/usr/shktkeme/esm-experiments/agrifmopstest/run_19000101-19000101/work/oceanx
642: 01eef000-01ef1000 r--p 018ef000 3d8:ad14e 1116896249929355080 /scratch/usr/shktkeme/esm-experiments/agrifmopstest/run_19000101-19000101/work/oceanx
642: 01ef1000-027bb000 rw-p 018f1000 3d8:ad14e 1116896249929355080 /scratch/usr/shktkeme/esm-experiments/agrifmopstest/run_19000101-19000101/work/oceanx
642: 027bb000-1359f000 rw-p 00000000 00:00 0 [heap]
642: 2aaaaaaab000-2aaaaaacd000 r-xp 00000000 00:1d 205111 /usr/lib64/ld-2.17.so
642: 2aaaaaacd000-2aaaaaacf000 r-xp 00000000 00:00 0 [vdso]
642: 2aaaaaacf000-2aaaaaae3000 rw-p 00000000 00:00 0
642: 2aaaaaae3000-2aaaaaae4000 r--s dabbad0003420000 00:05 18058 /dev/hfi1_0
...
2005706 642: forrtl: error (76): Abort trap signal
2005707 642: Image PC Routine Line Source
2005708 642: oceanx 0000000001630DF4 Unknown Unknown Unknown
2005709 642: libpthread-2.17.s 00002AAAADEEA630 Unknown Unknown Unknown
2005710 642: libc-2.17.so 00002AAAAE647387 gsignal Unknown Unknown
...
2005716 642: libstdc++.so.6.0. 00002AAAAAD743F5 _Znwm Unknown Unknown
2005717 642: oceanx 00000000012467FC Unknown Unknown Unknown
...
2005725 642: oceanx 00000000009F5285 Unknown Unknown Unknown
2005726 642: oceanx 00000000006C58ED iom_mp_iom_p3d_ 1524 iom.f90
2005727 642: oceanx 00000000004FC068 trcwri_my_trc_mp_ 157 trcwri_my_trc.f90
2005728 642: oceanx 00000000004FBD1F trcwri_mp_sub_loo 184 trcwri.f90
2005729 642: oceanx 00000000004FBD05 trcwri_mp_trc_wri 137 trcwri.f90
2005730 642: oceanx 00000000004DFDA2 trcstp_mp_sub_loo 200 trcstp.f90
2005731 642: oceanx 00000000004DFAA1 trcstp_mp_trc_stp 99 trcstp.f90
2005732 642: oceanx 000000000044A92E step_mp_sub_loop_ 425 step.f90
2005733 642: oceanx 0000000000449F93 step_mp_stp_ 106 step.f90
2005734 642: oceanx 000000000057A51F agrif_util_mp_agr 581 modutil.f90
2005735 642: oceanx 000000000044B71C step_mp_sub_loop_ 533 step.f90
2005736 642: oceanx 0000000000449F93 step_mp_stp_ 106 step.f90
...
2030996 srun: error: bcn1291: task 642: Aborted
iom.f90:
SUBROUTINE iom_p3d( cdname, pfield3d )
   use Agrif_Types, only : Agrif_tabvars
   character(*), intent(in) :: cdname
   real(wp), intent(in), dimension(:,:,:) :: pfield3d
   CALL xios_send_field(cdname, pfield3d)
END SUBROUTINE iom_p3d
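
A note on the sizes in the XIOS messages above: 48438 = 46 x 1053, so the array received for grid_T_3D is exactly 46 times the array received for grid_T_2D. The received sizes are therefore at least consistent with each other, and it is the expected sizes of 0 and 1 that stand out. The stand-alone sketch below is not part of NEMO or XIOS. It only illustrates one way to print the shape and total size of a field just before it would be handed to xios_send_field, so the numbers can be compared with the "Received data size" value. The program name check_field_size, the helper report_field, the tracer name 'my_trc' and the 27 x 39 x 46 local dimensions are hypothetical, chosen only so that the sizes match the log. The real subdomain decomposition is not known from the output shown here.

```fortran
program check_field_size
   implicit none
   integer, parameter :: wp = kind(1.0d0)   ! stand-in for NEMO's working precision
   ! Hypothetical local domain: 27 x 39 points and 46 levels
   ! (27*39 = 1053 and 1053*46 = 48438, the sizes seen in the log)
   integer, parameter :: ni = 27, nj = 39, nk = 46
   real(wp), allocatable :: field(:,:,:)

   allocate(field(ni, nj, nk))
   field = 0.0_wp
   call report_field('my_trc', field)

contains

   subroutine report_field(cdname, pfield3d)
      character(*), intent(in) :: cdname
      real(wp), intent(in), dimension(:,:,:) :: pfield3d
      ! Print what would be passed on, so it can be compared with the
      ! "Received data size" value in the XIOS error message.
      write(*, '(3a,3i6)') 'field ', cdname, ': shape =', shape(pfield3d)
      write(*, '(a,i10)') '  total size     =', size(pfield3d)
      write(*, '(a,i10)') '  size per level =', size(pfield3d, 1) * size(pfield3d, 2)
      ! In the model, CALL xios_send_field(cdname, pfield3d) would follow here.
   end subroutine report_field

end program check_field_size
```

With the assumed dimensions this reports a total size of 48438 and 1053 points per level. In the model itself, the same two write statements could be placed temporarily in iom_p3d before the call to xios_send_field, writing to whatever output unit is convenient.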