Dear NEMO community,
NEMO v5.0-beta is now available!
To check out the 5.0-beta release:
git clone --branch 5.0-beta https://forge.nemo-ocean.eu/nemo/nemo.git nemo_5.0-beta
See the highlights summarized below, and please visit the GitLab tag for more detailed information on the improvements to physics, sea ice, biogeochemistry, performance and more. Please pay special attention to the notes on changes to the cpp keys and namelist parameters. We welcome users to test this beta release and send us feedback via the NEMO user chat, Discourse.
Highlights
The 5.0-beta release marks a significant stride forward for NEMO in terms of optimization and efficiency. The implementation of a new Runge-Kutta 3rd order temporal scheme (via cpp key: key_RK3), along with extensive optimizations, has substantially enhanced the code’s performance. We observe more than a twofold speed-up compared to NEMO 4.x versions. Of this speed-up, one-third is attributed to optimization efforts and two-thirds to the use of RK3 in place of the original Modified Leap-Frog (MLF) time-stepping scheme. These speed improvements have been observed in global ocean-ice simulations at two resolutions: 1/4° (eORCA025) and 1° (eORCA1). RK3 enables a doubling of the time step, so that eORCA1 can now run with a time step of 2 hours. Further performance improvements are available via the new delayed MPI communications (ln_mppdelay), loop tiling (ln_tile), and parallel “sister” grids when using AGRIF grid refinement. The code has also been entirely reshuffled and can now only run with 2 halos (nn_hls=2).
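As a rough illustration of why a three-stage Runge-Kutta scheme can tolerate a longer time step than a leapfrog scheme, the sketch below compares their per-step growth factors on the oscillation test equation du/dt = i*omega*u. It rests on assumptions that do not come from the NEMO source: a generic three-stage RK3 with stage weights 1/3, 1/2 and 1, a plain leapfrog without the time filter used in MLF, and the hypothetical function names rk3_growth and leapfrog_growth. It is a toy stability check and does not reproduce the speed-ups reported above.

```python
import cmath


def rk3_growth(theta: float) -> float:
    """Growth factor |u_new / u_old| of one generic three-stage RK3 step.

    theta = omega * dt for the test equation du/dt = i * omega * u.
    The stage weights (1/3, 1/2, 1) are an assumption for illustration.
    """
    z = complex(0.0, theta)
    u0 = complex(1.0, 0.0)
    u1 = u0 + (z / 3.0) * u0   # first stage, advances by dt/3
    u2 = u0 + (z / 2.0) * u1   # second stage, advances by dt/2
    u3 = u0 + z * u2           # final stage, advances by the full dt
    return abs(u3)


def leapfrog_growth(theta: float) -> float:
    """Largest growth factor of plain (unfiltered) leapfrog.

    Leapfrog gives A**2 = 1 + 2*z*A with z = i*theta, so the two
    roots are A = z +/- sqrt(1 + z**2); the larger modulus decides stability.
    """
    z = complex(0.0, theta)
    root = cmath.sqrt(1.0 + z * z)
    return max(abs(z + root), abs(z - root))


# A growth factor above 1 means the numerical solution blows up.
for theta in (0.9, 1.5, 1.8):
    print(f"omega*dt={theta}: leapfrog={leapfrog_growth(theta):.3f}, "
          f"rk3={rk3_growth(theta):.3f}")
```

In this idealized setting leapfrog stops being stable once omega*dt exceeds 1, while the three-stage scheme remains stable up to omega*dt = sqrt(3). The time step that a full ocean model can actually use depends on many other terms and limits.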
NEMO 5.0-beta also takes the first steps towards compatibility with hybrid CPU-GPU computing by integrating PSyclone source-code processing into the build system. With 5.0-beta, passthrough testing (where code is processed by PSyclone but not transformed) of all SETTE configurations succeeds with the latest release version of PSyclone. Ultimately, this source-code transformation facility can be used to identify computational kernels and insert compiler directives in order to exploit parallelism. Achieving optimal performance by this method is not yet fully automatic, but the transformations will be capable of generating code that compiles and runs using GPU resources. At this stage, support for Nvidia compilers and hardware is the most mature, but work is underway to generalize the approach to a wider range of platforms. Any beta testing in support of this goal is strongly encouraged.
NEMO 5.0 is the last version supporting both temporal schemes, MLF and RK3. Subsequent versions will no longer include MLF.