Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu, Yankun Xu, Xiuwen Guo, Yunlong Fei, Zhaoying Wang, Mingkui Li, Yingjing Jiang, Lv Lu, Liang Su, Jiayu Fu, Peinan Yu, Weiguo Liu, Lixin Wu, Lanning Wang, Xin Liu, Dexun Chen, Guangwen Yang

TL;DR

The study tackles kilometer-scale climate modeling on heterogeneous exascale systems by porting CESM 2.2 to a 40-million-core Sunway machine using a non-intrusive workflow that preserves code integrity. It introduces a hierarchical grid framework, an OpenMP offloading toolkit (O2ATH), and an optimized initialization pipeline; together these cover over 80% of the code and deliver substantial performance gains. The model scales to nearly 40 million cores, reaching 340 SDPD (simulated days per day) for the atmosphere component (CAM), 265 SDPD for the ocean component (POP), and 222 SDPD for the coupled model, which makes multi-year simulations at kilometer-scale resolution feasible. The work offers a practical path to ultra-high-resolution climate experiments and highlights tools and methods that can ease adoption on future heterogeneous supercomputers, helping to reduce climate modeling uncertainty at kilometer scales.

Abstract

With current and future leading systems adopting heterogeneous architectures, there is an urgent need to adapt existing models to heterogeneous supercomputers in order to improve model resolution and reduce modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that minimizes manual code modifications, our project aims to improve performance while preserving the consistency of the model code. By using a hierarchical grid system and an OpenMP-based offloading toolkit, our porting and parallelization effort covers over 80% of the code and achieves a simulation speed of 340 SDPD (simulated days per day) for the 5-km atmosphere, 265 SDPD for the 3-km ocean, and 222 SDPD for the coupled model, thus making multi-year or even multi-decadal experiments at such high resolution possible.
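
To make the SDPD figures concrete, the following minimal sketch converts the reported speeds into wall-clock time per simulated year. The speeds are taken from the abstract; the 365-day model year, the assumption of sustained throughput over an uninterrupted run, and the helper name `wall_clock_days` are illustrative choices, not details from the paper.

```python
# SDPD = simulated days per (wall-clock) day, as defined in the abstract.
# The speeds are the figures reported in the abstract; the 365-day year
# and the helper below are assumptions made for this illustration only.
REPORTED_SDPD = {
    "atmosphere (5 km)": 340,
    "ocean (3 km)": 265,
    "coupled": 222,
}

DAYS_PER_SIM_YEAR = 365  # assumption: no-leap model calendar


def wall_clock_days(simulated_years: float, sdpd: float) -> float:
    """Wall-clock days needed to simulate the given number of model years."""
    if sdpd <= 0:
        raise ValueError("sdpd must be positive")
    return simulated_years * DAYS_PER_SIM_YEAR / sdpd


if __name__ == "__main__":
    for name, sdpd in REPORTED_SDPD.items():
        one_year = wall_clock_days(1, sdpd)
        ten_years = wall_clock_days(10, sdpd)
        print(f"{name}: 1 yr = {one_year:.2f} days, 10 yr = {ten_years:.1f} days")
```

Under these assumptions, one simulated decade of the coupled model at 222 SDPD takes roughly 16.4 wall-clock days, which is why the abstract describes multi-year and multi-decadal experiments as feasible.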

Paper Structure (21 sections, 13 figures, 4 tables)

This paper contains 21 sections, 13 figures, 4 tables.

Figures (13)

  • Figure 1: The drought/flood index and reconstructed temperature change since 800 A.D. (results from existing works zhang2008test-monsoon, yang2002general-ChinaTempReconstruction), corresponding to some of the major events in history.
  • Figure 2: A detailed example for the compiling and execution workflow of a kernel, when using the O2ATH toolkit.
  • Figure 3: SW26010P processor.
  • Figure 4: Performance improvements on CPEs for major kernels in the atmosphere component CAM.
  • Figure 5: Performance improvements on CPEs for major kernels in the ocean component POP.
  • ...and 8 more figures