Hybrid parallel discrete adjoints in SU2

Johannes Blühdorn, Pedro Gomes, Max Aehle, Nicolas R. Gauger

TL;DR

The paper enables hybrid MPI+OpenMP parallelism for discrete adjoints in SU2 by integrating OpDiLib, and examines the required architectural changes, the performance tradeoffs, and validation on large-scale test cases. It describes the implementation across the AD workflow, including identifier management, LSE differentiation, and preaccumulation strategies, and reports memory reductions that come with runtime overheads. Key contributions include coupling SU2 with OpDiLib, proposing macro-driven OpenMP differentiation, and providing automated testing and thread-sanitizer analyses to check correctness and robustness. The findings show that hybrid parallel discrete adjoints are feasible and beneficial for memory-constrained large-scale simulations; the paper also outlines practical limitations and directions for future improvements in both SU2 and OpDiLib.

Abstract

The open-source multiphysics suite SU2 features discrete adjoints by means of operator overloading automatic differentiation (AD). While both primal and discrete adjoint solvers support MPI parallelism, hybrid parallelism using both MPI and OpenMP has only been introduced for the primal solvers so far. In this work, we enable hybrid parallel discrete adjoint solvers. Coupling SU2 with OpDiLib, an add-on for operator overloading AD tools that extends AD to OpenMP parallelism, marks a key step in this endeavour. We identify the affected parts of SU2's advanced AD workflow and discuss the required changes and their tradeoffs. Detailed performance studies compare MPI parallel and hybrid parallel discrete adjoints in terms of memory and runtime and unveil key performance characteristics. We showcase the effectiveness of performance optimizations and highlight perspectives for future improvements. At the same time, this study demonstrates the applicability of OpDiLib in a large code base and its scalability on large test cases, providing valuable insights for future applications both within and beyond SU2.

Paper Structure

This paper contains 21 sections, 12 equations, 9 figures.

Figures (9)

  • Figure 1: OpenMP-MPI hybrid parallel execution of SU2. The hybrid parallelism is reflected in the AD tools that provide the derivatives. CoDiPack and MeDiPack are already applied to SU2. In the course of this work, we apply OpDiLib for the differentiation of OpenMP.
  • Figure 2: Decomposition of the discrete adjoint wall clock time.
  • Figure 3: NACA 0012 mesh (left) and Onera M6 mesh (right) for single-socket performance studies.
  • Figure 4: AD-specific performance of the NACA 0012 test case (top) and the Onera M6 test case (bottom). Recording performance (left), management performance (middle), and evaluation performance (right). Serial and parallel timings for various build configurations. Error bars indicate the variation across runs. Speedup factors are relative to the serial run of the respective build.
  • Figure 5: NACA 0012 performance (top) and Onera M6 performance (bottom). Memory high-water marks depending on the type and degree of parallelism for the five SU2 build configurations (left), and joint memory consumption and evaluation time for selected configurations with varying degrees of parallelism (right).
  • ...and 4 more figures