Hybrid parallel discrete adjoints in SU2
Johannes Blühdorn, Pedro Gomes, Max Aehle, Nicolas R. Gauger
TL;DR
The paper enables hybrid MPI+OpenMP parallelism for discrete adjoints in SU2 by coupling SU2 with OpDiLib. It details the required changes across the AD workflow, including identifier management, LSE differentiation, and preaccumulation strategies, and discusses their performance tradeoffs. Further contributions include a proposal for macro-driven OpenMP differentiation, automated testing, and thread-sanitizer analyses to check correctness and robustness. Validation on large-scale test cases demonstrates measurable memory reductions alongside runtime overheads. The findings show that hybrid parallel discrete adjoints are feasible and beneficial for memory-constrained large-scale simulations. The paper also outlines practical limitations and directions for future improvements in both SU2 and OpDiLib.
Abstract
The open-source multiphysics suite SU2 features discrete adjoints by means of operator overloading automatic differentiation (AD). While both primal and discrete adjoint solvers support MPI parallelism, hybrid parallelism using both MPI and OpenMP has only been introduced for the primal solvers so far. In this work, we enable hybrid parallel discrete adjoint solvers. Coupling SU2 with OpDiLib, an add-on for operator overloading AD tools that extends AD to OpenMP parallelism, marks a key step in this endeavour. We identify the affected parts of SU2's advanced AD workflow and discuss the required changes and their tradeoffs. Detailed performance studies compare MPI parallel and hybrid parallel discrete adjoints in terms of memory and runtime and unveil key performance characteristics. We showcase the effectiveness of performance optimizations and highlight perspectives for future improvements. At the same time, this study demonstrates the applicability of OpDiLib in a large code base and its scalability on large test cases, providing valuable insights for future applications both within and beyond SU2.
