Dataflow Optimized Reconfigurable Acceleration for FEM-based CFD Simulations
Anastassis Kapetanakis, Aggelos Ferikoglou, George Anagnostopoulos, Sotirios Xydis
TL;DR
This paper tackles the heavy computational burden of solving the 3D compressible Navier-Stokes equations in Computational Fluid Dynamics (CFD) by proposing a dataflow-optimized, FEM-based accelerator implemented with High-Level Synthesis (HLS) on an AMD Alveo U200 FPGA. The architecture uses two FPGA kernels arranged across SLRs to perform RK4 time stepping and state updates, augmented by memory-aware off-chip transfers and task-level pipelining to maximize throughput. Key results show about a 7.9× speedup over optimized Vitis-HLS designs, and 45% lower latency with 3.64× lower power compared to a software implementation on a high-end server CPU, indicating substantial gains in both performance and energy efficiency. By solving the Navier-Stokes equations with FEM and RK4 on a reconfigurable FPGA platform, the approach supports more complex CFD scenarios with flexible geometries and boundary conditions, paving the way for scalable, efficient CFD acceleration.
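For readers unfamiliar with RK4 time stepping, the following is a minimal, generic sketch of one classical fourth-order Runge-Kutta step in plain C++. It is not the paper's kernel code: the names `State`, `rhs`, `axpy`, and `rk4_step` are hypothetical, and the right-hand side du/dt = -u is a toy stand-in for the FEM residual of the Navier-Stokes equations.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical state size; a real FEM solver would hold per-node flow variables.
constexpr std::size_t N = 1;
using State = std::array<double, N>;

// Toy right-hand side du/dt = -u (assumption; stands in for the FEM residual).
State rhs(const State& u) {
    State r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = -u[i];
    return r;
}

// Returns u + a * k, the intermediate state used by each RK4 stage.
State axpy(const State& u, double a, const State& k) {
    State out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = u[i] + a * k[i];
    return out;
}

// One classical RK4 step: four stage evaluations, then a weighted state update.
State rk4_step(const State& u, double dt) {
    assert(dt > 0.0);
    const State k1 = rhs(u);
    const State k2 = rhs(axpy(u, 0.5 * dt, k1));
    const State k3 = rhs(axpy(u, 0.5 * dt, k2));
    const State k4 = rhs(axpy(u, dt, k3));
    State next{};
    for (std::size_t i = 0; i < N; ++i) {
        next[i] = u[i] + (dt / 6.0) * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    }
    return next;
}
```

Each step needs four right-hand-side evaluations followed by a state update. That split between stage evaluation and update loosely mirrors the "RK4 time stepping and state updates" division described above, though how the paper maps this work onto its two kernels is not specified here.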
Abstract
Computational Fluid Dynamics (CFD) simulations are essential for analyzing and optimizing fluid flows in a wide range of real-world applications. These simulations approximate the solutions of the Navier-Stokes differential equations using numerical methods, which are highly compute- and memory-intensive because they require high-precision iterations. In this work, we introduce a high-performance FPGA accelerator specifically designed for numerically solving the Navier-Stokes equations. We focus on the Finite Element Method (FEM) due to its ability to accurately model the complex geometries and intricate setups typical of real-world applications. Our accelerator is implemented using High-Level Synthesis (HLS) on an AMD Alveo U200 FPGA, leveraging the reconfigurability of FPGAs to offer a flexible and adaptable solution. The proposed solution achieves 7.9× higher performance than optimized Vitis-HLS implementations, and 45% lower latency with 3.64× less power compared to a software implementation on a high-end server CPU. This highlights the potential of our approach to solve the Navier-Stokes equations more efficiently, paving the way for tackling even more challenging CFD simulations in the future.
