Matrix-Free Methods for Finite-Strain Elasticity: Automatic Code Generation with No Performance Overhead

Michał Wichrowski, Mohsen Rezaee-Hajidehi, Jože Korelc, Martin Kronbichler, Stanisław Stupkiewicz

TL;DR

This work addresses the computational bottleneck of matrix-free tangent evaluations in finite-strain elasticity by generating quadrature-point kernels with AceGen through automatic differentiation. It systematically compares on-the-fly and cached (partial-assembly) strategies and demonstrates that AceGen-generated code can outperform traditional hand-written implementations while incurring no overhead, for both compressible neo-Hookean and isochoric–volumetric split variants. A seed-based AD approach eliminates the need to form the full tangent operator, and caching of the fourth-order tensor or related quantities further boosts performance, especially in 2D, with matrix-free methods surpassing sparse-matrix solvers by large factors at higher polynomial degrees. The findings underscore the practical impact of automatic code generation for matrix-free solvers in nonlinear elasticity on modern HPC hardware and suggest avenues for refining smoothers and geometry-handling (e.g., cutFEM, multigrid variants) to tackle more complex models and near-incompressibility.

Abstract

This study explores matrix-free tangent evaluations in finite-strain elasticity using automatically generated code for the calculations at the quadrature-point level. The code is generated via automatic differentiation (AD) with AceGen. We compare hand-written and AD-generated codes under two computing strategies: on-the-fly evaluation and caching of intermediate results. The comparison reveals that the AD-generated code achieves superior performance in matrix-free computations.
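The two strategies named in the abstract can be illustrated at a single quadrature point with a minimal sketch. It is not the paper's AceGen-generated kernel. It assumes a common compressible neo-Hookean energy, Ψ(F) = μ/2 (tr(FᵀF) − d) − μ ln J + λ/2 (ln J)², which may differ from the form used in the paper, and it uses a hand-derived directional derivative in place of AD. All function names and parameter values are hypothetical. The on-the-fly variant applies the tangent to a direction dF without forming a fourth-order tensor; the cached variant stores the tensor once by seeding unit directions and then contracts it with dF.

```python
import math

MU, LAM = 1.0, 2.0  # hypothetical material parameters, for illustration only

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def inv_transpose(F):
    """Return (F^{-T}, det F) for a 2x2 matrix."""
    (a, b), (c, d) = F
    J = a * d - b * c
    return [[d / J, -c / J], [-b / J, a / J]], J

def stress(F):
    """First Piola-Kirchhoff stress P = mu (F - F^{-T}) + lam ln(J) F^{-T} (assumed model)."""
    FiT, J = inv_transpose(F)
    return [[MU * (F[i][j] - FiT[i][j]) + LAM * math.log(J) * FiT[i][j]
             for j in range(2)] for i in range(2)]

def tangent_action(F, dF):
    """On-the-fly: directional derivative of P along dF, no 4th-order tensor formed."""
    FiT, J = inv_transpose(F)
    G = matmul(matmul(FiT, transpose(dF)), FiT)  # F^{-T} dF^T F^{-T}
    tr = sum(FiT[i][j] * dF[i][j] for i in range(2) for j in range(2))  # F^{-T} : dF
    c = MU - LAM * math.log(J)
    return [[MU * dF[i][j] + c * G[i][j] + LAM * tr * FiT[i][j]
             for j in range(2)] for i in range(2)]

def cache_tangent(F):
    """Cached: store A[i][j][k][l] = dP_ij/dF_kl by seeding unit directions."""
    A = [[[[0.0] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]
    for k in range(2):
        for l in range(2):
            seed = [[1.0 if (m, n) == (k, l) else 0.0 for n in range(2)] for m in range(2)]
            dP = tangent_action(F, seed)
            for i in range(2):
                for j in range(2):
                    A[i][j][k][l] = dP[i][j]
    return A

def apply_cached(A, dF):
    """Contract the stored 4th-order tensor with the direction dF."""
    return [[sum(A[i][j][k][l] * dF[k][l] for k in range(2) for l in range(2))
             for j in range(2)] for i in range(2)]

F = [[1.1, 0.2], [-0.1, 0.9]]    # sample deformation gradient (det F > 0)
dF = [[0.3, -0.4], [0.5, 0.2]]   # sample direction (linearized increment)
on_the_fly = tangent_action(F, dF)
cached = apply_cached(cache_tangent(F), dF)
```

Both variants return the same 2×2 increment. The trade-off is that the cached form stores 16 numbers per quadrature point in 2D (before exploiting symmetry) and reduces each application to a contraction, whereas the on-the-fly form stores nothing beyond F and recomputes the inverse and logarithm on every application.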

Paper Structure

This paper contains 17 sections, 20 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Discretization of the heterogeneous structure at the coarsest mesh level and the prescribed boundary conditions. Both the figure and the mesh are taken from the paper by Davydov et al. [davydov2020matrix].
  • Figure 2: Measured throughput of the matrix-vector operator evaluation for the compressible neo-Hookean model. The processing rate is expressed in DoFs/second. The data is shown for 2D (left) and 3D (right). Results obtained with automatically generated code are depicted with solid lines, and those obtained with the hand-written code [davydov2020matrix] with dotted lines. The sparse-matrix vmult is shown by a red dashed line.
  • Figure 3: Memory requirements per degree of freedom for the matrix-vector operator application for the compressible neo-Hookean model. The storage size is expressed as the number of floating-point numbers per DoF. The data is shown for 2D (left) and 3D (right). Results obtained with automatically generated code are depicted with solid lines, and those obtained with the hand-written code [davydov2020matrix] with dotted lines.
  • Figure 4: Comparison of time to solution for the matrix-free and sparse-matrix approaches across different polynomial degrees in 2D and 3D for the compressible neo-Hookean model. Computations using the sparse-matrix approach are shown with dashed lines, and those using the matrix-free approach with solid lines.
  • Figure 5: Measured throughput of the matrix-vector operator application for the split neo-Hookean model. The processing rate is expressed in DoFs/second. The data is shown for 2D (left) and 3D (right). Results obtained with automatically generated code are depicted with solid lines, and the estimates for the hand-written code [schussnig2024matrix] with dotted lines. The green area indicates the possible variation of the "recompute all" throughput.