COMMET: orders-of-magnitude speed-up in finite element method via batch-vectorized neural constitutive updates
Benjamin Alheit, Mathias Peirlinck, Siddhant Kumar
TL;DR
This work targets the computational bottleneck of neural constitutive models (NCMs) in finite element (FE) simulations by introducing COMMET, a framework that combines batch-vectorized assembly, compute-graph optimization (CGO) for exact analytical derivatives, and MPI-based parallelism. The proposed approach reorganizes the standard element-wise assembly into batched operations across many quadrature points, enabling SIMD-style acceleration and a reduced memory footprint. CGO replaces expensive automatic differentiation with modular, forward-mode derivative calculations, delivering substantial runtime and memory savings. Across material-point tests, FE benchmarks, and a patient-specific heart inflation example, COMMET achieves speed-ups of up to three orders of magnitude in constitutive updates and more than two orders of magnitude in overall simulation time, with strong MPI scaling to thousands of cores. These results establish a practical pathway to deploying high-fidelity NCMs in large-scale computational mechanics and beyond, within an open-source framework that encourages broad adoption and extension.
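As an illustration of the batching idea, the following minimal sketch compares a per-quadrature-point loop with a single batched evaluation of the same stress function. It is not COMMET code: the compressible neo-Hookean law stands in for an NCM, and the names `pk1_single`, `pk1_loop`, `pk1_batched`, `MU`, and `LAM` are hypothetical choices made for this example only.

```python
import numpy as np

MU, LAM = 1.0, 2.0  # hypothetical material parameters, chosen for illustration


def pk1_single(F):
    """First Piola-Kirchhoff stress for one 3x3 deformation gradient."""
    F_inv_T = np.linalg.inv(F).T
    J = np.linalg.det(F)
    return MU * (F - F_inv_T) + LAM * np.log(J) * F_inv_T


def pk1_loop(F_all):
    """Traditional style: one constitutive call per quadrature point."""
    return np.stack([pk1_single(F) for F in F_all])


def pk1_batched(F_all):
    """Batched style: one call handles all points, shape (n_points, 3, 3)."""
    F_inv_T = np.swapaxes(np.linalg.inv(F_all), -1, -2)
    J = np.linalg.det(F_all)
    return MU * (F_all - F_inv_T) + LAM * np.log(J)[:, None, None] * F_inv_T


# Small random perturbations of the identity keep det(F) positive.
rng = np.random.default_rng(0)
F_all = np.eye(3) + 0.02 * rng.standard_normal((1000, 3, 3))

P_loop = pk1_loop(F_all)
P_batched = pk1_batched(F_all)
max_diff = float(np.max(np.abs(P_loop - P_batched)))
print("max difference between loop and batched results:", max_diff)
```

Both routines compute the same stresses; the batched version simply hands the whole array of deformation gradients to the underlying library at once, which is the kind of restructuring that makes SIMD-style acceleration possible.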
Abstract
Constitutive evaluations often dominate the computational cost of finite element (FE) simulations whenever material models are complex. Neural constitutive models (NCMs) offer a highly expressive and flexible framework for modeling complex material behavior in solid mechanics. However, their practical adoption in large-scale FE simulations remains limited by significant computational costs, especially those of repeatedly evaluating stress and stiffness. NCMs thus represent an extreme case: their large computational graphs make stress and stiffness evaluations prohibitively expensive, restricting their use to small-scale problems. In this work, we introduce COMMET, an open-source FE framework whose architecture has been redesigned from the ground up to accelerate high-cost constitutive updates. Our framework features a novel assembly algorithm that supports batched and vectorized constitutive evaluations, compute-graph-optimized derivatives that replace automatic differentiation, and distributed-memory parallelism via MPI. These advances dramatically reduce runtime, with speed-ups exceeding three orders of magnitude relative to traditional non-vectorized, automatic-differentiation-based implementations. While we demonstrate these gains primarily for NCMs, the same principles apply broadly wherever for-loop-based assembly or constitutive updates limit performance, establishing a new standard for large-scale, high-fidelity simulations in computational mechanics.
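To make the idea of replacing automatic differentiation with hand-derived derivatives concrete, here is a minimal sketch under stated assumptions. It is not the COMMET implementation: the one-hidden-layer softplus network, the function `energy_and_grad`, and all array sizes are hypothetical, and only the first derivative (the stress analogue) is shown, not the stiffness. The chain rule is written out once in closed form and evaluated for a whole batch of points, then checked against central finite differences.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def sigmoid(z):
    # Derivative of softplus.
    return 1.0 / (1.0 + np.exp(-z))


def energy_and_grad(x, W1, b1, w2):
    """Return the energy and its gradient for a batch x of shape (n_points, n_in)."""
    z = x @ W1 + b1               # pre-activations, (n_points, n_hidden)
    dz_dx = W1.T                  # (n_hidden, n_in), identical for every point
    h = softplus(z)
    dh_dz = sigmoid(z)            # elementwise activation derivative
    energy = h @ w2               # (n_points,)
    # Chain rule: d(energy)/dx = sum_k w2[k] * dh_k/dz_k * dz_k/dx
    grad = (dh_dz * w2) @ dz_dx   # (n_points, n_in)
    return energy, grad


rng = np.random.default_rng(1)
n_points, n_in, n_hidden = 64, 3, 8
W1 = rng.standard_normal((n_in, n_hidden))
b1 = rng.standard_normal(n_hidden)
w2 = rng.standard_normal(n_hidden)
x = rng.standard_normal((n_points, n_in))

energy, grad = energy_and_grad(x, W1, b1, w2)

# Central finite differences as an independent check of the analytical gradient.
eps = 1e-6
grad_fd = np.empty_like(grad)
for i in range(n_in):
    dx = np.zeros(n_in)
    dx[i] = eps
    e_plus, _ = energy_and_grad(x + dx, W1, b1, w2)
    e_minus, _ = energy_and_grad(x - dx, W1, b1, w2)
    grad_fd[:, i] = (e_plus - e_minus) / (2 * eps)

fd_error = float(np.max(np.abs(grad - grad_fd)))
print("max deviation from finite differences:", fd_error)
```

Because the derivative expression is fixed ahead of time, no computational graph has to be traced or stored at run time, which is the general motivation for this style of derivative evaluation.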
