A versatile FEM framework with native GPU scalability via globally-applied AD
Mohit Pundir, Flavio Lorez, David S. Kammer
TL;DR
This work addresses the long-standing tension between expressive, energy-based finite-element formulations and scalable computation on modern accelerators. It introduces tatva, a monolithic energy-centric FEM framework that differentiates a single global energy functional to obtain residuals and tangents, combining matrix-free Jacobian–vector products with graph-coloring-based sparse differentiation to achieve $\mathcal{O}(N)$ scaling on GPUs. The approach supports a wide range of problems, including multi-point constraints, mixed-dimensional couplings, and data-driven models such as Neural Constitutive Modeling (NCM) and Neural-Operator Element Methods (NOEM), all within a single differentiable pipeline. Performance results show linear scaling up to tens of millions of degrees of freedom with high throughput, significantly outperforming traditional scatter-add assembly in many regimes. The open-source tatva library enables broad adoption for AI-augmented, variational mechanics on contemporary hardware.
Abstract
Energy-based finite-element formulations provide a unified framework for describing complex physical systems in computational mechanics. In these methods, the governing equations follow directly from the derivatives of a single global energy functional. While Automatic Differentiation (AD) can automate the generation of these derivatives, current frameworks face a clear trade-off determined primarily by the scale at which AD is applied. Globally applied AD offers high expressivity but cannot currently be scaled to large problems. Locally applied AD scales well through traditional assembly methods, but it can easily represent a narrower range of physics and couplings than the global approach. Here, we introduce tatva (https://github.com/smec-ethz/tatva), an energy-centric framework that defines the physics of a problem as a single global functional and applies AD globally to generate residual and tangent operators. By leveraging Jacobian-vector products for matrix-free solvers and coloring-based sparse differentiation for materializing sparse tangent stiffness matrices when needed, our flexible design scales linearly with the problem size on GPUs. We demonstrate that our framework can handle large problems (with millions of degrees of freedom) without memory exhaustion. Additionally, it offers a unified, fully differentiable methodology that can address a wide range of problems, including multi-point constraints, mixed-dimensional coupling, and the incorporation of neural networks, while maintaining high performance and scalability on modern GPU architectures.
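
As a rough illustration of the idea described above, the following sketch uses plain JAX (not the tatva API, which is not shown in this text) on a hypothetical toy problem: a 1D bar with a quartic strain-energy density. The problem setup and the names `total_energy`, `residual`, `tangent_matvec`, and `colored_tangent` are assumptions made for this example only. It shows the residual obtained as the gradient of one global energy, a matrix-free tangent action via a Jacobian-vector product, and a tangent matrix recovered from three Jacobian-vector products using a simple three-color pattern that is valid for this 1D chain.

```python
import numpy as np
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

# Hypothetical toy problem (an assumption for illustration): a 1D bar with
# n_el linear elements, unit cross-section, and strain-energy density
# psi(eps) = 0.5*E*eps**2 + 0.25*beta*eps**4.
n_el = 8
n_nodes = n_el + 1
h = 1.0 / n_el
E, beta = 1.0, 0.5


def total_energy(u):
    """Single global energy functional: sum of element strain energies."""
    eps = (u[1:] - u[:-1]) / h
    psi = 0.5 * E * eps**2 + 0.25 * beta * eps**4
    return jnp.sum(psi) * h


# Residual: AD applied once to the global energy.
residual = jax.grad(total_energy)


def tangent_matvec(u, v):
    """Matrix-free action K(u) @ v as a Jacobian-vector product of the residual."""
    return jax.jvp(residual, (u,), (v,))[1]


def colored_tangent(u):
    """Recover the tridiagonal tangent from 3 JVPs.

    In this 1D chain, dof i couples only to i-1, i, i+1, so columns j and
    j+3 never share a nonzero row. Coloring columns by j % 3 therefore lets
    one JVP per color return all columns of that color at once.
    """
    idx = np.arange(n_nodes)
    K = jnp.zeros((n_nodes, n_nodes), dtype=u.dtype)
    for c in range(3):
        seed = jnp.asarray(idx % 3 == c, dtype=u.dtype)
        compressed = tangent_matvec(u, seed)  # row-wise sum over columns of color c
        # Decompress: in row i, the only possible nonzero of color c sits in
        # the unique neighbor column j in {i-1, i, i+1} with j % 3 == c.
        for offset in (-1, 0, 1):
            cols = idx + offset
            valid = (cols >= 0) & (cols < n_nodes) & (cols % 3 == c)
            K = K.at[idx[valid], cols[valid]].set(compressed[valid])
    return K
```

The matrix is stored densely here only to keep the example short; the number of Jacobian-vector products (three) is independent of the number of nodes, which is the property that makes coloring attractive. A general mesh would need a graph-coloring step on its actual sparsity pattern instead of the fixed `j % 3` rule.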
