Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers
Jinming Lu, Jiayi Tian, Yequan Zhao, Hai Li, Zheng Zhang
TL;DR
This work tackles the prohibitive computational and memory demands of training Physics-Informed Neural Networks (PINNs) for PDE solving on resource-limited devices by combining fully quantized training, Stein's estimator-based residuals, and tensor-train (TT) compression. It introduces three key innovations: Square-block MX-INT (SMX) for efficient mixed-precision quantization, DiffQuant to preserve Stein's estimator accuracy under low-bit arithmetic, and a Partial-Reconstruction Scheme (PRS) for TT-Layers to curb quantization-error accumulation. These are complemented by the PINTA hardware accelerator for precision-scalable execution. Experiments on the 2-D Poisson, 20-D HJB, and 100-D Heat equations show accuracy on par with or better than full-precision baselines while delivering up to 83.5x speedups and up to 2324.1x energy savings. The results indicate that on-device PINN training is practical, enabling real-time PDE solving on edge devices.
Abstract
Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives. However, their deployment on resource-constrained platforms is hindered by substantial computational and memory overhead, primarily stemming from higher-order automatic differentiation, intensive tensor operations, and reliance on full-precision arithmetic. To address these challenges, we present a framework that enables scalable and energy-efficient PINN training on edge devices. This framework integrates fully quantized training, Stein's estimator (SE)-based residual loss computation, and tensor-train (TT) decomposition for weight compression. It contributes three key innovations: (1) a mixed-precision training method that uses a square-block MX (SMX) format to eliminate data duplication during backpropagation; (2) a difference-based quantization scheme for the Stein's estimator that mitigates underflow; and (3) a partial-reconstruction scheme (PRS) for TT-Layers that reduces quantization-error accumulation. We further design PINTA, a precision-scalable hardware accelerator, to fully exploit the performance of the framework. Experiments on the 2-D Poisson, 20-D Hamilton-Jacobi-Bellman (HJB), and 100-D Heat equations demonstrate that the proposed framework achieves accuracy comparable to or better than full-precision, uncompressed baselines while delivering 5.5x to 83.5x speedups and 159.6x to 2324.1x energy savings. This work enables real-time PDE solving on edge devices and paves the way for energy-efficient scientific computing at scale.
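To illustrate why a square-block format can avoid data duplication during backpropagation, the following is a minimal NumPy sketch and not the paper's SMX implementation. It assumes one shared power-of-two scale per block and signed integer elements. The function name `block_quantize`, the 4x4 block size, and the 8-bit width are illustrative choices that the abstract does not specify. The idea shown is that the forward pass consumes a weight matrix W while the backward pass consumes its transpose. With square blocks, the blocks of the transpose are the transposes of the original blocks, so they share the same scales and a single quantized copy can serve both passes. With row-vector blocks, the transpose generally needs its own quantization.

```python
import numpy as np

def block_quantize(x, block_shape, bits=8):
    """Fake-quantize a 2-D array: one power-of-two scale per block,
    signed integer elements, returned in dequantized form.
    Hypothetical helper for illustration only."""
    rows, cols = x.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0, "shape must tile evenly"
    # View the matrix as a grid of (br x bc) blocks.
    blocks = x.reshape(rows // br, br, cols // bc, bc)
    max_abs = np.abs(blocks).max(axis=(1, 3), keepdims=True)
    qmax = 2 ** (bits - 1) - 1
    # Smallest power-of-two scale that keeps the block maximum in range.
    safe = np.where(max_abs > 0, max_abs, 1.0)
    scale = 2.0 ** np.ceil(np.log2(safe / qmax))
    q = np.clip(np.round(blocks / scale), -qmax, qmax)
    return (q * scale).reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))

# Square blocks: quantizing the transpose equals transposing the quantized W.
square_ok = np.array_equal(block_quantize(W.T, (4, 4)),
                           block_quantize(W, (4, 4)).T)

# Row-vector blocks: the two generally differ, so a second copy is needed.
vector_gap = np.max(np.abs(block_quantize(W.T, (1, 8)) -
                           block_quantize(W, (1, 8)).T))

print("square blocks transpose-consistent:", square_ok)
print("max mismatch with row-vector blocks:", vector_gap)
```

The equality for square blocks holds by construction, because both the per-block maximum and the element-wise rounding are unaffected by transposition. The size of the mismatch for vector blocks depends on the data.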
