Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers

Jinming Lu, Jiayi Tian, Yequan Zhao, Hai Li, Zheng Zhang

TL;DR

This work tackles the prohibitive computational and memory demands of training Physics-Informed Neural Networks (PINNs) for PDE solving on resource-limited devices by combining fully quantized training, Stein's estimator-based residuals, and tensor-train (TT) compression. It introduces three key innovations: Square-block MX-INT (SMX) for efficient mixed-precision quantization, DiffQuant to preserve Stein's estimator accuracy under low-bit arithmetic, and a Partial-Reconstruction Scheme (PRS) for TT-Layers to curb quantization-error accumulation. These are complemented by the PINTA hardware accelerator for precision-scalable execution. Experiments on the 2-D Poisson, 20-D HJB, and 100-D Heat equations show accuracy on par with or better than full-precision baselines, with speedups of up to 83.5x and energy savings of up to 2324.1x. The results indicate that on-device PINN training is practical, enabling real-time PDE solving on edge devices.

Abstract

Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives. However, their deployment on resource-constrained platforms is hindered by substantial computational and memory overhead, primarily stemming from higher-order automatic differentiation, intensive tensor operations, and reliance on full-precision arithmetic. To address these challenges, we present a framework that enables scalable and energy-efficient PINN training on edge devices. This framework integrates fully quantized training, Stein's estimator (SE)-based residual loss computation, and tensor-train (TT) decomposition for weight compression. It contributes three key innovations: (1) a mixed-precision training method that uses a square-block MX (SMX) format to eliminate data duplication during backpropagation; (2) a difference-based quantization scheme for the Stein's estimator that mitigates underflow; and (3) a partial-reconstruction scheme (PRS) for TT-Layers that reduces quantization-error accumulation. We further design PINTA, a precision-scalable hardware accelerator, to fully exploit the performance of the framework. Experiments on the 2-D Poisson, 20-D Hamilton-Jacobi-Bellman (HJB), and 100-D Heat equations demonstrate that the proposed framework achieves accuracy comparable to or better than full-precision, uncompressed baselines while delivering 5.5x to 83.5x speedups and 159.6x to 2324.1x energy savings. This work enables real-time PDE solving on edge devices and paves the way for energy-efficient scientific computing at scale.
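
To illustrate why a square block shape can remove the duplicated weight copy in backpropagation, the following minimal Python sketch compares two block shapes for shared-scale quantization. It is an illustration under stated assumptions, not the paper's SMX implementation: the function names (quantize_blocks, transpose), the 4-bit width, the power-of-two scale rule, and the block sizes are all hypothetical choices. With vector blocks laid along the reduction dimension, the forward product with W and the backward product with W^T need different groupings, so quantizing W^T does not give the transpose of the quantized W. With square blocks, each block of W^T holds the same values as the mirrored block of W, so one quantized copy can serve both passes.

```python
import math


def quantize_blocks(mat, block_shape, bits=4):
    """Quantize a 2-D list block by block, one shared power-of-two scale per block.

    Returns the dequantized values so results can be compared directly.
    The scale rule is an assumption made for this sketch.
    """
    rows, cols = len(mat), len(mat[0])
    block_rows, block_cols = block_shape
    qmax = 2 ** (bits - 1) - 1  # symmetric signed integer range
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block_rows, rows))
                     for c in range(c0, min(c0 + block_cols, cols))]
            amax = max(abs(mat[r][c]) for r, c in cells)
            # Smallest power-of-two scale that keeps amax inside the integer range.
            scale = 2.0 ** math.ceil(math.log2(amax / qmax)) if amax > 0 else 1.0
            for r, c in cells:
                q = max(-qmax, min(qmax, round(mat[r][c] / scale)))
                out[r][c] = q * scale
    return out


def transpose(mat):
    return [list(row) for row in zip(*mat)]


# Hypothetical weight matrix with a few large entries among small ones.
W = [
    [8.0, 0.1, 0.3, 0.2],
    [0.1, 0.1, 0.2, 0.4],
    [4.0, 0.2, 0.1, 0.1],
    [0.3, 0.1, 0.1, 0.2],
]

# Vector blocks (1 x 4): W and W^T are grouped differently, so two copies are needed.
vector_reusable = quantize_blocks(transpose(W), (1, 4)) == transpose(quantize_blocks(W, (1, 4)))

# Square blocks (2 x 2): quantization commutes with transposition.
square_reusable = quantize_blocks(transpose(W), (2, 2)) == transpose(quantize_blocks(W, (2, 2)))

print("vector blocks reusable after transpose:", vector_reusable)
print("square blocks reusable after transpose:", square_reusable)
```

In this sketch the square-block copy can be read in transposed order for the backward pass, while the vector-block layout would need a second quantized copy of the weights.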

Paper Structure

This paper contains 17 sections, 16 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Training flow of PINN with (a) Automatic Differentiation and (b) Stein's Estimator.
  • Figure 2: Illustration of Tensor-Train Decomposition.
  • Figure 3: Overview of the proposed efficient on-device PINN training framework.
  • Figure 4: Computing flow of fully quantized training with (a) MX format and (b) Square-block MX format.
  • Figure 5: Computing flow of difference-based quantization scheme.
  • ...and 3 more figures