Polynomial-Augmented Neural Networks (PANNs) with Weak Orthogonality Constraints for Enhanced Function and PDE Approximation

Madison Cooley, Shandian Zhe, Robert M. Kirby, Varun Shankar

TL;DR

Polynomial-Augmented Neural Networks (PANNs) fuse a DNN with a trainable polynomial layer to combine the strengths of neural and polynomial approximations. By enforcing weak orthogonality between the two bases, applying polynomial preconditioning, and using basis pruning, PANNs achieve superior accuracy for smooth and finitely smooth functions and improved PDE solutions when used as PI-PANNs. The framework introduces eight discrete orthogonality constraints, a preconditioning strategy, and a suite of algorithmic techniques (precomputation, custom autograd, and truncation) implemented in PyTorch C++. Across extensive experiments, PANNs outperform pure DNNs and plain polynomial methods, with PI-PANNs delivering orders-of-magnitude improvements over traditional PINNs, and provide robust performance in high-dimensional and noisy settings.

Abstract

We present polynomial-augmented neural networks (PANNs), a novel machine learning architecture that combines deep neural networks (DNNs) with a polynomial approximant. PANNs combine the strengths of DNNs (flexibility and efficiency in higher-dimensional approximation) with those of polynomial approximation (rapid convergence rates for smooth functions). To aid in both stable training and enhanced accuracy over a variety of problems, we present (1) a family of orthogonality constraints that impose mutual orthogonality between the polynomial and the DNN within a PANN; (2) a simple basis pruning approach to combat the curse of dimensionality introduced by the polynomial component; and (3) an adaptation of a polynomial preconditioning strategy to both DNNs and polynomials. We test the resulting architecture for its polynomial reproduction properties, ability to approximate both smooth functions and functions of limited smoothness, and as a method for the solution of partial differential equations (PDEs). Through these experiments, we demonstrate that PANNs offer superior approximation properties to DNNs for both regression and the numerical solution of PDEs, while also offering enhanced accuracy over both polynomial and DNN-based regression (each) when regressing functions with limited smoothness.
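
To make the architecture concrete, the following is a minimal sketch in plain Python, not the authors' PyTorch C++ implementation. It assumes a one-dimensional input, a single tanh hidden layer for the DNN bases $\psi_i$, and Legendre polynomials for the polynomial bases $\phi_j$. The penalty shown is one plausible discrete orthogonality measure (the sum of squared discrete inner products between the two bases); the specific constraints used in the paper are not given in this summary, so treat the penalty and all function names as illustrative assumptions.

```python
import math

def legendre_basis(x, degree):
    """Return [P_0(x), ..., P_degree(x)] using the three-term recurrence."""
    vals = [1.0, x]
    for n in range(1, degree):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        vals.append(((2 * n + 1) * x * vals[n] - n * vals[n - 1]) / (n + 1))
    return vals[: degree + 1]

def dnn_basis(x, weights, biases):
    """Hidden-layer outputs psi_i(x) of an assumed one-hidden-layer tanh network."""
    return [math.tanh(w * x + b) for w, b in zip(weights, biases)]

def pann_output(x, weights, biases, a, b_poly):
    """u(x) = sum_i a_i psi_i(x) + sum_j b_j phi_j(x)."""
    psi = dnn_basis(x, weights, biases)
    phi = legendre_basis(x, len(b_poly) - 1)
    dnn_part = sum(ai * p for ai, p in zip(a, psi))
    poly_part = sum(bj * p for bj, p in zip(b_poly, phi))
    return dnn_part + poly_part

def orthogonality_penalty(xs, weights, biases, degree):
    """Assumed penalty: sum over (i, j) of the squared discrete inner
    product (1/N) sum_k psi_i(x_k) phi_j(x_k) on the sample points xs."""
    n = len(xs)
    psi_vals = [dnn_basis(x, weights, biases) for x in xs]
    phi_vals = [legendre_basis(x, degree) for x in xs]
    penalty = 0.0
    for i in range(len(weights)):
        for j in range(degree + 1):
            inner = sum(psi_vals[k][i] * phi_vals[k][j] for k in range(n)) / n
            penalty += inner ** 2
    return penalty
```

In a training loop, a penalty of this kind would be added to the data (or PDE residual) loss with a weighting factor, so that the DNN bases are discouraged from duplicating what the polynomial layer already represents. With all DNN coefficients set to zero, `pann_output` reduces to the polynomial part alone, which is the setting the polynomial reproduction tests examine.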

Paper Structure (27 sections, 16 equations, 8 figures, 13 tables, 2 algorithms)

Figures (8)

  • Figure 1: Visual depiction of basis sets produced by different generation techniques (right) and the total number of basis functions each method produces for increasing problem dimension.
  • Figure 1: Both figures show the proposed neural network architecture with a polynomial layer (PANN). The left figure shows the architecture from the adaptive basis viewpoint, where $\psi_i$ and $a_i$ for $i=1,\ldots,w$ are the DNN bases and coefficients, while $\phi_j$ and $b_j$ for $j=1,\ldots,m$ are the polynomial layer bases and coefficients, respectively. $u_\theta$ is the model output, a linear combination of the DNN and polynomial layer bases weighted by their coefficients. Alternatively, the right figure shows the architecture as a residual block with transformed skip connections, where $H_k$ for $k=1,\ldots,L$ are the hidden layers of the DNN, $\sigma$ are nonlinear activations, $P$ is the polynomial layer, and $c_j$ for $j=1,\ldots,m$ are the transformed and adaptive skip connections. Unlike traditional residual blocks, which aim to resolve the vanishing gradient issue, our residual interpretation focuses on augmenting the DNN's output with additional polynomial-based transformations, enriching the function approximation.
  • Figure 1: (Left) Relative $\ell_2$ errors and (right) wall clock time in seconds for different network types using the Tanh (top), RePU (middle), and ReLU (bottom) activation functions. PL and $L^2$ projection results are repeated in each figure for easy comparison.
  • Figure 2: Loss landscapes of a physics-informed PANN (left) and standard PINN (right) on a 2D Poisson problem.
  • Figure 2: (Left) Bar plot of the relative $\ell_2$ errors for each orthogonality constraint and activation, compared to using no constraint (None) and standard $L_1$ regularization. (Right) Bar plot of the wall clock training times of the associated methods in seconds.
  • ...and 3 more figures
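
The growth in basis size with problem dimension, which the first figure caption refers to, can be illustrated with a short sketch. The three index sets below (full tensor product, total degree, and hyperbolic cross) are standard choices picked here for illustration; this summary does not name the generation techniques the paper compares, so the selection and the function names are assumptions.

```python
from itertools import product

def tensor_product_indices(dim, degree):
    """All multi-indices with every component in 0..degree."""
    return list(product(range(degree + 1), repeat=dim))

def total_degree_indices(dim, degree):
    """Multi-indices whose components sum to at most `degree`."""
    return [idx for idx in tensor_product_indices(dim, degree)
            if sum(idx) <= degree]

def hyperbolic_cross_indices(dim, degree):
    """Multi-indices with prod(k_i + 1) <= degree + 1."""
    keep = []
    for idx in tensor_product_indices(dim, degree):
        weight = 1
        for k in idx:
            weight *= k + 1
        if weight <= degree + 1:
            keep.append(idx)
    return keep

if __name__ == "__main__":
    degree = 3
    for dim in (1, 2, 3, 4):
        counts = (
            len(tensor_product_indices(dim, degree)),
            len(total_degree_indices(dim, degree)),
            len(hyperbolic_cross_indices(dim, degree)),
        )
        print(dim, counts)
```

For dimension 2 and degree 3, these sets contain 16, 10, and 8 multi-indices respectively. The tensor-product count grows as $(n+1)^d$ for degree $n$ and dimension $d$, which is the curse of dimensionality that basis pruning is meant to mitigate.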