SUPN: Shallow Universal Polynomial Networks

Zachary Morrow, Michael Penwarden, Brian Chen, Aurya Javeed, Akil Narayan, John D. Jakeman

TL;DR

SUPNs address overparameterization in deep nets by replacing early learned layers with a single-layer polynomial lift, yielding a parsimonious surrogate with provable universal approximation properties. The authors prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree and provide explicit quasi-optimal parameter constructions, complemented by a second-order trust-region Newton–CG training algorithm. An extensive empirical study across 1D, 2D, and 10D problems (over 13,000 models) shows SUPNs achieving lower approximation error and less variability than DNNs and KANs for comparable parameter budgets, and outperforming polynomial projection on non-smooth or tensor-product-structured targets. The work suggests SUPNs as a practical, robust building block for surrogate modeling and points to future directions in operator learning, physics-informed variants, and adaptive, anisotropic index sets to scale to very high dimensions.

Abstract

Deep neural networks (DNNs) and Kolmogorov-Arnold networks (KANs) are popular methods for function approximation due to their flexibility and expressivity. However, they typically require a large number of trainable parameters to produce a suitable approximation. Beyond making the resulting network less transparent, overparameterization creates a large optimization space, likely producing local minima in training that have quite different generalization errors. In this case, network initialization can have an outsize impact on the model's out-of-sample accuracy. For these reasons, we propose shallow universal polynomial networks (SUPNs). These networks replace all but the last hidden layer with a single layer of polynomials with learnable coefficients, leveraging the strengths of DNNs and polynomials to achieve sufficient expressivity with far fewer parameters. We prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree, and we derive explicit formulas for quasi-optimal SUPN parameters. We complement theory with an extensive suite of numerical experiments involving SUPNs, DNNs, KANs, and polynomial projection in one, two, and ten dimensions, consisting of over 13,000 trained models. On the target functions we numerically studied, for a given number of trainable parameters, the approximation error and variability are often lower for SUPNs than for DNNs and KANs by an order of magnitude. In our examples, SUPNs even outperform polynomial projection on non-smooth functions.
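The architecture described in the abstract (a single polynomial layer with learnable coefficients feeding the last hidden layer) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the Chebyshev basis, the `tanh` activation, the 1D input, and the names `supn_forward`, `coeffs`, `out_weights`, and `num_parameters` are all assumptions made here for concreteness.

```python
import numpy as np
from numpy.polynomial import chebyshev


def supn_forward(x, coeffs, out_weights):
    """Evaluate a 1D SUPN-style model at points x in [-1, 1].

    coeffs      : (N, M) learnable polynomial coefficients, one row per hidden unit
    out_weights : (N,)   learnable weights of the linear output layer
    """
    x = np.asarray(x, dtype=float)
    n_basis = coeffs.shape[1]
    # Chebyshev basis T_0..T_{M-1} evaluated at x, shape (len(x), M).
    # (The choice of basis is an assumption of this sketch.)
    basis = chebyshev.chebvander(x, n_basis - 1)
    # Polynomial layer followed by one hidden layer; tanh is an assumed activation.
    hidden = np.tanh(basis @ coeffs.T)
    # Linear read-out over the N hidden units.
    return hidden @ out_weights


def num_parameters(coeffs, out_weights):
    """Trainable parameter count of this sketch: N*M coefficients plus N weights."""
    return coeffs.size + out_weights.size


rng = np.random.default_rng(0)
coeffs = rng.normal(size=(3, 5))      # N = 3 hidden units, M = 5 basis functions
out_weights = rng.normal(size=3)
x = np.linspace(-1.0, 1.0, 7)
y = supn_forward(x, coeffs, out_weights)
print(y.shape, num_parameters(coeffs, out_weights))
```

In this sketch the parameter count grows as N(M + 1), which is the quantity one would hold fixed when comparing against a DNN or KAN of similar size.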

Paper Structure

This paper contains 29 sections, 11 theorems, 48 equations, 13 figures, 1 table.

Key Result

Theorem 2.1

Let $f: \Omega \to \mathbb{R}$ be continuous and $\Omega \subset \mathbb{R}^D$ be compact. For any $\epsilon > 0$, there exists a polynomial $q : \Omega \to \mathbb{R}$ such that $\| f - q \|_{L^\infty} < \epsilon$.
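A small numerical check can make the theorem concrete. The sketch below approximates the continuous but non-smooth function $|x|$ on $[-1, 1]$ with Chebyshev interpolants of increasing degree and estimates the sup-norm error on a fine grid. The target function, the degrees, and the helper name `sup_error` are choices made here for illustration; the interpolant is only one particular polynomial, so its error is an upper bound on what the theorem guarantees is achievable.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def sup_error(func, degree, n_test=2001):
    """Estimate max |func - q| on [-1, 1], where q is the Chebyshev interpolant."""
    approx = Chebyshev.interpolate(func, degree, domain=[-1.0, 1.0])
    x = np.linspace(-1.0, 1.0, n_test)
    # The maximum over a fine grid is an estimate of the L-infinity norm.
    return float(np.max(np.abs(func(x) - approx(x))))


degrees = [4, 16, 64]
errors = [sup_error(np.abs, d) for d in degrees]
for d, e in zip(degrees, errors):
    print(f"degree {d:3d}: sup-norm error ~ {e:.3e}")
```

The error shrinks as the degree grows, but slowly, which is the expected behavior for a target with a kink.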

Figures (13)

  • Figure 1: Illustration of function approximators considered in this work. Note that MLPs and KANs are shown in their single-layer form for comparison but are most often used in a deep (many-layer) configuration, whereas SUPNs always have a single layer.
  • Figure 1: Two examples of lower sets, the total-degree space (left) and hyperbolic cross-section (right).
  • Figure 1: Training losses with Adam, BFGS, and a second-order trust-region method.
  • Figure 1: 1D target functions for approximation.
  • Figure 2: Relative $L^{2}$ errors vs. trainable parameters using SUPN, DNN, KAN, and polynomial projection for $f_k$, $k \in [5]$.
  • ...and 8 more figures
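The two lower sets named in the figure captions above can be enumerated directly for small dimensions. The sketch below uses common conventions that are assumed here rather than taken from the paper: the total-degree set is $\{\nu : \sum_j \nu_j \le k\}$ and the hyperbolic cross is $\{\nu : \prod_j (\nu_j + 1) \le k + 1\}$. The function names are hypothetical, and the brute-force enumeration is only practical for low dimension.

```python
from itertools import product
from math import prod


def total_degree_set(dim, k):
    """Multi-indices nu with nu_1 + ... + nu_dim <= k."""
    return {nu for nu in product(range(k + 1), repeat=dim) if sum(nu) <= k}


def hyperbolic_cross_set(dim, k):
    """Multi-indices nu with (nu_1 + 1) * ... * (nu_dim + 1) <= k + 1."""
    return {nu for nu in product(range(k + 1), repeat=dim)
            if prod(n + 1 for n in nu) <= k + 1}


def is_lower_set(indices):
    """A set is lower (downward closed) if decrementing any positive entry stays inside."""
    for nu in indices:
        for j, n in enumerate(nu):
            if n > 0:
                below = nu[:j] + (n - 1,) + nu[j + 1:]
                if below not in indices:
                    return False
    return True


td = total_degree_set(2, 4)
hc = hyperbolic_cross_set(2, 4)
print(len(td), len(hc), is_lower_set(td), is_lower_set(hc))
```

For the same `k`, the hyperbolic cross keeps far fewer mixed-degree terms than the total-degree set, which is why such sets are attractive as the dimension grows.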

Theorems & Definitions (21)

  • Theorem 2.1: Stone–Weierstrass [cotter1990stone, stone1937applications, weierstrass1885analytische]
  • Theorem 2.2: Jackson's inequality [jackson1930theory]
  • Definition 2.3: Lower set [chkifa2018polynomial]
  • Example 2.4
  • Theorem 2.5: Shallow, wide MLPs [pinkus1999approximation]
  • Theorem 2.6: Deep, narrow MLPs [kidger2020universal]
  • Theorem 3.1: fixed $M$, variable $N$
  • Proof 3.1
  • Theorem 3.2: fixed $N$, variable $M$
  • Proof 3.2
  • ...and 11 more