SUPN: Shallow Universal Polynomial Networks
Zachary Morrow, Michael Penwarden, Brian Chen, Aurya Javeed, Akil Narayan, John D. Jakeman
TL;DR
SUPNs address overparameterization in deep networks by replacing all but the last hidden layer with a single layer of polynomials with learnable coefficients, yielding a parsimonious surrogate with provable universal approximation properties. The authors prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree and derive explicit formulas for quasi-optimal parameters, complemented by a second-order trust-region Newton–CG training algorithm. An extensive empirical study across 1D, 2D, and 10D problems (over 13,000 trained models) shows that, on the target functions studied and for comparable parameter budgets, SUPNs often achieve approximation error and variability an order of magnitude lower than DNNs and KANs, and that they outperform polynomial projection on non-smooth or tensor-product-structured targets. The work positions SUPNs as a practical, robust building block for surrogate modeling and points to future directions in operator learning, physics-informed variants, and adaptive, anisotropic index sets for scaling to very high dimensions.
Abstract
Deep neural networks (DNNs) and Kolmogorov-Arnold networks (KANs) are popular methods for function approximation due to their flexibility and expressivity. However, they typically require a large number of trainable parameters to produce a suitable approximation. Beyond making the resulting network less transparent, overparameterization creates a large optimization space, likely producing local minima in training that have quite different generalization errors. In this case, network initialization can have an outsize impact on the model's out-of-sample accuracy. For these reasons, we propose shallow universal polynomial networks (SUPNs). These networks replace all but the last hidden layer with a single layer of polynomials with learnable coefficients, leveraging the strengths of DNNs and polynomials to achieve sufficient expressivity with far fewer parameters. We prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree, and we derive explicit formulas for quasi-optimal SUPN parameters. We complement theory with an extensive suite of numerical experiments involving SUPNs, DNNs, KANs, and polynomial projection in one, two, and ten dimensions, consisting of over 13,000 trained models. On the target functions we numerically studied, for a given number of trainable parameters, the approximation error and variability are often lower for SUPNs than for DNNs and KANs by an order of magnitude. In our examples, SUPNs even outperform polynomial projection on non-smooth functions.
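To make the architecture described above concrete, the following is a minimal one-dimensional sketch in plain Python. It rests on several assumptions that the abstract does not state: the network is taken to have the form f(x) = b + sum_j a_j * tanh(p_j(x)), each p_j is expanded in a Chebyshev basis, the activation is tanh, and there is a single scalar output bias. The names `chebyshev_features`, `init_supn`, `supn_forward`, and `count_params` are hypothetical and chosen only for illustration; this is not the authors' implementation or training procedure.

```python
import math
import random


def chebyshev_features(x, degree):
    """Return [T_0(x), ..., T_degree(x)] using the three-term recurrence."""
    feats = [1.0, x][: degree + 1]
    for _ in range(2, degree + 1):
        feats.append(2.0 * x * feats[-1] - feats[-2])
    return feats


def init_supn(width, degree, seed=0):
    """Random initial parameters (assumed layout, for illustration only)."""
    rng = random.Random(seed)
    return {
        # C[j][k]: learnable coefficient of T_k in the j-th polynomial p_j
        "C": [[rng.gauss(0.0, 0.5) for _ in range(degree + 1)]
              for _ in range(width)],
        # a[j]: output weight of the j-th hidden unit
        "a": [rng.gauss(0.0, 0.5) for _ in range(width)],
        # b: scalar output bias
        "b": 0.0,
    }


def supn_forward(params, x):
    """Evaluate b + sum_j a_j * tanh(p_j(x)) at a scalar x in [-1, 1]."""
    degree = len(params["C"][0]) - 1
    feats = chebyshev_features(x, degree)
    out = params["b"]
    for coeffs, a_j in zip(params["C"], params["a"]):
        p_j = sum(c * t for c, t in zip(coeffs, feats))  # polynomial layer
        out += a_j * math.tanh(p_j)                      # last hidden layer
    return out


def count_params(params):
    """Trainable parameters: polynomial coefficients, output weights, bias."""
    return sum(len(row) for row in params["C"]) + len(params["a"]) + 1


params = init_supn(width=4, degree=5)
print(count_params(params))
print(supn_forward(params, 0.3))
```

Under these assumptions, a network of width 4 and degree 5 has 4 * (5 + 1) + 4 + 1 = 29 trainable parameters, which illustrates how a small parameter budget arises when the only learned quantities are polynomial coefficients and one set of output weights.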
