Function-Space Optimality of Neural Architectures with Multivariate Nonlinearities

Rahul Parhi; Michael Unser

TL;DR

The paper develops a function-space framework for neural networks with multivariate nonlinearities by constructing Banach spaces $\mathcal{M}_{\mathrm{L}}^k(\mathbb{R}^d)$ via a regularization operator $\mathrm{L}$, the $k$-plane transform, and a sparsity-promoting norm. It proves a representer theorem showing that solutions to learning problems posed over these spaces are sparse, shallow architectures with skip connections, where the multivariate nonlinearity of each atom is the $(d-k)$-variate activation profile $\rho_{\mathrm{L}}$ associated with the operator $\mathrm{L}$. The theory unifies univariate and multivariate nonlinearities (including ReLU, norm activations, and thin-plate/RBF bases), connects to the formalisms of reproducing kernel Banach spaces (RKBS) and variation spaces, and offers a principled explanation for architectural choices such as orthogonal weight normalization. Practically, it implies that, under mild assumptions and given $M$ data points, optimal solutions are expressible with $N \le M - \dim \mathcal{P}_{n_{\mathrm{L}}}$ atoms, which motivates sparse, dictionary-like neural representations that adapt to low-dimensional subspaces of the data. Overall, the framework provides a mathematically rigorous bridge between harmonic analysis, inverse problems, and neural network design.
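The atom bound can be made concrete with a small calculation. The sketch below assumes, as is common in this literature but not stated here, that $\mathcal{P}_{n}$ denotes the polynomials in $d$ variables of total degree at most $n$, whose dimension is $\binom{n+d}{d}$. The helper names `poly_space_dim` and `atom_budget` are hypothetical.

```python
from math import comb


def poly_space_dim(d: int, n: int) -> int:
    """Dimension of polynomials in d variables of total degree <= n.

    Assumption: this is the meaning of P_n in the bound N <= M - dim P_n.
    Counts monomials x^alpha with |alpha| <= n, i.e. binom(n + d, d).
    """
    if d < 1 or n < 0:
        raise ValueError("need d >= 1 and n >= 0")
    return comb(n + d, d)


def atom_budget(M: int, d: int, n: int) -> int:
    """Hypothetical helper: the upper bound M - dim P_n on the number of atoms."""
    return M - poly_space_dim(d, n)


if __name__ == "__main__":
    # Affine null space (n = 1) in d = 3 dimensions with M = 100 data points.
    print(poly_space_dim(3, 1), atom_budget(100, 3, 1))
```

For example, with an affine skip connection ($n_{\mathrm{L}} = 1$) in $d = 3$ dimensions and $M = 100$ data points, the bound allows at most $100 - 4 = 96$ atoms; a quadratic null space in $d = 2$ would instead remove $6$ degrees of freedom ($1, x, y, x^2, xy, y^2$).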

Abstract

We investigate the function-space optimality (specifically, the Banach-space optimality) of a large class of shallow neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator, the $k$-plane transform, and a sparsity-promoting norm. We prove a representer theorem stating that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received recent interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
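To make the architecture concrete, here is a minimal NumPy sketch of a shallow network $f(\mathbf{x}) = \sum_n v_n\, \rho(\mathbf{Q}_n^\mathsf{T}\mathbf{x} - \mathbf{b}_n) + \mathbf{c}_1^\mathsf{T}\mathbf{x} + c_0$ with multivariate atoms and a skip connection. Several choices are assumptions, not taken from the paper: the orthonormal weight $\mathbf{Q}_n$ is obtained as the polar factor of a raw weight $\mathbf{W}_n$ (one simple way to impose orthogonal weight normalization), the skip connection is affine, and the three profiles (ReLU, norm activation, 2D thin-plate $r^2 \log r$) are illustrative. All function names are hypothetical.

```python
import numpy as np


def orthonormal_columns(W):
    """Polar factor U V^T of W (d x m): the closest matrix with orthonormal columns.

    Used here as a simple stand-in for orthogonal weight normalization.
    For a single column w it returns w / ||w|| (no sign ambiguity).
    """
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt


def relu_profile(z):
    # Univariate profile (m = 1): rho(z) = max(z, 0).
    return np.maximum(z[:, 0], 0.0)


def norm_profile(z):
    # Norm activation: rho(z) = ||z||_2.
    return np.linalg.norm(z, axis=1)


def thin_plate_profile(z):
    # Thin-plate spline radial function in 2D: rho(z) = r^2 log r, with rho(0) = 0.
    r = np.linalg.norm(z, axis=1)
    out = np.zeros_like(r)
    pos = r > 0
    out[pos] = r[pos] ** 2 * np.log(r[pos])
    return out


def shallow_net(X, Ws, bs, vs, rho, c0, c1):
    """Evaluate f on a batch X of shape (B, d).

    Ws: raw weights, each (d, m); bs: biases, each (m,); vs: scalar output weights;
    rho maps a (B, m) array to (B,). The affine term c1^T x + c0 is the skip connection.
    """
    out = X @ c1 + c0
    for W, b, v in zip(Ws, bs, vs):
        Q = orthonormal_columns(W)
        out = out + v * rho(X @ Q - b)
    return out
```

With one-column weights and `relu_profile`, the sketch reduces to a standard ReLU network with unit-norm input weights. With `thin_plate_profile`, $d = 2$, and square $2 \times 2$ weights, each atom is a thin-plate radial basis function centered at $\mathbf{Q}_n \mathbf{b}_n$, since $\lVert \mathbf{Q}_n^\mathsf{T}\mathbf{x} - \mathbf{b}_n \rVert = \lVert \mathbf{x} - \mathbf{Q}_n \mathbf{b}_n \rVert$ for orthogonal $\mathbf{Q}_n$.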

Paper Structure (20 sections, 11 theorems, 86 equations)

Introduction
Main Contributions and Road Map
New Neural Network Banach Spaces
Representer Theorems for Neural Networks with Multivariate Nonlinearities
Connections to Prior Work
Mathematical Preliminaries and Notation
The k-Plane Transform
The Case k=0
Polynomial Spaces and Related Projectors
Main Results
Native Spaces
Optimality of Neural Architectures With Multivariate Nonlinearities
Discussion
Observations and Examples of Compatible Neural Architectures
Connections to RKBS Methods and Variation Spaces
...and 5 more sections

Key Result

Proposition 2.1

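The operator $\mathscr{R}_k$ continuously maps $\mathcal{S}(\mathbb{R}^d)$ into $\mathcal{S}(\Xi_k)$. Moreover, on $\mathcal{S}(\mathbb{R}^d)$,

$$c_{d,k}\,\mathscr{R}_k^{*}\mathrm{K}_k\mathscr{R}_k = c_{d,k}\,(-\Delta)^{k/2}\mathscr{R}_k^{*}\mathscr{R}_k = \mathrm{Id},$$

where $c_{d,k} > 0$ is a normalization constant that depends only on $d$ and $k$ and is written in terms of surface areas $\lvert\mkern1mu\cdot\mkern1mu\rvert$. The underlying operators are the $d$-variate Laplacian operator $\Delta$ and the filtering operator $\mathrm{K}_k$, which acts as $(-\Delta)^{k/2}$ in the $(d-k)$-variate offset variable of $\Xi_k$.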
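As a numerical illustration of the first claim (Schwartz functions are mapped to Schwartz functions), the sketch below approximates $\mathscr{R}_k f$ for the Gaussian $f(\mathbf{x}) = e^{-\lVert\mathbf{x}\rVert^2/2}$ by a Riemann sum. It assumes that a $k$-plane is parametrized as $\{\mathbf{A}\mathbf{t} + \mathbf{B}\mathbf{z} : \mathbf{z} \in \mathbb{R}^k\}$, with $[\mathbf{A}\ \mathbf{B}]$ an orthogonal $d \times d$ matrix and offset $\mathbf{t} \in \mathbb{R}^{d-k}$; the paper's exact parametrization of $\Xi_k$ may differ. Because $\lVert\mathbf{A}\mathbf{t} + \mathbf{B}\mathbf{z}\rVert^2 = \lVert\mathbf{t}\rVert^2 + \lVert\mathbf{z}\rVert^2$, the exact value is $(2\pi)^{k/2} e^{-\lVert\mathbf{t}\rVert^2/2}$, again a Gaussian in $\mathbf{t}$. The function names are hypothetical, and the sketch does not test the inversion formula.

```python
import numpy as np


def gaussian(x):
    # f(x) = exp(-||x||^2 / 2), evaluated along the last axis.
    return np.exp(-0.5 * np.sum(x ** 2, axis=-1))


def kplane_transform(f, A, B, t, half_width=8.0, h=0.05):
    """Riemann-sum approximation of the integral of f(A t + B z) over z in R^k.

    A: (d, d - k) and B: (d, k) have orthonormal columns with A^T B = 0.
    The integration domain is truncated to the cube [-half_width, half_width]^k.
    """
    k = B.shape[1]
    grid = np.arange(-half_width, half_width + h / 2, h)
    # Tensor grid of z-points in R^k, flattened to shape (n**k, k).
    mesh = np.meshgrid(*([grid] * k), indexing="ij")
    z = np.stack([m.ravel() for m in mesh], axis=-1)
    points = A @ t + z @ B.T  # shape (n**k, d)
    return np.sum(f(points)) * h ** k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 3, 2
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
    A, B = Q[:, : d - k], Q[:, d - k :]
    t = np.array([0.7])
    approx = kplane_transform(gaussian, A, B, t)
    exact = (2 * np.pi) ** (k / 2) * np.exp(-0.5 * t @ t)
    print(approx, exact)  # the two values should agree to many digits
```

For a Gaussian integrand, the equispaced Riemann sum converges extremely fast, so the default step `h = 0.05` already matches the closed form to near machine precision.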

Theorems & Definitions (22)

Proposition 2.1: see GelfandIntegralGeometry, GonzalezRange, KeinertInversion, Parhikplane, RubinInversion, SmithRadiographs, SolmonXRay
Proposition 2.2: Parhikplane
Remark 2.3
Proposition 2.4: Parhikplane
Definition 2.5: Parhikplane
Theorem 2.6
Proposition 2.7: see Parhikplane
Remark 2.8
Definition 3.1
Remark 3.2
...and 12 more
