
Function-Space Optimality of Neural Architectures with Multivariate Nonlinearities

Rahul Parhi, Michael Unser

TL;DR

The paper develops a function-space framework for neural networks with multivariate nonlinearities by constructing Banach spaces $\mathcal{M}_{\mathrm{L}}^k(\mathbb{R}^d)$ via the $k$-plane transform and a sparsity-regularized norm. It proves a representer theorem showing that learning solutions in these spaces reduce to sparse, shallow architectures with skip connections, where each atom has a multivariate nonlinearity tied to the operator $\mathrm{L}$ and the $(d-k)$-variate activation profile $\rho_{\mathrm{L}}$. The theory unifies univariate and multivariate nonlinearities (including ReLU, norm activations, and thin-plate/RBF bases) and connects to RKBS and variation-space formalisms, offering a principled explanation for architectural choices like orthogonal weight normalization. Practically, it implies that, under mild assumptions, optimal solutions are expressible with $N \le M - \dim \mathcal{P}_{n_{\mathrm{L}}}$ atoms, motivating sparse, dictionary-like neural representations that adapt to low-dimensional subspaces of the data. Overall, the framework provides a mathematically rigorous bridge between harmonic analysis, inverse problems, and neural network design for multivariate nonlinearities.
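The TL;DR describes optimal solutions as sparse, shallow networks whose atoms apply a $(d-k)$-variate activation profile after an orthonormally constrained projection, plus a skip connection. The Python sketch below illustrates that shape under stated assumptions; it is not the authors' construction. We assume each atom has the form $v_n\,\rho(\mathbf{W}_n^\top \mathbf{x} - \mathbf{b}_n)$ with $\mathbf{W}_n \in \mathbb{R}^{d\times(d-k)}$ having orthonormal columns, that the skip connection is an affine function (the polynomial part when $n_{\mathrm{L}} = 1$), and that $\mathcal{P}_{n}$ denotes polynomials of total degree at most $n$ in $d$ variables, whose dimension is $\binom{n+d}{d}$. The names `ShallowMultivariateNet`, `orthonormalize`, `norm_activation`, `poly_space_dim`, and `atom_budget` are hypothetical.

```python
import math
import random


def orthonormalize(columns):
    """Modified Gram-Schmidt: turn linearly independent vectors into orthonormal ones."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def norm_activation(z):
    """Hypothetical choice of (d-k)-variate profile rho: the Euclidean norm of z."""
    return math.sqrt(sum(zi * zi for zi in z))


class ShallowMultivariateNet:
    """Sketch (assumed form) of f(x) = sum_n v_n * rho(W_n^T x - b_n) + (a . x + a0).

    Each W_n has (d - k) orthonormal columns in R^d (orthogonal weight
    normalization); the affine term stands in for the skip connection.
    """

    def __init__(self, rho, atoms, skip_weights, skip_bias):
        # atoms: list of (v_n, columns of W_n, b_n); columns are orthonormalized here.
        self.rho = rho
        self.atoms = [(v, orthonormalize(cols), list(b)) for v, cols, b in atoms]
        self.skip_weights = list(skip_weights)
        self.skip_bias = skip_bias

    def __call__(self, x):
        # Skip connection: affine function of the input.
        out = self.skip_bias + sum(a * xi for a, xi in zip(self.skip_weights, x))
        for v, cols, b in self.atoms:
            # Project x onto the atom's (d - k)-dimensional subspace, then shift by b_n.
            z = [sum(c * xi for c, xi in zip(col, x)) - bj for col, bj in zip(cols, b)]
            out += v * self.rho(z)
        return out


def poly_space_dim(n, d):
    """Dimension of polynomials of total degree <= n in d variables: C(n + d, d)."""
    return math.comb(n + d, d)


def atom_budget(num_data, d, n_L):
    """Bound N <= M - dim P_{n_L} on the number of atoms (under the assumed reading of P)."""
    return num_data - poly_space_dim(n_L, d)


if __name__ == "__main__":
    random.seed(0)
    d, k = 3, 1  # each atom sees a (d - k) = 2-dimensional projection of x
    atoms = [(1.5, [[random.gauss(0, 1) for _ in range(d)] for _ in range(d - k)], [0.1, -0.2])]
    net = ShallowMultivariateNet(norm_activation, atoms, skip_weights=[0.5, 0.0, -1.0], skip_bias=0.2)
    print(net([1.0, 2.0, 3.0]))
    print(atom_budget(num_data=10, d=3, n_L=1))  # 10 - C(4, 3) = 6
```

Orthonormalizing the columns in the constructor mirrors the orthogonal weight normalization mentioned in the TL;DR; a trainable version could instead parametrize each $\mathbf{W}_n$ directly on the set of matrices with orthonormal columns.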

Abstract

We investigate the function-space optimality (specifically, the Banach-space optimality) of a large class of shallow neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator, the $k$-plane transform, and a sparsity-promoting norm. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received recent interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
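To make the abstract's list of compatible nonlinearities concrete, the sketch below writes each one as a profile $\rho$ acting on a vector $\mathbf{z} \in \mathbb{R}^{q}$ with $q = d - k$. The pairing of profiles with $q$ and the function names (`relu`, `norm_activation`, `polyharmonic`) are illustrative assumptions; the polyharmonic formula is the classical one from thin-plate/polyharmonic spline theory ($r^{2m-q}\log r$ for even $q$, $r^{2m-q}$ for odd $q$, with $2m > q$).

```python
import math


def relu(z):
    """Univariate ReLU profile: assumed k = d - 1 case, where z = W^T x - b is a scalar."""
    return max(0.0, z[0])


def norm_activation(z):
    """Norm activation: Euclidean norm of the (d - k)-vector z."""
    return math.sqrt(sum(zi * zi for zi in z))


def polyharmonic(z, m):
    """Polyharmonic radial basis function of order m in q = len(z) variables.

    Classical form: r^(2m - q) * log(r) for even q, r^(2m - q) for odd q, with 2m > q.
    m = 2, q = 2 gives the thin-plate spline r^2 log r.
    """
    q = len(z)
    if 2 * m <= q:
        raise ValueError("polyharmonic profile requires 2m > q")
    r = norm_activation(z)
    if q % 2 == 0:
        # The limit of r^(2m - q) log r as r -> 0 is 0 when 2m > q.
        return 0.0 if r == 0.0 else r ** (2 * m - q) * math.log(r)
    return r ** (2 * m - q)


if __name__ == "__main__":
    for z in (-2.5, 0.0, 1.25):
        # ReLU equals (|z| + z) / 2: the norm activation plus a linear term.
        print(z, relu([z]), (norm_activation([z]) + z) / 2)
    print(polyharmonic([3.0, 4.0], m=2))       # thin-plate spline at r = 5
    print(polyharmonic([1.0, 2.0, 2.0], m=2))  # q = 3, 2m = q + 1: equals ||z|| = 3.0
```

Two identities are worth noting. For a scalar argument, $\mathrm{ReLU}(z) = (\lvert z\rvert + z)/2$, so the ReLU and the norm activation differ only by a linear term, which an affine skip connection can absorb. For odd $q$ with $2m = q + 1$, the polyharmonic profile reduces to the norm activation $\lVert\mathbf{z}\rVert$.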

Paper Structure (20 sections, 11 theorems, 86 equations)

Key Result

Proposition 2.1

The operator $\mathscr{R}_k$ continuously maps $\mathcal{S}(\mathbb{R}^d)$ into $\mathcal{S}(\Xi_k)$. Moreover, on $\mathcal{S}(\mathbb{R}^d)$, … with …, where $\lvert\cdot\rvert$ denotes surface area. The underlying operators are the $d$-variate Laplacian operator $\Delta$ and the filtering operator …
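Proposition 2.1 states that $\mathscr{R}_k$ maps Schwartz functions to Schwartz functions on $\Xi_k$. A standard Gaussian gives a quick sanity check: if a $k$-plane is written as $\{\mathbf{t} + \mathbf{B}\mathbf{s} : \mathbf{s} \in \mathbb{R}^k\}$ with $\mathbf{B}$ having orthonormal columns and $\mathbf{t}$ orthogonal to them, then $\lVert\mathbf{t} + \mathbf{B}\mathbf{s}\rVert^2 = \lVert\mathbf{t}\rVert^2 + \lVert\mathbf{s}\rVert^2$, so the integral of $e^{-\lVert\mathbf{x}\rVert^2/2}$ over the plane is $(2\pi)^{k/2} e^{-\lVert\mathbf{t}\rVert^2/2}$, again rapidly decaying in the offset. The Python sketch below checks this numerically. The parametrization of $\Xi_k$ by $(\mathbf{B}, \mathbf{t})$, the truncated trapezoidal quadrature, and the function names are assumptions made for illustration and need not match the paper's normalization of $\mathscr{R}_k$.

```python
import itertools
import math


def gaussian(x):
    """Standard Gaussian exp(-||x||^2 / 2) on R^d (a Schwartz function)."""
    return math.exp(-0.5 * sum(xi * xi for xi in x))


def kplane_transform(f, basis, offset, half_width=8.0, n=81):
    """Approximate the integral of f over the affine k-plane {offset + B s : s in R^k}.

    Assumes `basis` holds k orthonormal vectors in R^d and `offset` is orthogonal
    to them. The integral over R^k is truncated to [-half_width, half_width]^k and
    approximated by the tensor-product trapezoidal rule with n nodes per axis.
    """
    k, d = len(basis), len(offset)
    h = 2.0 * half_width / (n - 1)
    nodes = [-half_width + i * h for i in range(n)]
    # Trapezoid weights: h at interior nodes, h / 2 at the two endpoints.
    weights = [h / 2 if i in (0, n - 1) else h for i in range(n)]
    total = 0.0
    for idx in itertools.product(range(n), repeat=k):
        point = list(offset)
        w = 1.0
        for axis, i in enumerate(idx):
            w *= weights[i]
            for j in range(d):
                point[j] += nodes[i] * basis[axis][j]
        total += w * f(point)
    return total


def gaussian_kplane_closed_form(k, offset):
    """(2 pi)^(k/2) * exp(-||t||^2 / 2) for a k-plane at offset t."""
    return (2.0 * math.pi) ** (k / 2) * math.exp(-0.5 * sum(t * t for t in offset))


if __name__ == "__main__":
    # d = 3, k = 1: a line parallel to e3 at offset t = (0.3, 0.4, 0).
    line = ([[0.0, 0.0, 1.0]], [0.3, 0.4, 0.0])
    # d = 3, k = 2: the plane parallel to span{e1, e2} at offset t = (0, 0, 1).
    plane = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0, 1.0])
    for basis, offset in (line, plane):
        approx = kplane_transform(gaussian, basis, offset)
        exact = gaussian_kplane_closed_form(len(basis), offset)
        print(len(basis), approx, exact)
```

The trapezoidal rule is a deliberate choice here: for smooth, rapidly decaying integrands such as the Gaussian it converges very quickly, so the numerical and closed-form values should agree to near machine precision.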

Theorems & Definitions (22)

  • Proposition 2.1: see GelfandIntegralGeometry, GonzalezRange, KeinertInversion, Parhikplane, RubinInversion, SmithRadiographs, SolmonXRay
  • Proposition 2.2: Parhikplane
  • Remark 2.3
  • Proposition 2.4: Parhikplane
  • Definition 2.5: Parhikplane
  • Theorem 2.6
  • Proposition 2.7: see Parhikplane
  • Remark 2.8
  • Definition 3.1
  • Remark 3.2
  • ...and 12 more