On bounds for norms of reparameterized ReLU artificial neural network parameters: sums of fractional powers of the Lipschitz norm control the network parameter vector

Arnulf Jentzen, Timo Kröger

TL;DR

This work establishes a sharp connection between parameter norms and the Lipschitz norm of the realization functions of shallow ReLU ANNs. It proves that every parameter vector can be replaced by a reparameterized one with the same realization function whose norm is bounded, up to a multiplicative constant, by the sum of the Lipschitz norm raised to the powers $1/2$ and $1$. To construct such reparameterizations and prove the main bound, the authors develop geometric tools (tessellations of convex polytopes and an analysis of affine hyperplanes), and they complement the bound with a two-sided equivalence to the Lipschitz norm. They also prove lower bounds showing that these exponents are tight; in particular, the Lipschitz norm alone does not suffice. In addition, they show that analogous bounds fail for Hölder and Sobolev-Slobodeckij norms, so the Lipschitz norm cannot be replaced by these norms. Collectively, these results clarify to what extent the parameter norm of a suitably reparameterized network can be controlled by norms of its realization function, with potential implications for optimization and parameter identifiability in shallow architectures.
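The idea of reparameterizing without changing the realization can be made concrete with the positive homogeneity of the ReLU: $\max\{\lambda z, 0\} = \lambda \max\{z, 0\}$ for $\lambda > 0$. Scaling a hidden neuron's incoming weights and bias by $\lambda$ and its outgoing weight by $1/\lambda$ leaves the realization function unchanged but changes the parameter norm. The NumPy sketch below checks this and "balances" each neuron so that its contribution to the Euclidean norm is minimal. All function names are hypothetical. This is standard intuition for why one measures the norm of an equivalence class of parameter vectors. It is not the authors' construction, which relies on tessellations of convex polytopes. Per-neuron rescaling alone generally does not reach the infimum over the whole equivalence class.

```python
import numpy as np


def shallow_relu(x, W, b, v, c):
    """Shallow ReLU realization x -> c + sum_i v_i * max(w_i . x + b_i, 0).

    x: (n, d) inputs, W: (h, d) hidden weights, b: (h,) hidden biases,
    v: (h,) output weights, c: scalar output bias.
    """
    return c + np.maximum(x @ W.T + b, 0.0) @ v


def param_norm(W, b, v, c):
    """Euclidean norm of the flattened parameter vector (W, b, v, c)."""
    return float(np.sqrt(np.sum(W**2) + np.sum(b**2) + np.sum(v**2) + c**2))


def rescale(W, b, v, lam):
    """Multiply neuron i's incoming parameters by lam[i] > 0 and its outgoing weight by 1/lam[i].

    Since max(lam * z, 0) = lam * max(z, 0) for lam > 0, the realization is unchanged.
    """
    return W * lam[:, None], b * lam, v / lam


def balance(W, b, v):
    """Pick lam[i] minimizing lam^2 * ||(w_i, b_i)||^2 + v_i^2 / lam^2 (AM-GM).

    At the minimizer lam[i] * ||(w_i, b_i)|| == |v_i| / lam[i]. Assumes every neuron
    has nonzero incoming parameters and a nonzero outgoing weight.
    """
    in_norm = np.sqrt(np.sum(W**2, axis=1) + b**2)
    lam = np.sqrt(np.abs(v) / in_norm)
    return rescale(W, b, v, lam)


rng = np.random.default_rng(0)
d, h = 3, 5
W = 10.0 * rng.normal(size=(h, d))  # deliberately unbalanced: large incoming weights...
b = 10.0 * rng.normal(size=h)
v = 0.1 * rng.normal(size=h)        # ...and small outgoing weights
c = float(rng.normal())
x = rng.normal(size=(100, d))

Wb, bb, vb = balance(W, b, v)
same = np.allclose(shallow_relu(x, W, b, v, c), shallow_relu(x, Wb, bb, vb, c))
print(same)  # True
print(param_norm(W, b, v, c), param_norm(Wb, bb, vb, c))  # the second value is never larger
```

Balancing can only decrease the norm. For each neuron, the original parameters correspond to the choice $\lambda = 1$, and the balanced choice minimizes that neuron's contribution over all $\lambda > 0$.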

Abstract

It is an elementary fact in the scientific literature that the Lipschitz norm of the realization function of a feedforward fully-connected rectified linear unit (ReLU) artificial neural network (ANN) can, up to a multiplicative constant, be bounded from above by sums of powers of the norm of the ANN parameter vector. Roughly speaking, in this work we show that, in the case of shallow ANNs, the converse inequality is also true. More formally, we prove that the norm of the equivalence class of ANN parameter vectors with the same realization function is, up to a multiplicative constant, bounded from above by the sum of powers of the Lipschitz norm of the ANN realization function (with the exponents $ 1/2 $ and $ 1 $). Moreover, we prove that this upper bound holds only when employing the Lipschitz norm and holds neither for Hölder norms nor for Sobolev-Slobodeckij norms. Furthermore, we prove that this upper bound holds only for sums of powers of the Lipschitz norm with the exponents $ 1/2 $ and $ 1 $ and does not hold for the Lipschitz norm alone.
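The last claim can be seen on a one-dimensional toy example. The example is our own, under stated assumptions, and is not necessarily the counterexample used in the paper. Take inputs in $[0,1]$ and measure parameters in the Euclidean norm. As a hypothetical choice, let the Lipschitz norm be $\sup_{x \in [0,1]} |f(x)| + \mathrm{Lip}(f)$. The function $f_\varepsilon(x) = \varepsilon \max\{x, 0\}$ then has Lipschitz norm $2\varepsilon$. In any shallow ReLU representation of $f_\varepsilon$, the slope $\varepsilon$ at a generic point equals $\sum_{i \text{ active}} v_i w_i$. Hence $\|\theta\|^2 \ge \sum_i (v_i^2 + w_i^2) \ge 2 \sum_i |v_i w_i| \ge 2\varepsilon$, and the balanced choice $w = v = \sqrt{\varepsilon}$, $b = c = 0$ attains $\sqrt{2\varepsilon}$. So the norm of the equivalence class scales like the square root of the Lipschitz norm as $\varepsilon \to 0$, and no bound by a constant times the Lipschitz norm alone can hold. The sketch below checks these numbers with a grid estimate of the (hypothetical) Lipschitz norm.

```python
import numpy as np


def realization(x, w, b, v, c):
    """One-neuron shallow ReLU network on scalar inputs: x -> c + v * max(w * x + b, 0)."""
    return c + v * np.maximum(w * x + b, 0.0)


def lipschitz_norm_on_grid(f_vals, xs):
    """Grid estimate of sup|f| + Lip(f) on [0, 1] (a hypothetical choice of Lipschitz norm)."""
    sup = np.max(np.abs(f_vals))
    lip = np.max(np.abs(np.diff(f_vals) / np.diff(xs)))
    return float(sup + lip)


xs = np.linspace(0.0, 1.0, 1001)
rows = []
for eps in [1e-1, 1e-2, 1e-4, 1e-6]:
    # Balanced representation of f_eps(x) = eps * max(x, 0): w = v = sqrt(eps), b = c = 0.
    w = v = float(np.sqrt(eps))
    theta_norm = float(np.hypot(w, v))  # = sqrt(2 * eps), the smallest possible norm
    lip_norm = lipschitz_norm_on_grid(realization(xs, w, 0.0, v, 0.0), xs)  # ~ 2 * eps
    rows.append((eps, theta_norm, lip_norm))
    print(f"eps={eps:.0e}  ||theta||/Lip={theta_norm / lip_norm:9.1f}  "
          f"||theta||/Lip^(1/2)={theta_norm / np.sqrt(lip_norm):.3f}")
```

The first ratio grows like $1/\sqrt{2\varepsilon}$, while the second stays at $1$. This is consistent with the abstract's statement that the exponent $1/2$ is needed alongside the exponent $1$.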

Paper Structure (12 sections, 23 theorems, 1 figure)

Figures (1)

  • Figure 1: Graphical illustration of the shallow ANN architecture considered in the paper's main positive result and its norm-equivalence result, in the special case of an ANN with $d = 3$ neurons on the input layer and $\mathfrak{h} = 5$ neurons on the hidden layer. In this situation, there are $d \mathfrak{h} = 15$ real weight parameters and $\mathfrak{h} = 5$ real bias parameters for the first affine linear transformation from the three-dimensional input layer to the five-dimensional hidden layer, and there are $\mathfrak{h} = 5$ real weight parameters and $1$ real bias parameter for the second affine linear transformation from the five-dimensional hidden layer to the one-dimensional output layer. The total number of parameters of this ANN is thus $\mathfrak{d} = d \mathfrak{h} + 2 \mathfrak{h} + 1 = 26$. For every parameter vector $\theta = ( \theta_1, \ldots, \theta_{\mathfrak{d}} ) \in \mathbb{R}^{\mathfrak{d}} = \mathbb{R}^{26}$, the associated realization function $\mathbb{R}^3 \ni x \mapsto \mathcal{N}^{\theta}(x) \in \mathbb{R}$ maps the three-dimensional input vector $x = ( x_1, x_2, x_3 ) \in \mathbb{R}^3$ to the scalar output $\mathcal{N}^{\theta}(x) = \theta_{\mathfrak{d}} + \sum_{i=1}^{5} \theta_{ d \mathfrak{h} + \mathfrak{h} + i } \max \{ \theta_{ d \mathfrak{h} + i } + \sum_{j=1}^{3} \theta_{ (i-1) d + j } x_j, 0 \} \in \mathbb{R}$.
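The caption's formula can be transcribed directly. The NumPy sketch below evaluates $\mathcal{N}^{\theta}$ in two ways: once with vectorized slicing and once literally with the caption's 1-based indices. It also confirms $\mathfrak{d} = 26$ for $d = 3$ and $\mathfrak{h} = 5$. All function names are hypothetical.

The sketch also computes the elementary bound behind the first sentence of the abstract. Let $w_i$ be the incoming weights and $v_i$ the outgoing weight of hidden neuron $i$. Because $z \mapsto \max\{z, 0\}$ is 1-Lipschitz, the Lipschitz constant of $\mathcal{N}^{\theta}$ with respect to the Euclidean norm is at most $\sum_{i} |v_i| \, \|w_i\|_2$. By the Cauchy-Schwarz and AM-GM inequalities, this is in turn at most $\|\theta\|_2^2 / 2$.

```python
import numpy as np


def num_params(d, h):
    """d*h + h parameters for the first affine map, h + 1 for the second."""
    return d * h + 2 * h + 1


def unpack(theta, d, h):
    """Split theta into (W, b, v, c) following the caption's ordering."""
    theta = np.asarray(theta, dtype=float)
    if theta.shape != (num_params(d, h),):
        raise ValueError("theta has the wrong length")
    W = theta[: d * h].reshape(h, d)  # row i-1 holds theta_{(i-1)d+1}, ..., theta_{(i-1)d+d}
    b = theta[d * h : d * h + h]      # theta_{dh+1}, ..., theta_{dh+h}
    v = theta[d * h + h : -1]         # theta_{dh+h+1}, ..., theta_{dh+2h}
    c = theta[-1]                     # theta_{\mathfrak{d}}, the output bias
    return W, b, v, c


def realization(theta, x, d, h):
    """Vectorized N^theta(x) = c + sum_i v_i * max(b_i + w_i . x, 0)."""
    W, b, v, c = unpack(theta, d, h)
    return float(c + v @ np.maximum(b + W @ np.asarray(x, dtype=float), 0.0))


def realization_loop(theta, x, d, h):
    """Literal transcription of the caption's formula with 1-based indices."""
    def t(k):
        return float(theta[k - 1])

    out = t(num_params(d, h))
    for i in range(1, h + 1):
        pre = t(d * h + i) + sum(t((i - 1) * d + j) * x[j - 1] for j in range(1, d + 1))
        out += t(d * h + h + i) * max(pre, 0.0)
    return out


def lipschitz_upper_bound(theta, d, h):
    """sum_i |v_i| * ||w_i||_2, an upper bound for the Lipschitz constant of N^theta."""
    W, _, v, _ = unpack(theta, d, h)
    return float(np.sum(np.abs(v) * np.linalg.norm(W, axis=1)))


d, h = 3, 5
print(num_params(d, h))  # 26
rng = np.random.default_rng(1)
theta = rng.normal(size=num_params(d, h))
x, y = rng.normal(size=d), rng.normal(size=d)
print(np.isclose(realization(theta, x, d, h), realization_loop(theta, x, d, h)))  # True
gap = abs(realization(theta, x, d, h) - realization(theta, y, d, h))
print(gap <= lipschitz_upper_bound(theta, d, h) * np.linalg.norm(x - y))  # True
print(lipschitz_upper_bound(theta, d, h) <= np.dot(theta, theta) / 2)      # True
```

The loop version mirrors the caption's indices one-to-one and serves as a cross-check on the slicing in `unpack`.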

Theorems & Definitions (28)

  • Theorem 1.1
  • Corollary 1.2
  • Theorem 1.3
  • Definition 2.1
  • Definition 2.2
  • Lemma 2.3
  • Lemma 2.4
  • Lemma 2.5
  • Lemma 2.6
  • Theorem 2.8
  • ...and 18 more