On bounds for norms of reparameterized ReLU artificial neural network parameters: sums of fractional powers of the Lipschitz norm control the network parameter vector

Arnulf Jentzen; Timo Kröger

TL;DR

This work establishes a sharp connection between parameter norms and the Lipschitz norm of shallow ReLU network realizations: every parameter vector can be replaced by a parameter vector with the same realization function whose norm is bounded, up to a multiplicative constant, by the sum of the Lipschitz norm raised to the powers $1/2$ and $1$. The authors develop geometric tools (tessellations of convex polytopes and an analysis of affine hyperplanes in compact cubes) to construct such reparameterizations and prove the main bound, along with a two-sided equivalence between the norm of the reparameterized parameters and the Lipschitz norm. They also prove lower bounds showing that the bound fails when the Lipschitz norm alone is used without the exponent $1/2$ term, and that analogous bounds fail for Hölder and Sobolev-Slobodeckij norms. Together, these results delineate how far the norm of a reparameterized parameter vector can be controlled by norms of the realization function, with potential implications for optimization and parameter identifiability in shallow architectures.

Abstract

It is an elementary fact in the scientific literature that the Lipschitz norm of the realization function of a feedforward fully-connected rectified linear unit (ReLU) artificial neural network (ANN) can, up to a multiplicative constant, be bounded from above by sums of powers of the norm of the ANN parameter vector. Roughly speaking, in this work we reveal in the case of shallow ANNs that the converse inequality is also true. More formally, we prove that the norm of the equivalence class of ANN parameter vectors with the same realization function is, up to a multiplicative constant, bounded from above by the sum of powers of the Lipschitz norm of the ANN realization function (with the exponents $1/2$ and $1$). Moreover, we prove that this upper bound only holds when employing the Lipschitz norm and holds neither for Hölder norms nor for Sobolev-Slobodeckij norms. Furthermore, we prove that this upper bound only holds for sums of powers of the Lipschitz norm with the exponents $1/2$ and $1$ but does not hold for the Lipschitz norm alone.
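The notion of parameter vectors "with the same realization function" can be illustrated with a minimal Python sketch. It uses only the standard positive-homogeneity rescaling of ReLU neurons, which is an assumption chosen for illustration and not the construction used in the paper; the function names `shallow_relu`, `rescale`, and `param_norm` are hypothetical, and the input is taken to be one-dimensional for brevity.

```python
import math


def shallow_relu(w, b, v, c, x):
    """Realization x -> c + sum_i v[i] * max(w[i] * x + b[i], 0) for scalar input x."""
    return c + sum(vi * max(wi * x + bi, 0.0) for wi, bi, vi in zip(w, b, v))


def rescale(w, b, v, lam):
    """Multiply hidden weights and biases by lam > 0 and divide outer weights by lam.

    Since max(lam * z, 0) = lam * max(z, 0) for lam > 0, the realization is unchanged.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return [lam * wi for wi in w], [lam * bi for bi in b], [vi / lam for vi in v]


def param_norm(w, b, v, c):
    """Euclidean norm of the full parameter vector (w, b, v, c)."""
    return math.sqrt(sum(t * t for t in (*w, *b, *v, c)))


# One hidden neuron realizing x -> max(x, 0) with badly balanced parameters.
w, b, v, c = [64.0], [0.0], [1.0 / 64.0], 0.0
w2, b2, v2 = rescale(w, b, v, lam=1.0 / 64.0)  # balanced: |w2| == |v2| == 1

# Both parameter vectors give the same realization at every test point.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert math.isclose(shallow_relu(w, b, v, c, x), shallow_relu(w2, b2, v2, c, x))

print(param_norm(w, b, v, c))     # large, about 64
print(param_norm(w2, b2, v2, c))  # sqrt(2), about 1.414
```

In this toy case a single neuron with zero bias and slope $L = |v w|$ has balanced parameters of norm $\sqrt{2L}$, which is at least consistent with the exponent $1/2$ appearing in the bound; the general argument in the paper is considerably more involved.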

Paper Structure (12 sections, 23 theorems, 1 figure)

This paper contains 12 sections, 23 theorems, 1 figure.

Introduction
Upper bounds for norms of reparameterized artificial neural networks (ANNs) using Lipschitz norms
Properties of tessellations of convex polytopes in compact cubes
Properties of affine hyperplanes in compact cubes
Upper bounds for norms of reparameterized ANNs using Lipschitz norms
Equivalence of norms of reparameterized ANNs and Lipschitz norms
Lower bounds for norms of reparameterized ANNs using Lipschitz norms
Output biases of ANNs with a maximum number of different kinks
Lower bounds for norms of reparameterized ANNs using Lipschitz norms
Lower bounds for norms of reparameterized ANNs using Hölder norms and Sobolev-Slobodeckij norms
Hölder norms and Sobolev-Slobodeckij norms
Lower bounds for norms of reparameterized ANNs using Hölder norms and Sobolev-Slobodeckij norms

Figures (1)

Figure 1: Graphical illustration of the considered shallow ANN architecture in \ref{thm:positive} and \ref{thm:equivalence} in the special case of an ANN with $d = 3$ neurons on the input layer and $\mathfrak{h} = 5$ neurons on the hidden layer. In this situation, there are $d \mathfrak{h} = 15$ real weight parameters and $\mathfrak{h} = 5$ real bias parameters for the first affine linear transformation from the three-dimensional input layer to the five-dimensional hidden layer, and there are $\mathfrak{h} = 5$ real weight parameters and $1$ real bias parameter for the second affine linear transformation from the five-dimensional hidden layer to the one-dimensional output layer. The total number of parameters of this ANN thus satisfies $\mathfrak{d} = d \mathfrak{h} + 2 \mathfrak{h} + 1 = 26$. For every parameter vector $\theta = ( \theta_1, \ldots, \theta_{\mathfrak{d}} ) \in \mathbb{R}^{\mathfrak{d}} = \mathbb{R}^{26}$ the associated realization function $\mathbb{R}^3 \ni x \mapsto \mathcal{N}^{\theta}(x) \in \mathbb{R}$ maps the three-dimensional input vector $x = ( x_1, x_2, x_3 ) \in \mathbb{R}^3$ to the scalar output $\mathcal{N}^{\theta}(x) = \theta_{\mathfrak{d}} + \sum_{i=1}^{5} \theta_{ d \mathfrak{h} + \mathfrak{h} + i } \max \{ \theta_{ d \mathfrak{h} + i } + \sum_{j=1}^{3} \theta_{ (i-1) d + j } x_j, 0 \} \in \mathbb{R}$.
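The parameter layout in the caption of Figure 1 can be made concrete with a short Python sketch. It evaluates the realization function from the flat parameter vector $\theta$ using the caption's index convention, shifted to 0-based indexing; the function name `realization` and the example parameter values are hypothetical and chosen only for illustration.

```python
def realization(theta, x, d, h):
    """Evaluate the shallow ReLU network of the Figure 1 caption.

    Layout of the flat vector theta (0-based indices):
      theta[i*d + j]     hidden weight from input j to hidden neuron i
      theta[d*h + i]     bias of hidden neuron i
      theta[d*h + h + i] outer weight of hidden neuron i
      theta[-1]          output bias
    """
    if len(theta) != d * h + 2 * h + 1:
        raise ValueError("theta must have d*h + 2*h + 1 entries")
    if len(x) != d:
        raise ValueError("x must have d entries")
    out = theta[-1]
    for i in range(h):
        pre = theta[d * h + i] + sum(theta[i * d + j] * x[j] for j in range(d))
        out += theta[d * h + h + i] * max(pre, 0.0)
    return out


d, h = 3, 5
n_params = d * h + 2 * h + 1  # 26 parameters, as in the caption

# Hypothetical example: only the first hidden neuron is active.
theta = [0.0] * n_params
theta[0], theta[1] = 1.0, -1.0   # first neuron computes max(x_1 - x_2, 0)
theta[d * h + h + 0] = 2.0       # outer weight of the first neuron
theta[-1] = 0.5                  # output bias

print(realization(theta, [3.0, 1.0, 7.0], d, h))  # 0.5 + 2 * max(3 - 1, 0) = 4.5
print(realization(theta, [1.0, 3.0, 0.0], d, h))  # neuron inactive, output 0.5
```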

Theorems & Definitions (28)

Theorem 1.1
Corollary 1.2
Theorem 1.3
Definition 2.1
Definition 2.2
Lemma 2.3
Lemma 2.4
Lemma 2.5
Lemma 2.6
Theorem 2.8
...and 18 more
