How do Minimum-Norm Shallow Denoisers Look in Function Space?

Chen Zeno; Greg Ongie; Yaniv Blumenfeld; Nir Weinberger; Daniel Soudry

TL;DR

This paper analyzes the functions learned by shallow ReLU denoisers trained to interpolate noisy data with minimal representation cost. In the univariate case, it derives a closed-form min-cost denoiser that contracts toward the clean samples and generalizes better than the empirical MMSE (eMMSE) estimator at low noise levels. It then extends the analysis to multivariate data under subspace, ray, and simplex geometries, where the denoiser decomposes into components aligned with the edges and/or faces connecting training samples. The results reveal an alignment phenomenon: min-cost solutions concentrate along rank-one, piecewise-linear components, a structure that holds under several natural data configurations and persists under small perturbations. Empirically, the predicted alignments are observed on synthetic data and real images, and offline and online training yield similar min-cost interpolants in the tested settings. Overall, the work provides a rigorous function-space understanding of how minimal-cost shallow NN denoisers behave, with implications for generalization and for the design of denoising components in inverse problems and diffusion-based generation.

Abstract

Neural network (NN) denoisers are an essential building block in many common tasks, ranging from image reconstruction to image generation. However, the success of these models is not well understood from a theoretical perspective. In this paper, we aim to characterize the functions realized by shallow ReLU NN denoisers -- in the common theoretical setting of interpolation (i.e., zero training loss) with a minimal representation cost (i.e., minimal $\ell^2$ norm weights). First, for univariate data, we derive a closed form for the NN denoiser function, find it is contractive toward the clean data points, and prove it generalizes better than the empirical MMSE estimator at a low noise level. Next, for multivariate data, we find the NN denoiser functions in a closed form under various geometric assumptions on the training data: data contained in a low-dimensional subspace, data contained in a union of one-sided rays, or several types of simplexes. These functions decompose into a sum of simple rank-one piecewise linear interpolations aligned with edges and/or faces connecting training samples. We empirically verify this alignment phenomenon on synthetic data and real images.
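
For reference, the empirical MMSE estimator mentioned above has a standard closed form when the prior is the empirical distribution over the clean training points and the noise is isotropic Gaussian: a softmax-weighted average of the clean points. The sketch below is a minimal NumPy illustration under that assumption. The function name `emmse_denoiser`, the evaluation grid, and the example values (four equally spaced points in $[-5,5]$ and $\sigma = 1.5$, borrowed from the Figure 1 setup) are illustrative choices, not the authors' code.

```python
import numpy as np

def emmse_denoiser(y, clean_points, sigma):
    """Posterior mean E[x | y] under a uniform prior on `clean_points`
    and additive Gaussian noise N(0, sigma^2) (assumed 1D setup)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    x = np.asarray(clean_points, dtype=float)
    # Log-weights -(y - x_n)^2 / (2 sigma^2), shape (len(y), len(x)).
    logits = -((y[:, None] - x[None, :]) ** 2) / (2.0 * sigma**2)
    # Subtract the row-wise max for a numerically stable softmax.
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

clean = np.linspace(-5.0, 5.0, 4)   # four equally spaced clean points
sigma = 1.5
grid = np.linspace(-8.0, 8.0, 9)    # hypothetical evaluation grid
denoised = emmse_denoiser(grid, clean, sigma)
for y_in, y_out in zip(grid, denoised):
    print(f"y = {y_in:+.1f} -> eMMSE(y) = {y_out:+.3f}")
```

Because the output is a convex combination of the clean points, it always lies between the smallest and largest training sample, whatever the input.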

Paper Structure (35 sections, 18 theorems, 133 equations, 15 figures, 1 table)

Introduction
Preliminaries and problem setting
The denoising problem.
Denoiser model and algorithms.
Low noise regime.
Basic properties of neural network denoisers
Offline vs. online NN solutions.
The empirical MMSE denoiser.
Regularization biases toward specific neural network denoisers.
Closed form solution for the NN denoiser function: univariate data
Minimal norm leads to contractive solutions on univariate data.
Minimal norm leads to alignment phenomenon on multivariate data
Training data on a subspace
Training data on rays
Special case: training data forming a simplex
...and 20 more sections

Key Result

Proposition 1

For all datasets such that Assumption \ref{assumption:seperate_datapoints} holds, the unique minimizer of $R(f)$ is

Figures (15)

Figure 1: NN denoiser vs. eMMSE denoiser. We trained a one-hidden-layer ReLU network with a skip connection on a denoising task. The clean dataset has four points equally spaced in the interval $[-5,5]$, and the noisy samples are generated by adding zero-mean Gaussian noise with $\sigma = 1.5$. We use $\lambda=10^{-5}$ in both settings. The figure shows the denoiser output as a function of its input for: (1) the NN denoiser trained online using \ref{eq:pratical_update_rule} for $100K$ iterations, (2) the NN denoiser trained offline using \ref{eq:offline loss} with $M=9000$ and $20K$ epochs, and (3) the eMMSE denoiser \ref{eq:eMMSE}.
Figure 2: Predicted (top row) and empirical (bottom row) min-cost NN denoisers for $N=3$ clean training samples in $d=2$ dimensions. The empirical NN denoisers were trained with weight decay parameter $\lambda=10^{-5}$ and $M=100$ noisy samples. As predicted by our theory, the ReLU boundaries align either perpendicular to the triangle edges in the obtuse case (left panel), or parallel to the triangle edges (right panel).
Figure 3: Numerical evaluation of \ref{eq:expectation of relu}. Histogram of the sample average of $\mathrm{ReLU}(x)$ for 10000 Monte-Carlo samples. We denote by $E$ the analytical expectation and by $\bar{E}$ the sample average. Panel (a) uses $\mu=1$, $\sigma = 5$; the normalized error is $\frac{|E-\bar{E}|}{E} = 0.0032\%$. Panel (b) uses $\mu=-1$, $\sigma = 5$; the normalized error is $\frac{|E-\bar{E}|}{E} = 0.0059\%$.
Figure 4: Numerical evaluation of \ref{eq:covariance of relu}. Histogram of the sample average of $\mathrm{ReLU}(z_1)\mathrm{ReLU}(z_2)$ for 10000 Monte-Carlo samples. We denote by $E$ the analytical expectation and by $\bar{E}$ the sample average. Panel (a) uses $\boldsymbol{\mu}= \begin{pmatrix} -4 \\ 17 \end{pmatrix}$, $\boldsymbol{\Sigma} = \begin{pmatrix} 13 & -9 \\ -9 & 8 \end{pmatrix}$; the normalized error is $\frac{|E-\bar{E}|}{E} = 0.0093\%$. Panel (b) uses $\boldsymbol{\mu}= \begin{pmatrix} 6 \\ 2 \end{pmatrix}$, $\boldsymbol{\Sigma} = \begin{pmatrix} 10 & 2 \\ 2 & 1 \end{pmatrix}$; the normalized error is $\frac{|E-\bar{E}|}{E} = 0.0044\%$.
Figure 5: Illustration of $f_{1D}^{*}(y)$.
...and 10 more figures
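
The Monte-Carlo check described in the Figure 3 caption can be reproduced in spirit with a few lines of NumPy. The sketch below assumes the referenced expression is the standard rectified-Gaussian mean, $\mathbb{E}[\mathrm{ReLU}(x)] = \mu\,\Phi(\mu/\sigma) + \sigma\,\varphi(\mu/\sigma)$ for $x \sim \mathcal{N}(\mu, \sigma^2)$; the paper's own equation is not reproduced in this summary. The function names, seed, and sample size are illustrative, so the errors it prints need not match those in the caption.

```python
import math
import numpy as np

def relu_gaussian_mean(mu, sigma):
    """Analytical E[max(0, x)] for x ~ N(mu, sigma^2)."""
    t = mu / sigma
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))         # Phi(t)
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)  # phi(t)
    return mu * cdf + sigma * pdf

def relu_sample_mean(mu, sigma, n_samples, rng):
    """Monte-Carlo estimate of E[max(0, x)] from n_samples draws."""
    x = rng.normal(mu, sigma, size=n_samples)
    return float(np.maximum(x, 0.0).mean())

rng = np.random.default_rng(0)  # fixed seed (arbitrary choice)
for mu, sigma in [(1.0, 5.0), (-1.0, 5.0)]:  # settings from the Figure 3 caption
    E = relu_gaussian_mean(mu, sigma)
    E_bar = relu_sample_mean(mu, sigma, n_samples=1_000_000, rng=rng)
    rel_err = abs(E - E_bar) / E
    print(f"mu={mu:+.0f}, sigma={sigma:.0f}: E={E:.4f}, "
          f"sample avg={E_bar:.4f}, rel. error={rel_err:.2e}")
```

The relative error shrinks roughly as the inverse square root of the number of samples, which is the behavior a histogram of repeated sample averages would illustrate.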

Theorems & Definitions (40)

Definition 1
Proposition 1
Theorem 1
Definition 2
Lemma 1
Theorem 2
Corollary 1
Theorem 3
Proposition 2
Conjecture 1
...and 30 more
