How do Minimum-Norm Shallow Denoisers Look in Function Space?
Chen Zeno, Greg Ongie, Yaniv Blumenfeld, Nir Weinberger, Daniel Soudry
TL;DR
This paper analyzes the functions learned by shallow ReLU denoisers trained to interpolate noisy data with minimal representation cost. It derives a closed-form min-cost denoiser in the univariate case that contracts toward the clean samples and generalizes better than the empirical MMSE (eMMSE) estimator at low noise levels. It then extends the analysis to multivariate data under subspace, ray, and simplex geometries, where the denoiser decomposes into components aligned with the edges and/or faces connecting training samples. The results reveal an alignment phenomenon: min-cost solutions concentrate along rank-one, piecewise-linear components, a structure that holds under several natural data configurations and persists under small perturbations. Empirically, the predicted alignments are observed in synthetic data and real images, and offline and online training yield similar min-cost interpolants in the tested settings. Overall, the work provides a rigorous function-space understanding of how minimal-cost shallow NN denoisers behave, with implications for generalization and for the design of denoising components in inverse problems and diffusion-based generation.
Abstract
Neural network (NN) denoisers are an essential building block in many common tasks, ranging from image reconstruction to image generation. However, the success of these models is not well understood from a theoretical perspective. In this paper, we aim to characterize the functions realized by shallow ReLU NN denoisers in the common theoretical setting of interpolation (i.e., zero training loss) with minimal representation cost (i.e., minimal $\ell^2$ norm of the weights). First, for univariate data, we derive a closed form for the NN denoiser function, find that it is contractive toward the clean data points, and prove that it generalizes better than the empirical MMSE estimator at low noise levels. Next, for multivariate data, we find the NN denoiser functions in closed form under various geometric assumptions on the training data: data contained in a low-dimensional subspace, data contained in a union of one-sided rays, or data forming several types of simplexes. These functions decompose into a sum of simple rank-one piecewise linear interpolations aligned with edges and/or faces connecting training samples. We empirically verify this alignment phenomenon on synthetic data and real images.
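For orientation, the empirical MMSE baseline mentioned above can be sketched in a few lines. The sketch assumes univariate data and additive Gaussian noise with known standard deviation `sigma`, in which case the MMSE estimate under the empirical distribution of the clean training points is a softmax-weighted average of those points. The function name `emmse_denoiser` and its arguments are illustrative choices, not notation taken from the paper.

```python
import numpy as np

def emmse_denoiser(y, clean_points, sigma):
    """Posterior mean of the clean signal given noisy input y, assuming
    (illustratively) a uniform prior over `clean_points` and Gaussian
    noise with standard deviation `sigma`."""
    y = np.atleast_1d(np.asarray(y, dtype=float))   # shape (Q,)
    x = np.asarray(clean_points, dtype=float)       # shape (N,)
    # Squared distance from every query to every clean point: shape (Q, N).
    d2 = (y[:, None] - x[None, :]) ** 2
    # Log-weights of the Gaussian likelihood; subtract the row-wise max
    # for numerical stability before exponentiating.
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    # Weighted average of the clean points for each query.
    return w @ x

# Example: two clean points at -1 and +1.
queries = np.linspace(-1.5, 1.5, 7)
print(emmse_denoiser(queries, [-1.0, 1.0], sigma=0.2))
```

Under these assumptions, as `sigma` shrinks the softmax weights concentrate on the nearest clean point, so this estimator approaches a piecewise-constant, nearest-training-point map. That is the baseline against which the abstract compares the min-cost NN denoiser at low noise levels.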
