
Nuclear Norm Regularization for Deep Learning

Christopher Scarvelis, Justin Solomon

TL;DR

This paper tackles the problem of regularizing neural networks toward locally low-rank Jacobians by penalizing the nuclear norm of the Jacobian, which is computationally prohibitive in high dimensions. It shows that for functions parametrized as compositions $f=g\circ h$, the nuclear-norm penalty on $Jf$ can be exactly recast as a penalty on the sum of the squared Frobenius norms of $Jg$ and $Jh$, which yields an equivalent training problem that requires no SVDs. It then replaces these Jacobian terms with a denoising-style estimator based on Hutchinson's trace estimator, so that no Jacobians need to be formed at all. The authors provide a rigorous equivalence theorem and a practical estimator, and demonstrate the approach on ROF denoising, unsupervised denoising on ImageNet, SVS-inspired denoising, and representation learning with a regularized autoencoder. The results indicate that the regularizer scales to high-dimensional problems: an unsupervised denoiser trained only on heavily corrupted data performs nearly as well as a supervised one, and the regularized autoencoder learns meaningful latent representations.
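For intuition about the denoising-style approximation, here is a minimal NumPy sketch under stated assumptions, not the authors' estimator. It assumes the first-order expansion $h(x+\sigma\varepsilon) - h(x) \approx \sigma\, Jh(x)\,\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, so that $\mathbb{E}\|h(x+\sigma\varepsilon) - h(x)\|^2/\sigma^2 \approx \mathbb{E}\|Jh(x)\varepsilon\|^2 = \|Jh(x)\|_F^2$. The last equality is Hutchinson's identity $\mathbb{E}[\varepsilon^\top M \varepsilon] = \operatorname{tr}(M)$ applied to $M = Jh(x)^\top Jh(x)$. The helper `denoising_frobenius_estimate` and its `sigma` and `n_samples` parameters are hypothetical names chosen for illustration.

```python
import numpy as np


def denoising_frobenius_estimate(h, x, sigma=1e-2, n_samples=1000, rng=None):
    """Estimate ||Jh(x)||_F^2 using only forward evaluations of h (hypothetical helper).

    Assumes h(x + sigma*eps) - h(x) ~ sigma * Jh(x) @ eps with eps ~ N(0, I), so
    E||h(x + sigma*eps) - h(x)||^2 / sigma^2 ~ E||Jh(x) eps||^2 = ||Jh(x)||_F^2.
    """
    rng = np.random.default_rng() if rng is None else rng
    hx = h(x)
    eps = rng.standard_normal((n_samples, x.shape[0]))  # one Gaussian probe per row
    diffs = np.stack([h(x + sigma * e) - hx for e in eps])  # perturbed minus clean outputs
    return np.mean(np.sum(diffs**2, axis=1)) / sigma**2


# Usage: a linear map h(x) = A x has Jacobian A everywhere.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
x0 = rng.standard_normal(8)
estimate = denoising_frobenius_estimate(lambda v: A @ v, x0, n_samples=20000, rng=rng)
exact = np.linalg.norm(A, "fro") ** 2
print(estimate, exact)
```

For a linear map the estimate is unbiased for any `sigma`; for nonlinear $h$, `sigma` trades the bias of the Taylor approximation against numerical noise. Because only forward passes of $h$ are required, no Jacobian is ever materialized.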

Abstract

Penalizing the nuclear norm of a function's Jacobian encourages it to locally behave like a low-rank linear map. Such functions vary locally along only a handful of directions, making the Jacobian nuclear norm a natural regularizer for machine learning problems. However, this regularizer is intractable for high-dimensional problems, as it requires computing a large Jacobian matrix and taking its singular value decomposition. We show how to efficiently penalize the Jacobian nuclear norm using techniques tailor-made for deep learning. We prove that for functions parametrized as compositions $f = g \circ h$, one may equivalently penalize the average squared Frobenius norm of $Jg$ and $Jh$. We then propose a denoising-style approximation that avoids the Jacobian computations altogether. Our method is simple, efficient, and accurate, enabling Jacobian nuclear norm regularization to scale to high-dimensional deep learning problems. We complement our theory with an empirical study of our regularizer's performance and investigate applications to denoising and representation learning.
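The composition result connects to a standard matrix fact: for any factorization $A = BC$, $\|A\|_* \le \tfrac12(\|B\|_F^2 + \|C\|_F^2)$, with equality for the balanced factors $B = U\Sigma^{1/2}$, $C = \Sigma^{1/2}V^\top$ taken from the SVD $A = U\Sigma V^\top$. Since the chain rule gives $Jf(x) = Jg(h(x))\,Jh(x)$, half the sum of the squared Frobenius norms of $Jg$ and $Jh$ always bounds the nuclear norm of $Jf$ from above. The NumPy check below illustrates only this matrix identity, not the paper's proof; `half_frobenius_sum` is a hypothetical helper name.

```python
import numpy as np


def half_frobenius_sum(B, C):
    """0.5 * (||B||_F^2 + ||C||_F^2) for a factorization A = B @ C (hypothetical helper)."""
    return 0.5 * (np.linalg.norm(B, "fro") ** 2 + np.linalg.norm(C, "fro") ** 2)


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 7))  # 6x7 matrix of rank <= 4

# Nuclear norm: the sum of singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
nuclear = s.sum()

# Balanced factorization from the SVD attains the minimum.
B_opt = U * np.sqrt(s)            # U @ diag(sqrt(s))
C_opt = np.sqrt(s)[:, None] * Vt  # diag(sqrt(s)) @ Vt
print(nuclear, half_frobenius_sum(B_opt, C_opt))

# Any other factorization A = (B_opt @ M) @ (inv(M) @ C_opt) does no better.
M = rng.standard_normal((len(s), len(s))) + 3 * np.eye(len(s))
B_alt, C_alt = B_opt @ M, np.linalg.solve(M, C_opt)
print(half_frobenius_sum(B_alt, C_alt) >= nuclear)
```

The Frobenius terms need no SVD, which is why replacing the nuclear norm with them is attractive in a deep learning setting.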

Paper Structure

This paper contains 42 sections, 2 theorems, 72 equations, 8 figures, and 1 table.

Key Result

Theorem 3.1

Let $D(\Omega)$ be a data distribution supported on a compact set $\Omega \subseteq \mathbb{R}^n$ with measure $\mu$ that is absolutely continuous with respect to the Lebesgue measure on $\Omega$. Let $\ell \in C^1(\mathbb{R}^m \times \mathbb{R}^n)$ be a continuously differentiable loss function …

Figures (8)

  • Figure 1: Comparison of exact and neural solutions to the ROF problem penalized directly by the Jacobian nuclear norm and to the ROF problem using our regularizer, with $n=2$ and $\eta=0.1$ (first three plots) and $\eta=0.25$ (last three plots). The $x$- and $y$-axes represent the inputs to $f_\theta$, and colors denote function values. Solving the problem with our regularizer recovers an accurate approximation to the true solution for both values of $\eta$ while requiring no Jacobian nuclear norm computations.
  • Figure 2: Mean absolute error of neural solutions to the ROF problem with the exact Jacobian nuclear norm penalty (blue) and with our regularizer (orange). Our regularizer obtains solutions with accuracy comparable to directly penalizing the Jacobian nuclear norm.
  • Figure 3: Log-objective values for the ROF problem with the exact Jacobian nuclear norm penalty (blue) and with our regularizer (orange) across training iterations. As predicted by the equivalence theorem, both problems converge to nearly identical objective values.
  • Figure 4: Denoiser performance comparison on a held-out image corrupted by Gaussian noise with $\sigma=1$ (first row) and $\sigma=2$ (second row). Our method performs nearly as well as a supervised denoiser, despite being trained exclusively on highly corrupted data.
  • Figure 5: Jacobian singular values of supervised denoiser (blue) and our denoiser (orange) evaluated at a noisy held-out image with $\sigma=2$.
  • ...and 3 more figures
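Figure 5 plots Jacobian singular values, and the exact penalty compared in Figures 2 and 3 is their sum. The sketch below assumes a toy composition $f = g \circ h$ with $h(x) = \tanh(W_1 x)$ and $g(z) = W_2 z$ (hypothetical choices, not the paper's architectures). It forms $Jf(x) = Jg(h(x))\,Jh(x)$ by the chain rule, checks it against central finite differences, and takes its SVD. This explicit Jacobian-plus-SVD route is the step that becomes intractable for image-sized inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 8, 5, 6                        # input, latent, output dimensions
W1 = 0.5 * rng.standard_normal((k, n))   # h(x) = tanh(W1 x)  (toy assumption)
W2 = rng.standard_normal((m, k))         # g(z) = W2 z        (toy assumption)


def h(x):
    return np.tanh(W1 @ x)


def f(x):
    return W2 @ h(x)


def jacobians(x):
    """Return (Jg, Jh) at x; by the chain rule Jf(x) = Jg(h(x)) @ Jh(x)."""
    Jh = (1.0 - np.tanh(W1 @ x) ** 2)[:, None] * W1  # diag(tanh'(W1 x)) @ W1
    Jg = W2                                          # g is linear, so Jg is constant
    return Jg, Jh


x0 = rng.standard_normal(n)
Jg, Jh = jacobians(x0)
Jf = Jg @ Jh  # shape (m, n)

# Central finite differences, one column per input coordinate, as an independent check.
eps = 1e-6
Jf_fd = np.stack([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps) for e in np.eye(n)], axis=1)

sv = np.linalg.svd(Jf, compute_uv=False)  # Jacobian singular values
nuclear = sv.sum()                         # exact Jacobian nuclear norm
bound = 0.5 * (np.sum(Jg**2) + np.sum(Jh**2))
print(sv, nuclear, bound)
```

Here $Jf$ has rank at most $k = 5$ because it factors through the latent space, and its nuclear norm never exceeds the Frobenius-based bound.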

Theorems & Definitions (2)

  • Theorem 3.1
  • Theorem 3.2