(Almost) Smooth Sailing: Towards Numerical Stability of Neural Networks Through Differentiable Regularization of the Condition Number

Rossen Nenov; Daniel Haider; Peter Balazs

TL;DR

A novel regularizer is introduced that is provably differentiable almost everywhere, promotes matrices with low condition numbers, and can be easily implemented and integrated into existing optimization algorithms.

Abstract

Maintaining numerical stability in machine learning models is crucial for their reliability and performance. One approach to maintaining the stability of a network layer is to integrate the condition number of the weight matrix into the optimization algorithm as a regularizing term. However, due to its discontinuous nature and lack of differentiability, the condition number is not suitable for a gradient descent approach. This paper introduces a novel regularizer that is provably differentiable almost everywhere and promotes matrices with low condition numbers. In particular, we derive a formula for the gradient of this regularizer that can be easily implemented and integrated into existing optimization algorithms. We show the advantages of this approach for noisy classification and denoising of MNIST images.
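The abstract says the gradient of the regularizer has a closed form that is easy to implement, but this page reproduces neither the regularizer nor its gradient. The sketch below therefore assumes $r(S) = \frac{1}{2}\|S\|_F^2 - \frac{p}{2}\sigma_p(S)^2$ with $p = \min(n, m)$ and $\sigma_p$ the smallest singular value; this form is our assumption, chosen because it satisfies the properties stated in Theorem 2.1. Wherever $\sigma_p$ is a simple singular value with left and right singular vectors $u_p$ and $v_p$, the gradient of this assumed form is $\nabla r(S) = S - p\,\sigma_p\,u_p v_p^\top$; it can fail to be differentiable only where $\sigma_p$ is repeated, which is consistent with differentiability almost everywhere. The helper names (`regularizer`, `regularizer_grad`, `condition_number`) and the step size `eta` are hypothetical.

```python
import numpy as np

# ASSUMPTION: Eq. (eq:regularizer) is not reproduced here; we use
#   r(S) = 0.5 * ||S||_F^2 - (p / 2) * sigma_p(S)^2,  p = min(n, m),
# where sigma_p is the smallest singular value of S.

def condition_number(S):
    """Spectral condition number sigma_1 / sigma_p (inf if sigma_p == 0)."""
    sigma = np.linalg.svd(S, compute_uv=False)  # descending order
    return np.inf if sigma[-1] == 0 else sigma[0] / sigma[-1]


def regularizer(S):
    p = min(S.shape)
    sigma = np.linalg.svd(S, compute_uv=False)
    # ||S||_F^2 is the sum of the squared singular values.
    return 0.5 * np.sum(sigma**2) - 0.5 * p * sigma[-1] ** 2


def regularizer_grad(S):
    """Gradient of the assumed r; valid only where sigma_p is a simple singular value."""
    p = min(S.shape)
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    u_p, v_p = U[:, -1], Vt[-1, :]  # singular vectors belonging to sigma_p
    return S - p * sigma[-1] * np.outer(u_p, v_p)


# Plain gradient descent on r alone (eta is a hypothetical step size).
rng = np.random.default_rng(0)
S0 = rng.standard_normal((5, 3))
S = S0.copy()
eta = 0.01
for _ in range(2000):
    S = S - eta * regularizer_grad(S)

print(f"kappa before: {condition_number(S0):.3f}, after: {condition_number(S):.3f}")
```

In the SVD basis, one step multiplies $\sigma_p$ by $1 + \eta(p-1)$ and every other singular value by $1 - \eta$, so repeated steps pull the singular values together and the printed condition number should drop towards 1. In training, this gradient would be scaled by a weight $\lambda$ and added to the gradient of the task loss for each regularized weight matrix.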

Paper Structure (23 sections, 12 theorems, 38 equations, 4 figures, 4 tables)

Introduction
Matrix Regularization
Differential Calculus
Numerical Experiments
Basic Functionality
Noisy MNIST Classification
Denoising MNIST
Conclusion
On the Discontinuity of the Condition Number
Essentials from Subdifferential Calculus
Convex Subdifferential [rockafellar]
Mordukhovich Subdifferential [mordukhovich2018variational]
Coincidence of Subdifferentials [mordukhovich2018variational]
Mordukhovich Subdifferential of the Sum of Functions [mordukhovich2018variational]
Rules of Differentiation [mordukhovich2018variational]
...and 8 more sections

Key Result

Theorem 2.1

For any $S\in \mathbb{R}^{n\times m}$ the regularizer $r(S)$ defined in Eq. (eq:regularizer) is non-negative. If $S\neq 0$, then $r(S) = 0$ if and only if $S$ has full rank and $\kappa(S)=1$.
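As a quick numerical sanity check of Theorem 2.1, the sketch below assumes the form $r(S) = \frac{1}{2}\|S\|_F^2 - \frac{p}{2}\sigma_p(S)^2$, $p = \min(n, m)$, for Eq. (eq:regularizer); this is an assumption, since the equation is not shown on this page. For this form, non-negativity follows from $\|S\|_F^2 = \sum_{i=1}^{p} \sigma_i^2 \ge p\,\sigma_p^2$, with equality exactly when all $p$ singular values coincide; for $S \neq 0$ that common value is positive, so $S$ has full rank and $\kappa(S) = 1$. The variable names are illustrative only.

```python
import numpy as np

def regularizer(S):
    # ASSUMED form of Eq. (eq:regularizer): 0.5*||S||_F^2 - (p/2)*sigma_p(S)^2, p = min(n, m)
    p = min(S.shape)
    sigma = np.linalg.svd(S, compute_uv=False)  # descending order
    return 0.5 * np.sum(sigma**2) - 0.5 * p * sigma[-1] ** 2

rng = np.random.default_rng(0)

# Full rank with kappa = 1: a scaled 5x3 matrix with orthonormal columns.
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
r_isometry = regularizer(2.5 * Q)

# Full rank but kappa = 2.
r_diag = regularizer(np.diag([1.0, 2.0]))

# Nonzero but rank-deficient (rank 1).
r_rank1 = regularizer(np.outer([1.0, 2.0, 3.0], [1.0, -1.0]))

# Random matrices: r should never be negative.
r_random = [regularizer(rng.standard_normal((4, 6))) for _ in range(100)]

print(r_isometry, r_diag, r_rank1, min(r_random))
```

With this form, `r_isometry` should be zero up to rounding, `r_diag` equals 1.5 and `r_rank1` equals 14 (half the squared Frobenius norm, because $\sigma_p = 0$), matching the "if and only if" in the theorem.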

Figures (4)

Figure 1: Results of MNIST denoising with autoencoders. Top: MNIST images with added Gaussian noise. Mid: No regularization. Bottom: Proposed regularization. While the vanilla autoencoder struggles significantly, the regularized one performs well.
Figure 2: Results of the least-squares minimization problem (eq:LQM) after $10^5$ iterations for different values of the regularization parameter $\lambda$.
Figure 3: MNIST denoising results with (bottom) and without regularization (mid) at three different SNRs, from left to right: $10, 1, 0.5$. Even when reconstructing images with almost no noise, the non-regularized autoencoder struggles. Due to the high condition numbers in the network, its output is very sensitive to perturbations in the input, so the network is unable to learn properly.
Figure 4: Left: Results of MNIST denoising at SNR 1 with Tikhonov regularization for two different sets of parameters. Mid: $\lambda_1=0.01$, $\lambda_2=0.0001$. Bottom: $\lambda_1=1$, $\lambda_2=0.01$. Right: Results of the least-squares minimization problem (eq:LQM) after $10^5$ iterations with the Tikhonov regularizer for different values of the regularization parameter $\lambda$.
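Eq. (eq:LQM) is not reproduced on this page, so the sketch below uses a hypothetical toy stand-in rather than the paper's experiment: minimize $\frac{1}{2}\|X - B\|_F^2 + \lambda\,\mathrm{penalty}(X)$ by gradient descent with $B = \mathrm{diag}(10, 5, 1)$, comparing the assumed regularizer $r(X) = \frac{1}{2}\|X\|_F^2 - \frac{p}{2}\sigma_p(X)^2$ against the Tikhonov penalty $\frac{1}{2}\|X\|_F^2$. In this toy setting the Tikhonov minimizer is $B/(1+\lambda)$, which has the same condition number as $B$ for every $\lambda$, whereas the assumed regularizer lifts the smallest singular value relative to the others. The matrix `B`, the step size and the helper names are illustrative assumptions.

```python
import numpy as np

# HYPOTHETICAL toy problem (not eq:LQM): minimize 0.5*||X - B||_F^2 + lam * penalty(X)
B = np.diag([10.0, 5.0, 1.0])  # condition number 10


def kappa(X):
    s = np.linalg.svd(X, compute_uv=False)  # descending order
    return s[0] / s[-1]


def grad_r(X):
    # Gradient of the ASSUMED regularizer 0.5*||X||_F^2 - (p/2)*sigma_p(X)^2
    p = min(X.shape)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X - p * s[-1] * np.outer(U[:, -1], Vt[-1, :])


def grad_tikhonov(X):
    # Gradient of the Tikhonov penalty 0.5*||X||_F^2
    return X


def solve(penalty_grad, lam, eta=0.1, steps=2000):
    # Plain gradient descent started from X = B.
    X = B.copy()
    for _ in range(steps):
        X = X - eta * ((X - B) + lam * penalty_grad(X))
    return X


results = {}
for lam in [0.0, 0.1, 0.2, 0.3]:
    results[lam] = (kappa(solve(grad_r, lam)), kappa(solve(grad_tikhonov, lam)))
    k_r, k_t = results[lam]
    print(f"lambda={lam}: kappa with r = {k_r:.3f}, with Tikhonov = {k_t:.3f}")
```

For the assumed $r$ the iterates stay diagonal: the smallest entry converges to $1/(1+\lambda-p\lambda)$ and the others to $b_i/(1+\lambda)$, so with $p = 3$ the condition number falls from 10 to $10(1-2\lambda)/(1+\lambda)$ (for example 5 at $\lambda = 0.2$), while the Tikhonov column stays at 10. These numbers describe only this toy problem, not the results shown in Figures 2 and 4.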

Theorems & Definitions (25)

Theorem 2.1
Proof
Theorem 2.2
Theorem 3.1
Theorem 3.2
Remark 3.1
Example 1.1
Definition 2.1: Proper function
Definition 2.2: Convex Subdifferential
Remark 2.3
...and 15 more
