Pivotal Auto-Encoder via Self-Normalizing ReLU

Nelson Goldenstein, Jeremias Sulam, Yaniv Romano

TL;DR

The paper addresses the lack of robustness of sparse auto-encoders to unknown test-time noise by formulating a transform-learning objective with a square-root loss, yielding a noise-level-invariant encoder. It introduces the Self-Normalizing ReLU (NeLU), an iterative proximal-gradient solver unrolled into a trainable network that learns $W$ and $\lambda$ end-to-end. The authors prove support-recovery and estimation bounds under bounded and Gaussian noise, and show on synthetic and natural-image denoising that the NeLU architecture is more stable across unseen noise levels than commonly used architectures. This approach provides a principled path to noise-robust, transform-based auto-encoders and points toward deeper architectures and CNN extensions for broader applications.

Abstract

Sparse auto-encoders are useful for extracting low-dimensional representations from high-dimensional data. However, their performance degrades sharply when the input noise at test time differs from the noise employed during training. This limitation hinders the applicability of auto-encoders in real-world scenarios where the level of noise in the input is unpredictable. In this paper, we formalize single hidden layer sparse auto-encoders as a transform learning problem. Leveraging the transform modeling interpretation, we propose an optimization problem that leads to a predictive model invariant to the noise level at test time. In other words, the same pre-trained model is able to generalize to different noise levels. The proposed optimization algorithm, derived from the square root lasso, is translated into a new, computationally efficient auto-encoding architecture. After proving that our new method is invariant to the noise level, we evaluate our approach by training networks using the proposed architecture for denoising tasks. Our experimental results demonstrate that the trained models yield a significant improvement in stability against varying types of noise compared to commonly used architectures.
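
To make the square-root-loss idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes the encoder solves $\min_z \lVert Wx - z \rVert_2 + \lambda \lVert z \rVert_1$, an objective inferred from the summary rather than quoted from the paper, and it uses a proximal-gradient step whose step size equals the current residual norm, which is a simplification chosen for this illustration. The names `soft_threshold` and `sqrt_loss_encoder`, the identity transform `W`, and the value of `lam` are all hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Elementwise soft-thresholding: shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def sqrt_loss_encoder(x, W, lam, n_iter=20):
    """Sketch of an unrolled encoder for the assumed objective
    ||W x - z||_2 + lam * ||z||_1 (illustrative, not the paper's exact NeLU).

    With step size t = ||y - z||_2, the gradient step on the unsquared loss
    lands exactly on y = W x, so each iteration soft-thresholds y with a
    threshold proportional to the current residual norm.
    """
    y = W @ x
    z = np.zeros_like(y)
    for _ in range(n_iter):
        residual_norm = np.linalg.norm(y - z)
        z = soft_threshold(y, lam * residual_norm)
    return z


# Toy usage: the same lam is applied at two different noise levels.
rng = np.random.default_rng(0)
d = 64
W = np.eye(d)                 # hypothetical transform; the paper learns W
z_true = np.zeros(d)
z_true[:5] = 5.0              # sparse code with five active entries
lam = 0.5                     # hypothetical value, not tuned
for sigma in (0.1, 0.5):
    x = z_true + sigma * rng.standard_normal(d)
    z_hat = sqrt_loss_encoder(x, W, lam)
    print(sigma, np.flatnonzero(z_hat))
```

Because the threshold is `lam` times a residual norm, rescaling the input by a positive constant rescales every iterate by the same constant without touching `lam`. A fixed soft-threshold lacks this property, which is one way to read the claim that the regularization parameter need not track the noise level.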

Paper Structure

This paper contains 23 sections, 7 theorems, 49 equations, 8 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Let Assumption ass:bound_noise be satisfied and let $\eta$ satisfy Assumption ass:eta. Then, for [...] and $\widehat{z}$ the solution to eq:our, we get that [...]. Moreover, if [...], then the estimated support recovers the true sparsity pattern $\mathcal{S} = \{j \in [d] : |z_j| > 0 \}$ correctly, i.e., [...]

Figures (8)

  • Figure 1: The NeLU architecture: A recurrent sparse encoder model, unrolled for a predetermined number of iterations.
  • Figure 2: Experimental results for analytical transform with synthetic data. (a) Mean squared error (MSE) of the $\ell_2$ estimation error as a function of the noise level $\sigma$, evaluated in both settings. In the oracle setting, the regularization parameter is tuned to achieve the smallest estimation error. In the theoretical setting, we use $\lambda = \frac{1}{2}\frac{\lVert e \rVert_{\infty}}{\lVert e \rVert_2}$ in Algorithm \ref{Alg:PGD}. (b) $\lambda$ values used for each algorithm in the previous graph. Note that the optimal $\lambda$ values for Algorithm \ref{Alg:PGD} are constant, while they are linear for the traditional algorithm. The standard errors are below 0.02 and thus barely visible.
  • Figure 3: Synthetic supervised sparse coding: a comparison of mean squared error (MSE) for estimation error between a two-layer sparse encoder architecture with NeLU and a similar architecture with soft-thresholding, trained on data with a fixed noise level of 0.1. The performance is evaluated at different noise levels, averaged over 2048 realizations of the data.
  • Figure 4: Synthetic sparse signal denoising: a comparison of the mean squared error (MSE) for the reconstruction error, $\widehat{x} - x$. Other details are the same as in Figure \ref{fig:rec}.
  • Figure 5: 1D example of the process utilized in the experiment in \ref{exp:natural} when $\textit{stride}=2$. Each signal is replicated $\textit{stride}$ times and subsequently translated, yielding slight shifts of the original for each replica. This process effectively transforms a single image into a collection of $\textit{stride}$ variations, each exhibiting a slight spatial offset. The final output is an average of all denoised shifts.
  • ...and 3 more figures
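
The Figure 2 caption uses $\lambda = \frac{1}{2}\frac{\lVert e \rVert_{\infty}}{\lVert e \rVert_2}$ and notes that the optimal $\lambda$ is constant in $\sigma$ for the proposed algorithm but linear for the traditional one. A short numerical check, under assumptions, shows why the ratio cannot depend on the noise scale. The helper name `pivotal_lambda` is hypothetical, and the threshold $\lVert e \rVert_{\infty}$ is only a stand-in for a noise-proportional choice, since the caption does not give the traditional algorithm's formula.

```python
import numpy as np


def pivotal_lambda(e):
    """lambda = 0.5 * ||e||_inf / ||e||_2, as quoted in the Figure 2 caption."""
    return 0.5 * np.linalg.norm(e, np.inf) / np.linalg.norm(e, 2)


rng = np.random.default_rng(0)
g = rng.standard_normal(256)      # one fixed noise draw, rescaled below

sigmas = [0.05, 0.1, 0.5, 1.0]

# The ratio of two norms is unchanged when e = sigma * g is rescaled.
pivotal = [pivotal_lambda(s * g) for s in sigmas]

# Stand-in for a noise-dependent threshold (an assumption for illustration):
# anything proportional to ||e||_inf grows linearly with sigma.
scaled = [np.linalg.norm(s * g, np.inf) for s in sigmas]

for s, p, q in zip(sigmas, pivotal, scaled):
    print(f"sigma={s:<5} ratio-based lambda={p:.4f}  noise-proportional={q:.4f}")
```

Both norms are homogeneous of degree one, so $\sigma$ cancels in the ratio. Any threshold built from a single norm of $e$ keeps the factor $\sigma$ and must be retuned whenever the noise level changes.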

Theorems & Definitions (15)

  • Theorem 1
  • Proof of Theorem \ref{Thm:supp}
  • Remark 1
  • Theorem 2
  • Proof of Theorem \ref{Thm:sqrt}
  • Theorem 3
  • Proof of Theorem \ref{Thm:supp_gauss}
  • Lemma 1
  • proof
  • Remark 2
  • ...and 5 more