Robust Stochastically-Descending Unrolled Networks

Samar Hadou, Navid NaderiAlizadeh, Alejandro Ribeiro

TL;DR

The paper tackles the instability and lack of generalization in deep unrolling by introducing descending constraints that enforce per-layer descent toward the optimum in expectation. By formulating a constrained bi-level learning problem and solving it with a primal-dual approach, the authors prove convergence guarantees and exponential convergence rates, along with robustness to out-of-distribution shifts. They validate the method on LISTA for sparse coding and GLOW-Prox for image inpainting, demonstrating improved intermediate-layer behavior, resilience to perturbations, and OOD robustness without sacrificing final performance. The results suggest that training unrolled networks to follow stochastic descent paths yields more reliable, interpretable, and robust models suitable for safety-critical applications.

Abstract

Deep unrolling, or unfolding, is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network. However, the convergence guarantees and generalizability of the unrolled networks are still open theoretical problems. To tackle these problems, we provide deep unrolled architectures with a stochastic descent nature by imposing descending constraints during training. The descending constraints are enforced layer by layer to ensure that each unrolled layer takes, on average, a descent step toward the optimum during training. We theoretically prove that the sequence constructed by the outputs of the unrolled layers is then guaranteed to converge for unseen problems, assuming no distribution shift between training and test problems. We also show that standard unrolling is brittle to perturbations, and that our imposed constraints provide the unrolled networks with robustness to additive noise and perturbations. We numerically assess unrolled architectures trained under the proposed constraints in two applications, sparse coding using the learnable iterative shrinkage and thresholding algorithm (LISTA) and image inpainting using proximal generative flow (GLOW-Prox), and demonstrate the performance and robustness benefits of the proposed method.
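To make the layer-wise descending constraints and the dual side of a primal-dual scheme concrete, here is a minimal NumPy sketch on a toy least-squares problem. It is an illustration under stated assumptions, not the authors' implementation: the constraint is assumed to compare batch-averaged gradient norms at consecutive layers with a factor `(1 - eps)`, the unrolled layers are plain gradient steps with scalar step sizes, and every name (`unroll`, `descent_slack`, `dual_ascent_step`, `eps`, `eta`) is hypothetical. The primal update of the layer parameters is omitted.

```python
import numpy as np

def unroll(A, x, thetas):
    """Unrolled gradient steps on f(y; x) = 0.5 * ||A y - x||^2 (toy stand-in for LISTA).

    x has shape (batch, m). Returns the final estimate and the gradient norms
    at the input and after every layer, with shape (n_layers + 1, batch).
    """
    y = np.zeros((x.shape[0], A.shape[1]))
    grad = (y @ A.T - x) @ A              # batched A^T (A y - x)
    grad_norms = [np.linalg.norm(grad, axis=1)]
    for theta in thetas:                  # one scalar step size per layer
        y = y - theta * grad
        grad = (y @ A.T - x) @ A
        grad_norms.append(np.linalg.norm(grad, axis=1))
    return y, np.stack(grad_norms)

def descent_slack(grad_norms, eps):
    """Assumed constraint form: mean(||g_l|| - (1 - eps) * ||g_{l-1}||) <= 0 per layer."""
    return (grad_norms[1:] - (1.0 - eps) * grad_norms[:-1]).mean(axis=1)

def dual_ascent_step(lam, slack, eta):
    """Projected dual ascent: multipliers grow only for violated constraints."""
    return np.maximum(0.0, lam + eta * slack)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
x = rng.standard_normal((16, 8))
n_layers = 5
lam = np.zeros(n_layers)
lipschitz = np.linalg.norm(A, 2) ** 2     # largest eigenvalue of A^T A

# Layers that take a valid descent step satisfy the constraint (with eps = 0).
_, norms_good = unroll(A, x, np.full(n_layers, 1.0 / lipschitz))
slack_good = descent_slack(norms_good, eps=0.0)
lam_good = dual_ascent_step(lam, slack_good, eta=0.1)

# Layers that make no progress violate it, so their multipliers increase.
_, norms_idle = unroll(A, x, np.zeros(n_layers))
slack_idle = descent_slack(norms_idle, eps=0.05)
lam_idle = dual_ascent_step(lam, slack_idle, eta=0.1)
```

In a full training loop, the multipliers would weight the per-layer slacks in a Lagrangian that the layer parameters are trained to minimize, so layers that fail to descend receive increasing penalty.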

Paper Structure (21 sections, 56 equations, 9 figures, 1 algorithm)

Figures (9)

  • Figure 1: Trajectories of a test example made by (left) gradient descent, (middle) a standard unrolled optimizer, and (right) a constrained unrolled optimizer (ours). The three trajectories were initialized with the same value. The colored contours represent the values of the objective function that is being minimized (least squares).
  • Figure 2: Trajectories of the same test example initialized at the same point but perturbed at the third step with additive noise. The center of the colored contours represents the optimal point, and each optimizer's final estimate is shown in green. The standard unrolled optimizer (middle) fails to reach the optimum under additive perturbations, while the constrained one (right) and gradient descent (left) both reach it.
  • Figure 3: Distance to the optimal solution ${\bf y}^*$ and the value of the objective function $f_{sp}({\bf y}_l; {\bf x})$ across the ten unrolled layers of constrained LISTA (blue), LISTA (red), incremental training (black), and layer-per-layer training (black dots). Unlike the other approaches, constrained LISTA makes gradual progress toward the optimum because of the descending constraints imposed during training.
  • Figure 4: Histogram of the $\ell_1$-norm of the unrolled layers' outputs at (from top left to bottom right) the input, $1^{st}$, $3^{rd}$, $5^{th}$, $7^{th}$, $9^{th}$ and $10^{th}$ layers along with the ground truth ${\bf y}^*$. The histogram stays the same across all the intermediate layers in the case of LISTA, while it moves to the left, representing lower values of the $\ell_1$-norm, under the descending constraints.
  • Figure 5: OOD robustness against data shifts of the form $\tilde{\bf x} \sim {\cal N}({\bf x}, p^2{\bf I})$. Distance to the optimal solution $\tilde{\bf y}^*$ and the value of the objective function $f_{sp}({\bf y}_l)$ across the ten unrolled layers of constrained LISTA (blue), LISTA (red), LISTA trained with noisy inputs (gray), and other benchmarks (black).
  • ...and 4 more figures

Theorems & Definitions (6)

  • proof
  • proof
  • proof
  • proof
  • proof
  • proof