Fixed-Point Automatic Differentiation of Forward--Backward Splitting Algorithms for Partly Smooth Functions

Sheheryar Mehmood, Peter Ochs

TL;DR

Under partial smoothness and other mild assumptions, Implicit Differentiation (ID) and Automatic Differentiation (AD) are applied to the fixed-point iterations of proximal splitting algorithms. The derivative sequence that AD produces from these iterations is shown to converge (linearly under further assumptions) to the derivative of the solution mapping.

Abstract

A large class of non-smooth practical optimization problems can be written as the minimization of a sum of smooth and partly smooth functions. We examine such structured problems that also depend on a parameter vector and study the problem of differentiating their solution mapping with respect to the parameter, which has far-reaching applications in sensitivity analysis and parameter learning. Under partial smoothness and other mild assumptions, we apply Implicit Differentiation (ID) and Automatic Differentiation (AD) to the fixed-point iterations of proximal splitting algorithms. We show that AD of the sequence generated by these algorithms converges (linearly under further assumptions) to the derivative of the solution mapping. For a variant of automatic differentiation, which we call Fixed-Point Automatic Differentiation (FPAD), we remedy the memory overhead problem of reverse-mode AD and, moreover, obtain theoretically faster convergence. We numerically illustrate the convergence and convergence rates of AD and FPAD on Lasso and Group Lasso problems, and demonstrate FPAD on prototypical image denoising problems by learning the regularization term.
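To make the central claim concrete, the following minimal sketch (not the authors' code) propagates a forward-mode derivative through proximal gradient descent on a tiny Lasso problem, $\min_x \tfrac12\|Ax-b\|^2 + \lambda\|x\|_1$, and compares the result with the implicit derivative computed on the identified support. The data `A`, `b`, the step size, and all function names are hypothetical choices made for illustration only.

```python
# Sketch: forward-mode AD through proximal gradient descent (PGD) on a tiny Lasso
# problem, differentiating the iterate with respect to the regularization weight lam.
# The data, step size, and function names are illustrative assumptions.

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [3.0, 0.5, 1.0]


def gram_and_rhs(A, b):
    """Return Q = A^T A and c = A^T b as plain Python lists."""
    n = len(A[0])
    Q = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    c = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    return Q, c


def pgd_with_forward_ad(A, b, lam, alpha, iters):
    """Run PGD and propagate dx/dlam alongside x (forward-mode AD by hand)."""
    Q, c = gram_and_rhs(A, b)
    n = len(c)
    x = [0.0] * n
    dx = [0.0] * n  # derivative of the current iterate w.r.t. lam
    t = alpha * lam  # soft-threshold level; its derivative w.r.t. lam is alpha
    for _ in range(iters):
        # Forward (gradient) step and its derivative.
        w = [x[i] - alpha * (sum(Q[i][j] * x[j] for j in range(n)) - c[i])
             for i in range(n)]
        dw = [dx[i] - alpha * sum(Q[i][j] * dx[j] for j in range(n))
              for i in range(n)]
        # Backward (proximal) step: soft-thresholding, differentiated where smooth.
        x, dx = [], []
        for wi, dwi in zip(w, dw):
            if abs(wi) > t:
                s = 1.0 if wi > 0 else -1.0
                x.append(wi - s * t)
                dx.append(dwi - s * alpha)
            else:
                x.append(0.0)
                dx.append(0.0)
    return x, dx


def implicit_derivative(A, b, x):
    """Solve Q_SS d_S = -sign(x_S) on the support S of x; d is zero off S."""
    Q, _ = gram_and_rhs(A, b)
    S = [i for i, xi in enumerate(x) if xi != 0.0]
    m = len(S)
    # Augmented matrix [Q_SS | -sign(x_S)], reduced by Gauss-Jordan elimination.
    # Q_SS is assumed positive definite, so no pivoting is performed.
    M = [[Q[i][j] for j in S] + [-1.0 if x[i] > 0 else 1.0] for i in S]
    for k in range(m):
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]
        for r in range(m):
            if r != k:
                f = M[r][k]
                M[r] = [vr - f * vk for vr, vk in zip(M[r], M[k])]
    d = [0.0] * len(x)
    for pos, i in enumerate(S):
        d[i] = M[pos][m]
    return d


x_star, dx_ad = pgd_with_forward_ad(A, b, lam=1.0, alpha=0.3, iters=200)
dx_id = implicit_derivative(A, b, x_star)
print("solution:", x_star)
print("AD derivative:", dx_ad)
print("ID derivative:", dx_id)
```

In this toy instance the iterates settle on a fixed support after a few steps, after which the derivative recursion is linear and contracts toward the implicit derivative. This mirrors, in a very small setting, the manifold-identification behavior the abstract relies on; it does not implement FPAD or reverse-mode AD.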

Paper Structure

This paper contains 61 sections, 24 theorems, 78 equations, 8 figures, 1 table, and 2 algorithms.

Key Result

Lemma 1

Let $g\colon \mathcal{X}\times\mathcal{U} \to \overline{\mathbb R}$ be proper and lower semi-continuous, and $(\bm x, \bm u)\in\mathcal{X}\times\mathcal{U}$ be a point. If $g$ is regular at $(\bm x, \bm u)$ and $g(\bm x, \cdot)$ is differentiable at $\bm u$, then we have:

Figures (8)

  • Figure 1: Depiction of unrolling of the fixed-point iteration. $\bm x^{(0)}$ need not be independent of $\bm u$.
  • Figure 2: Error plots for various sequences obtained from the Lasso problem (left) and the Group Lasso problem (right). The solid lines represent PGD and its derivative sequences, while the dotted lines represent APG sequences. The lines without markers correspond to the original iterates. The vertical lines mark the iteration at which the sequences enter the manifold (solid for PGD and dashed for APG). The lines with circular markers represent forward-mode sequences, while those with square markers represent reverse-mode sequences. Markers for FPAD sequences are larger than those of their AD counterparts. The FPAD sequences are more stable and show faster convergence to the true derivative.
  • Figure 3: Computational Graph of $h(\bm{x})$.
  • Figure 4: Depiction of Forward Mode AD of $h$.
  • Figure 5: Depiction of Reverse Mode AD of $h$.
  • ...and 3 more figures

Theorems & Definitions (89)

  • Lemma 1
  • proof
  • Lemma 2
  • Lemma 3
  • proof
  • Theorem 4: Implicit Function Theorem
  • Remark 5
  • Theorem 6: Local Linear Convergence of Iterates
  • Remark 7
  • Theorem 8
  • ...and 79 more