Automatic Differentiation of Optimization Algorithms with Time-Varying Updates

Sheheryar Mehmood, Peter Ochs

TL;DR

This paper applies unrolled or automatic differentiation to a time-varying iterative process, provides convergence (rate) guarantees for the resulting derivative iterates, and adapts these results to proximal gradient descent with variable step size and to FISTA when solving partly smooth problems.

Abstract

Numerous optimization algorithms have a time-varying update rule owing to, for instance, a changing step size, momentum parameter, or Hessian approximation. In this paper, we apply unrolled or automatic differentiation to a time-varying iterative process and provide convergence (rate) guarantees for the resulting derivative iterates. We adapt these convergence results and apply them to proximal gradient descent with variable step size and FISTA when solving partly smooth problems. We confirm our findings numerically by solving $\ell_1$- and $\ell_2$-regularized linear and logistic regression, respectively. Our theoretical and numerical results show that the convergence rate of the algorithm is reflected in its derivative iterates.
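The idea of differentiating through a time-varying update can be illustrated with a minimal sketch. The example below is not taken from the paper: it assumes a scalar lasso problem $\min_x \tfrac12 (a x - b)^2 + u\,|x|$, a hypothetical step-size schedule $\alpha_k = (1 + 0.5\cos k)/a^2$, and illustrative function names. The derivative with respect to the regularization weight $u$ is propagated by hand alongside each proximal gradient step, which is what forward-mode automatic differentiation of the unrolled loop would compute.

```python
import math


def soft_threshold(z, t):
    """Proximal map of t*|.|: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def pgd_with_derivative(a, b, u, num_iters=50):
    """Proximal gradient descent on 0.5*(a*x - b)**2 + u*|x|.

    Returns the final iterate x, its derivative dx/du obtained by
    differentiating every update (unrolled, forward mode), and the
    history of (x, dx) pairs. The step-size schedule is an assumption
    made for illustration only.
    """
    lipschitz = a * a              # Lipschitz constant of the smooth part
    x, dx = 0.0, 0.0               # x_0 does not depend on u
    history = []
    for k in range(num_iters):
        # Time-varying step size, kept inside (0, 2/L).
        alpha = (1.0 + 0.5 * math.cos(k)) / lipschitz

        # Gradient step and its derivative with respect to u.
        z = x - alpha * a * (a * x - b)
        dz = (1.0 - alpha * lipschitz) * dx

        # Threshold t = alpha*u, so dt/du = alpha.
        t = alpha * u
        dt = alpha

        # Soft-thresholding is differentiable wherever |z| != t.
        if abs(z) > t:
            dx = dz - math.copysign(1.0, z) * dt
        else:
            dx = 0.0
        x = soft_threshold(z, t)
        history.append((x, dx))
    return x, dx, history


if __name__ == "__main__":
    a, b, u = 2.0, 3.0, 1.0
    # Closed form for a*b > u > 0: x*(u) = (a*b - u)/a**2, dx*/du = -1/a**2.
    x_star, dx_star = (a * b - u) / a**2, -1.0 / a**2
    _, _, history = pgd_with_derivative(a, b, u)
    for k in (0, 5, 10, 20):
        xk, dxk = history[k]
        print(k, abs(xk - x_star), abs(dxk - dx_star))
```

In this toy setting, once the iterates stay away from the kink of the absolute value, both errors obey the same recursion, $e_{k+1} = (1 - \alpha_k a^2)\, e_k$, so the derivative iterates inherit the linear rate of the original iterates. This mirrors, in a simple special case, the behaviour the abstract describes.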

Paper Structure

This paper contains 38 sections, 22 theorems, 40 equations, 1 figure, 1 algorithm.

Key Result

Lemma 2

Let $(\bm x^*, \bm u^*, X_*) \in \mathcal{X}\times\mathcal{U}\times\mathcal{L} (\mathcal{X}, \mathcal{X})$, $V \in \mathcal{N}_{(\bm x^*, \bm u^*)}$, $(\mathcal{A}_{k})_{k\in\mathbb N_0}$, and $\Vert \cdot \Vert$ be such that Assumption ( -- ) is satisfied. Then there exists $K\in\mathbb N$ such that $\forall k\geq K$, there exist $U_k\in\mathcal{N}_{\bm u^*}$ and a $C^1$-smooth $\psi_k\colon U

Figures (1)

  • Figure 1: Error plots of iterates (left column) of PGD and APG for Logistic (top row) and Lasso (bottom row) Regression along with their derivative iterates (right column). Similarity in the convergence rates of the original and the derivative iterates is clearly visible.

Theorems & Definitions (53)

  • Remark 1
  • Lemma 2
  • proof
  • Remark 3
  • Remark 4
  • Theorem 5
  • proof
  • Remark 6
  • Theorem 7
  • proof
  • ...and 43 more