
Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization

James Kotary, Jacob Christopher, My H Dinh, Ferdinando Fioretto

TL;DR

This work addresses differentiation through optimization layers in neural networks. It analyzes the backward pass of unrolled optimization and shows that it is asymptotically equivalent to solving a linear system by fixed-point iteration. It introduces unfolding, which replaces differentiation through inner loops with Jacobian-gradient products, and develops Folded Optimization, which separates the forward and backward passes and solves the backward problem with efficient linear-algebra methods (e.g., linear fixed-point iteration (LFPI) and Krylov methods) using only Jacobian-vector products. The authors provide theoretical results on backward-pass convergence, empirical demonstrations of pitfalls in naive unrolling, and an open-source library, fold-opt, that enables efficient differentiable optimization layers for convex and nonconvex problems, including layers built on black-box solvers. Experiments on decision-focused learning tasks, AC-OPF, portfolio optimization, denoising, and multilabel classification show computational gains and added modeling flexibility. The work thus connects differentiable optimization with scalable, task-specific solvers for end-to-end learning systems.

Abstract

The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks. A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form. One typical strategy is algorithm unrolling, which relies on automatic differentiation through the entire chain of operations executed by an iterative optimization solver. This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is asymptotically equivalent to the solution of a linear system by a particular iterative method. Several practical pitfalls of unrolling are demonstrated in light of these insights, and a system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations. Experiments over various end-to-end optimization and learning tasks demonstrate the advantages of this system both computationally and in terms of flexibility over various optimization problem forms.
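
The equivalence between unrolled differentiation and a linear fixed-point iteration can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: the scalar objective $f(x) = e^x + x^2/2 - cx$, the gradient-descent solver, and the function names `update`, `unrolled_derivative`, and `fixed_point_derivative` are all assumptions chosen for illustration. In the scalar case the derivative is propagated alongside the iterates, which yields the same value as backpropagating through every step.

```python
import math

def update(x, c, alpha):
    """One gradient-descent step on the illustrative f(x) = exp(x) + x**2 / 2 - c * x."""
    return x - alpha * (math.exp(x) + x - c)

def unrolled_derivative(c, alpha, num_steps, x0=0.0):
    """Differentiate through every solver step (chain rule along the iterates)."""
    x, dx_dc = x0, 0.0
    for _ in range(num_steps):
        phi = 1.0 - alpha * (math.exp(x) + 1.0)  # dU/dx at the current iterate
        psi = alpha                               # dU/dc
        dx_dc = phi * dx_dc + psi                 # derivative of the next iterate
        x = update(x, c, alpha)
    return x, dx_dc

def fixed_point_derivative(x_star, alpha, num_steps):
    """Iterate d = phi * d + psi with phi and psi frozen at the solution x_star."""
    phi = 1.0 - alpha * (math.exp(x_star) + 1.0)
    psi = alpha
    d = 0.0
    for _ in range(num_steps):
        d = phi * d + psi
    return d

c, alpha = 2.0, 0.2
x_star, d_unrolled = unrolled_derivative(c, alpha, 200)
d_folded = fixed_point_derivative(x_star, alpha, 200)

# Implicit differentiation of exp(x) + x = c gives dx/dc = 1 / (exp(x) + 1).
d_exact = 1.0 / (math.exp(x_star) + 1.0)

print("unrolled:", d_unrolled)
print("fixed-point:", d_folded)
print("exact:", d_exact)
```

Once the forward iterates have converged, the unrolled recurrence uses the same coefficients as the frozen fixed-point iteration, which is why both approach the implicit derivative. The frozen version needs only the solution, not the stored chain of solver steps.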

Paper Structure (59 sections, 5 theorems, 48 equations, 11 figures)


Key Result

Lemma 1

Let $\mathbf{B} \in \mathbb{R}^{n \times n}$ and $\mathbf{b} \in \mathbb{R}^{n}$. For any $\mathbf{z}_0 \in \mathbb{R}^n$, the iteration $\mathbf{z}_{k+1} = \mathbf{B} \mathbf{z}_k + \mathbf{b}$ converges to the solution $\mathbf{z}$ of the linear system $\mathbf{z} = \mathbf{B} \mathbf{z} + \mathbf{b}$ whenever $\mathbf{B}$ is nonsingular and has spectral radius $\rho(\mathbf{B}) < 1$. Furthermore, the asymptotic convergence rate for $\mathbf{z}_k \to \mathbf{z}$ ...
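
The linear fixed-point iteration in Lemma 1, $\mathbf{z}_{k+1} = \mathbf{B}\mathbf{z}_k + \mathbf{b}$, can be tried on a small example. The sketch below is illustrative only: the $2 \times 2$ matrix, the vector, and the helper names are assumptions, not values from the paper. The matrix is upper triangular with eigenvalues $0.5$ and $0.3$, so $\rho(\mathbf{B}) = 0.5$. A standard property of such stationary iterations is that, for a generic starting point, the error eventually shrinks by a factor of about $\rho(\mathbf{B})$ per step.

```python
def matvec(B, z):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(b_ij * z_j for b_ij, z_j in zip(row, z)) for row in B]

def linear_fixed_point(B, b, z0, num_steps):
    """Iterate z_{k+1} = B z_k + b and return every iterate, starting with z0."""
    iterates = [list(z0)]
    for _ in range(num_steps):
        Bz = matvec(B, iterates[-1])
        iterates.append([Bz_i + b_i for Bz_i, b_i in zip(Bz, b)])
    return iterates

def max_norm_error(z, z_exact):
    """Largest absolute componentwise difference between z and z_exact."""
    return max(abs(a - e) for a, e in zip(z, z_exact))

# Illustrative data: eigenvalues of B are its diagonal entries, 0.5 and 0.3.
B = [[0.5, 0.2],
     [0.0, 0.3]]
b = [1.0, 1.0]

# Solve (I - B) z = b by back substitution for reference.
z2 = 1.0 / 0.7
z1 = (1.0 + 0.2 * z2) / 0.5
z_exact = [z1, z2]

iterates = linear_fixed_point(B, b, [0.0, 0.0], 30)
errors = [max_norm_error(z, z_exact) for z in iterates]
ratio = errors[30] / errors[29]

print("error after 30 steps:", errors[30])
print("ratio of successive errors:", ratio)
```

The observed ratio of successive errors settles near the spectral radius, so a matrix with $\rho(\mathbf{B})$ close to $1$ converges slowly, and one with $\rho(\mathbf{B}) \geq 1$ need not converge at all.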

Figures (11)

  • Figure 1: Compared to unrolling, unfolding requires fewer operations on the computational graph by replacing inner loops with Jacobian-gradient products. Fixed-point folding models the unfolding analytically, allowing for generic implementations.
  • Figure 2: Unfolding Projected Gradient Descent at $\mathbf{x}^{\star}$ consists of alternating the gradient step $\mathcal{S}$ with the projection $\mathcal{P}_{\mathbf{C}}$. Section \ref{sec:Unfolding_at_a_fixed_point} shows that the resulting chain of Jacobian-gradient product (JgP) operations in backpropagation is equivalent to solving the differential fixed-point conditions \ref{eq:Lemma_fixedpt} by linear fixed-point iteration. Each function's forward and backward passes are illustrated in blue and red, respectively.
  • Figure 3: Forward and backward pass error per number of iterations, across different stepsizes on CIFAR100 Multilabel Classification. Error is averaged over $100$ samples. Each row represents a distinct differentiable solver implementation; the first two represent unfolded PGD and the latter two represent folded optimization counterparts. Columns correspond to PGD stepsize.
  • Figure 4: An expanded view of Figure \ref{fig:fwd_bwd_err}'s third row shows backward-pass convergence for fixed-point folding of PGD by GMRes, compared to stepsize $\alpha$ and the spectral radius of $\mathbf{\Phi}$ (color scale) on CIFAR100 Multilabel Classification. The main consequences of Theorem \ref{thm:unfolding_convergence_fixedpt} are illustrated: the convergence rate is maximized when the spectral radius of $\mathbf{\Phi}$ is minimized, and failure to converge coincides with the spectral radius exceeding $1$.
  • Figure 5: An expanded view of Figure \ref{fig:fwd_bwd_err}'s third row shows backward-pass convergence for fixed-point folding of PGD by GMRes, compared to stepsize $\alpha$ and the spectral radius of $\mathbf{\Phi}$ (color scale) on CIFAR100 Multilabel Classification. Because GMRes does not depend on iterating a contractive mapping with low spectral radius, convergence rates are unaffected by the stepsize of PGD used to backpropagate gradients.
  • ...and 6 more figures
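
The stepsize dependence described in the captions of Figures 4 and 5 can be sketched on a simplified model. The code below is a hedged illustration, not the experimental setup of the paper: it assumes an unconstrained quadratic objective with a diagonal Hessian $\mathbf{Q}$ (projection omitted), so that the Jacobian of one gradient step is $\mathbf{\Phi} = \mathbf{I} - \alpha \mathbf{Q}$. The eigenvalues, tolerances, and function names are hypothetical.

```python
def spectral_radius_gd(alpha, eigenvalues):
    """Spectral radius of Phi = I - alpha * Q for symmetric Q with the given eigenvalues."""
    return max(abs(1.0 - alpha * q) for q in eigenvalues)

def lfpi_steps_to_tol(alpha, eigenvalues, tol=1e-8, max_steps=10_000):
    """Count linear fixed-point iterations until the error falls below tol.

    Returns None if the error blows up or max_steps is reached.
    With diagonal Q, each error coordinate evolves independently as
    e <- (1 - alpha * q) * e.
    """
    errors = [1.0 for _ in eigenvalues]
    for step in range(1, max_steps + 1):
        errors = [(1.0 - alpha * q) * e for q, e in zip(eigenvalues, errors)]
        worst = max(abs(e) for e in errors)
        if worst < tol:
            return step
        if worst > 1e8:
            return None  # diverging: spectral radius exceeds 1
    return None

eigenvalues = [1.0, 4.0]  # illustrative Hessian spectrum

# 0.4 = 2 / (q_min + q_max) minimizes the spectral radius for this spectrum;
# 0.6 exceeds 2 / q_max = 0.5, so the iteration is no longer contractive.
results = {}
for alpha in (0.1, 0.4, 0.6):
    rho = spectral_radius_gd(alpha, eigenvalues)
    steps = lfpi_steps_to_tol(alpha, eigenvalues)
    results[alpha] = (rho, steps)
    print(f"alpha={alpha}: spectral radius={rho:.3f}, steps to tolerance={steps}")
```

In this simplified setting, the iteration count of linear fixed-point iteration is governed entirely by the stepsize through the spectral radius. A solver for the same linear system that does not rely on a contractive mapping would not inherit that sensitivity.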

Theorems & Definitions (8)

  • Definition 1
  • Definition 2: Unfolding
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Proof
  • Corollary 1
  • Corollary 2