Convergence for nonconvex ADMM, with applications to CT imaging

Rina Foygel Barber, Emil Y. Sidky

TL;DR

The new theoretical results provide convergence guarantees under a restricted strong convexity assumption without requiring smoothness or differentiability, while still allowing differentiable terms to be treated approximately if needed.

Abstract

The alternating direction method of multipliers (ADMM) algorithm is a powerful and flexible tool for complex optimization problems of the form $\min\{f(x)+g(y) : Ax+By=c\}$. ADMM exhibits robust empirical performance across a range of challenging settings including nonsmoothness and nonconvexity of the objective functions $f$ and $g$, and provides a simple and natural approach to the inverse problem of image reconstruction for computed tomography (CT) imaging. From the theoretical point of view, existing results for convergence in the nonconvex setting generally assume smoothness in at least one of the component functions in the objective. In this work, our new theoretical results provide convergence guarantees under a restricted strong convexity assumption without requiring smoothness or differentiability, while still allowing differentiable terms to be treated approximately if needed. We validate these theoretical results empirically, with a simulated example where both $f$ and $g$ are nondifferentiable -- and thus outside the scope of existing theory -- as well as a simulated CT image reconstruction problem.
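To make the problem form $\min\{f(x)+g(y) : Ax+By=c\}$ concrete, here is a minimal sketch of the standard scaled-form ADMM iteration on a toy convex instance. It is not the nonconvex algorithm analyzed in the paper. The instance is an assumption chosen for illustration: $f(x)=\tfrac12\|x-a\|_2^2$, $g(y)=\lambda\|y\|_1$, $A=I$, $B=-I$, $c=0$, whose exact solution is the soft-thresholding of $a$ at level $\lambda$. The names `soft_threshold`, `admm_toy`, and the parameter `rho` are hypothetical.

```python
def soft_threshold(v, tau):
    """Proximal map of tau*|.|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def admm_toy(a, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for min 0.5*||x-a||^2 + lam*||y||_1 s.t. x - y = 0.

    Illustrative textbook iteration only (assumed toy problem).
    """
    n = len(a)
    x = [0.0] * n
    y = [0.0] * n
    u = [0.0] * n  # scaled dual variable for the constraint x - y = 0
    for _ in range(iters):
        # x-update: closed-form minimizer of
        # 0.5*(x - a)^2 + (rho/2)*(x - y + u)^2 in each coordinate
        x = [(a[i] + rho * (y[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # y-update: proximal step on the l1 term
        y = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # dual update: accumulate the constraint residual x - y
        u = [u[i] + x[i] - y[i] for i in range(n)]
    return x, y


x, y = admm_toy([3.0, -0.5, 1.2], lam=1.0)
```

On this instance `x` and `y` agree in the limit and both approach the soft-thresholded vector. The two nondifferentiable, nonconvex terms studied in the paper would replace the closed-form updates used here.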

Paper Structure

This paper contains 47 sections, 3 theorems, 150 equations, 8 figures, 1 algorithm.

Key Result

Theorem 1

Suppose that the point $(\tilde{x},\tilde{y})$ is feasible, satisfies Assumption asm:rsc (restricted strong convexity), and satisfies Assumption asm:approx_firstorder (approximate first-order optimality) for some $\tilde{u}\in\mathbb{R}^k$. Suppose that the nonconvex ADMM algorithm given in Algorith where $x_t,y_t$ are the iterates of the nonconvex ADMM algorithm. Then for all $T\geq 1$,

Figures (8)

  • Figure 1: Illustration of the subdifferential $\partial f(t)$, for the function $f(t) = \log(1+|t|)$. For any $t\neq 0$, the function is differentiable at $t$, and the subdifferential is a singleton set containing only this derivative, $\partial f(t) = \{f'(t)\} = \{ \textnormal{sign}(t)/(1+|t|)\}$. This is illustrated in the figure for two nonzero values of $t$. At $t=0$, the function is nondifferentiable, and the subdifferential is given by $\partial f(0) = [-1,1]$. This is illustrated in the figure by showing several elements of $\partial f(0)$.
  • Figure 2: Illustration of the nonconvex sparsity-promoting penalty $\sum_j \beta\log(1+|x_j|/\beta)$ that appears in the objective function \ref{eqn:sparseQR} for the sparse high-dimensional quantile regression example. The figure plots the function $t\mapsto \beta\log(1+|t|/\beta)$, for a range of values of $\beta$. The functions are all nondifferentiable at $t=0$, and are similar to the absolute value function for $t\approx 0$, but smaller values of $\beta$ correspond to greater nonconvexity as $|t|$ increases.
  • Figure 3: Results for the sparse quantile regression example (see Section \ref{sec:examples_empirical}). The figure shows the value of the objective function \ref{eqn:sparseQR} over iterations $t=1,\dots,500$ of the algorithm, run with various values of the parameter $\sigma$ as shown. The top row shows the loss function value for $x_t$ (the estimate at time $t$), as well as its root-mean-square-error (RMSE) $\frac{1}{\sqrt{d}}\|x_t - \tilde{x}\|_2$, while the bottom row shows the loss and the RMSE for $\bar{x}_t$ (the running average). All axes are on the log scale.
  • Figure 4: Left: schematic of the projection operator. Here $x_{k m}$ is the amount of material $m$ present at pixel $k$, while $y_{\ell m} = (Px)_{\ell m}$ is the total amount of material $m$ present along ray $\ell$ of the scan. Right: attenuation curves for several common materials.
  • Figure 5: Left: the X-ray beam spectrum. This figure displays the density of the distribution of energies in the beam, i.e., how the total intensity of the beam is split across the energy spectrum. Right: for each energy window $w$, the displayed curve is proportional to the spectral response parameters $S_{w\ell i}$. These values are set to be constant across all rays $\ell$, and so the figure plots the value across all energy levels $i$ for each detector window $w$, rescaled so that the sum of the three response curves is equal to the density plot of the X-ray beam spectrum on the left.
  • ...and 3 more figures
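
The subdifferential described in the captions of Figures 1 and 2 can be evaluated directly. The sketch below represents $\partial f(t)$ as a closed interval `(lo, hi)` for the penalty $t\mapsto\beta\log(1+|t|/\beta)$; with $\beta=1$ it reduces to $f(t)=\log(1+|t|)$ from Figure 1. The function names and the interval representation are assumptions made for illustration. The slope formula $\textnormal{sign}(t)/(1+|t|/\beta)$ for $t\neq 0$ follows from differentiating the penalty.

```python
import math


def log_penalty(t, beta=1.0):
    """Value of the penalty beta * log(1 + |t| / beta)."""
    return beta * math.log1p(abs(t) / beta)


def log_penalty_subdifferential(t, beta=1.0):
    """Subdifferential of the penalty at t, as a closed interval (lo, hi).

    For t != 0 the function is differentiable, so the interval is the
    single point sign(t) / (1 + |t| / beta). At t = 0 it is [-1, 1].
    """
    if t == 0:
        return (-1.0, 1.0)
    slope = math.copysign(1.0, t) / (1.0 + abs(t) / beta)
    return (slope, slope)
```

For example, `log_penalty_subdifferential(1.0)` is the single point 0.5, matching $\textnormal{sign}(t)/(1+|t|)$ at $t=1$, while at $t=0$ the whole interval $[-1,1]$ is returned regardless of $\beta$.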

Theorems & Definitions (3)

  • Theorem 1
  • Proposition 1
  • Lemma 1