
Douglas-Rachford splitting and ADMM for nonconvex optimization: Accelerated and Newton-type linesearch algorithms

Andreas Themelis, Lorenzo Stella, Panagiotis Patrinos

Abstract

Although the performance of popular optimization algorithms such as Douglas-Rachford splitting (DRS) and ADMM is satisfactory on small and well-scaled problems, ill conditioning and problem size pose a severe obstacle to their reliable employment. Expanding on recent convergence results for DRS and ADMM applied to nonconvex problems, we propose two linesearch algorithms to enhance and robustify these methods by means of quasi-Newton directions. The proposed algorithms are suited for nonconvex problems, require the same black-box oracle as DRS and ADMM, and maintain their (subsequential) convergence properties. Numerical evidence shows that the employment of L-BFGS in the proposed framework greatly improves convergence of DRS and ADMM, making them robust to ill conditioning. Under regularity and nondegeneracy assumptions at the limit point, superlinear convergence is shown when quasi-Newton Broyden directions are adopted.
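
For reference, a minimal sketch of the DRS oracle mentioned in the abstract, assuming the problem has the form "minimize $f(x)+g(x)$" and using the symbols $s$, $u$, $v$, $\gamma$, $\lambda$ that appear in the Figure 1 caption. The problem form and the proximal-map notation are assumptions here, since this excerpt does not state them:

```latex
\begin{aligned}
u   &= \operatorname{prox}_{\gamma f}(s), \\
v   &= \operatorname{prox}_{\gamma g}(2u - s), \\
s^+ &= s + \lambda (v - u).
\end{aligned}
```

One oracle call thus costs one proximal step on $f$ and one on $g$; the last line is the nominal update that the linesearch variants modify.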

Paper Structure

This paper contains 30 sections, 86 equations, 5 figures.

Figures (5)

  • Figure 1: Main steps of the DRS linesearch algorithm. One call to the DRS oracle at $s$ yields the pair $(u,v)$ and the nominal DRS update $\bar{s}^+=s+\lambda(v-u)$. On the Douglas-Rachford envelope (DRE) $\varphi_{\gamma}^{\mathrm{DR}}$, this implies a decrease by at least $\tfrac{1}{\gamma}C\|v-u\|^2$. Since $\varphi_{\gamma}^{\mathrm{DR}}$ is continuous and $c<C$, all points close enough to $\bar{s}^+$ belong to the sublevel set $\bigl[\varphi_{\gamma}^{\mathrm{DR}}\leq\varphi_{\gamma}^{\mathrm{DR}}(s)-c\|u-v\|^2\bigr]$ (cyan-shaded region). Therefore, for any direction $d$, all points close to $\bar{s}^+$ in the line segment $[\bar{s}^+,s+d] = \bigl\{(1-\tau)\bar{s}^++\tau(s+d) \mid \tau\in[0,1]\bigr\}$ belong to this set, hence the linesearch is accepted for small enough $\tau$.
  • Figure 2: Nonconvex sparse least squares problem. Comparison between DRS and its linesearch variant using modified Broyden, L-BFGS, and Nesterov acceleration directions.
  • Figure 3: Sparse PCA problem on a subset of the 20newsgroup dataset (100 features only). Comparison between DRS and its linesearch variant using modified Broyden, L-BFGS, and Anderson acceleration directions. On the $x$-axis, the number of linear systems solved (needed for the $u$-update): for DRS this coincides with the number of iterations, while for the linesearch variant it accounts for all operations performed in the linesearch. We used a memory parameter of 5 for L-BFGS and Anderson acceleration.
  • Figure 4: Consensus sparse PCA problem on full datasets. Comparison between ADMM (blue) and the L-BFGS enhancement (red), for different numbers of agents $N=10,20,50$. Left: ADMM residual; Right: cost. On the $x$-axis, the number of linear systems solved (needed for the $\bm x$-update): in ADMM this coincides with the number of iterations, while in the ADMM linesearch variant it accounts for each linesearch step. This is the only expensive operation, as the $z$-update is negligible. Apparently, ADMM is severely affected by $N$, whereas using L-BFGS directions in the linesearch variant consistently results in faster convergence.
  • Figure 5: Linear MPC problem for the AFTI-16 system. Comparison between DRS and its linesearch variant using modified Broyden, L-BFGS, and Nesterov acceleration directions, to reach a tolerance of $10^{-5}$ at each time step.
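
The mechanism described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' algorithm: the toy problem (a diagonal, ill-conditioned least squares term $f$ plus a cardinality constraint $g$), the momentum-type direction `d` (a stand-in for the quasi-Newton directions used in the paper), and the values of `c`, `beta`, and `max_backtracks` are all hypothetical choices. The DRE is evaluated with the usual formula for smooth $f$, $\varphi_{\gamma}^{\mathrm{DR}}(s) = f(u) + g(v) + \tfrac{1}{\gamma}\langle s-u, v-u\rangle + \tfrac{1}{2\gamma}\|v-u\|^2$, which is assumed here rather than quoted from this excerpt.

```python
def drs_oracle(s, a, b, k, gamma):
    """One DRS oracle call at s: returns (u, v, DRE value).

    Toy problem (an assumption for illustration):
      f(x) = 0.5 * sum_i a[i] * (x[i] - b[i])**2   (smooth, L_f = max(a))
      g(x) = indicator of {x with at most k nonzeros}   (nonconvex)
    """
    n = len(s)
    # u = prox_{gamma f}(s): closed form because f is separable
    u = [(s[i] + gamma * a[i] * b[i]) / (1.0 + gamma * a[i]) for i in range(n)]
    # v = prox_{gamma g}(2u - s): keep the k largest-magnitude entries
    w = [2.0 * u[i] - s[i] for i in range(n)]
    keep = set(sorted(range(n), key=lambda i: abs(w[i]))[n - k:])
    v = [w[i] if i in keep else 0.0 for i in range(n)]
    # DRE value; g(v) = 0 because v is feasible by construction
    r = [v[i] - u[i] for i in range(n)]
    f_u = 0.5 * sum(a[i] * (u[i] - b[i]) ** 2 for i in range(n))
    inner = sum((s[i] - u[i]) * r[i] for i in range(n))
    dre = f_u + inner / gamma + sum(ri * ri for ri in r) / (2.0 * gamma)
    return u, v, dre


def linesearch_drs(a, b, k, gamma, lam=1.0, c=1e-3, beta=0.5,
                   iters=100, max_backtracks=20):
    """DRS with a backtracking linesearch on the DRE (illustrative sketch)."""
    n = len(b)
    s = [0.0] * n
    s_prev = list(s)
    u, v, phi = drs_oracle(s, a, b, k, gamma)
    history = [phi]
    for _ in range(iters):
        r = [v[i] - u[i] for i in range(n)]
        # nominal DRS update s_bar = s + lam * (v - u)
        s_bar = [s[i] + lam * r[i] for i in range(n)]
        # hypothetical direction: nominal step plus a momentum term
        d = [lam * r[i] + beta * (s[i] - s_prev[i]) for i in range(n)]
        # acceptance test from the caption: DRE(s_new) <= DRE(s) - c*||u - v||^2
        threshold = phi - c * sum(ri * ri for ri in r)
        tau = 1.0
        for _bt in range(max_backtracks):
            # candidate on the segment between s_bar (tau=0) and s + d (tau=1)
            s_new = [(1.0 - tau) * s_bar[i] + tau * (s[i] + d[i]) for i in range(n)]
            u_new, v_new, phi_new = drs_oracle(s_new, a, b, k, gamma)
            if phi_new <= threshold:
                break
            tau *= 0.5
        else:
            # no candidate accepted: fall back to the nominal DRS update
            s_new = s_bar
            u_new, v_new, phi_new = drs_oracle(s_new, a, b, k, gamma)
        s_prev, s = s, s_new
        u, v, phi = u_new, v_new, phi_new
        history.append(phi)
    return v, history


if __name__ == "__main__":
    a = [1.0, 10.0, 100.0, 1000.0, 5.0, 50.0]   # condition number 1000
    b = [3.0, -1.0, 0.5, 2.0, -4.0, 0.1]
    gamma = 0.4 / max(a)                         # stepsize below 1/L_f
    v, history = linesearch_drs(a, b, k=2, gamma=gamma)
    print("DRE first/last:", history[0], history[-1])
    print("v:", v)
```

Each backtracking trial costs one extra oracle call, which is why the figures count oracle-level operations (linear systems solved) rather than iterations. The fallback to the nominal step mirrors the caption's argument that candidates with small $\tau$ lie close to $\bar{s}^+$ and are therefore eventually accepted.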

Theorems & Definitions (12)

  • proof (10 entries listed, 2 more not shown)