Preconditioned Proximal Gradient Methods with Conjugate Momentum: A Subspace Perspective

Jian Chen, Xinmin Yang

Abstract

In this paper, we propose a descent method for composite optimization problems with linear operators. Specifically, we first design a structure-exploiting preconditioner tailored to the linear operator so that the resulting preconditioned proximal subproblem admits a closed-form solution through its dual formulation. However, such a structure-driven preconditioner may be poorly aligned with the local curvature of the smooth component, which can lead to slow practical convergence. To address this issue, we develop a subspace proximal Newton framework that incorporates curvature information within a low-dimensional subspace. At each iteration, the search direction is obtained by minimizing a proximal Newton model restricted to a two-dimensional subspace spanned by the current preconditioned proximal gradient direction and a momentum direction derived from the previous iterate. By orthogonalizing the subspace basis with respect to the local Hessian-induced metric, the resulting two-dimensional nonsmooth subproblem can be efficiently approximated by solving two one-dimensional optimization problems. This orthogonalization plays a crucial role: it allows a single pass of alternating one-dimensional updates to provide a good approximation to the original coupled two-dimensional subproblem while keeping the per-iteration computational cost low. We establish global convergence of the proposed method and prove a $Q$-linear convergence rate under strong convexity. Comparative numerical experiments demonstrate the effectiveness of the proposed algorithm, particularly on high-dimensional and ill-conditioned problems.
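The role of the orthogonalization step can be illustrated on the smooth quadratic part of the model alone. The sketch below is not the authors' algorithm: the matrix `H`, the vectors `g`, `d`, `m`, and the helper `h_conjugate_basis` are hypothetical stand-ins for the local Hessian, the gradient, the preconditioned proximal gradient direction, and the momentum direction. It assumes a symmetric positive definite `H` and shows that, once the two directions are conjugate with respect to `H`, the two-dimensional quadratic minimization splits into two independent one-dimensional ones.

```python
import numpy as np

def h_conjugate_basis(H, d, m):
    """One Gram-Schmidt step in the H-inner product: returns (d, m_perp) with d^T H m_perp = 0."""
    Hd = H @ d
    coeff = (m @ Hd) / (d @ Hd)
    return d, m - coeff * d

def quad_model(H, g, step):
    """Quadratic model g^T s + 0.5 s^T H s of the smooth part (constant term omitted)."""
    return g @ step + 0.5 * step @ (H @ step)

# Hypothetical data: random SPD matrix and random vectors, for illustration only.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
H = A.T @ A + np.eye(n)        # stand-in for the local Hessian
g = rng.standard_normal(n)     # stand-in for the gradient
d = rng.standard_normal(n)     # stand-in for the preconditioned proximal gradient direction
m = rng.standard_normal(n)     # stand-in for the momentum direction

d, m_perp = h_conjugate_basis(H, d, m)

# With an H-conjugate basis, each coefficient has its own closed-form 1D minimizer.
alpha = -(g @ d) / (d @ (H @ d))
beta = -(g @ m_perp) / (m_perp @ (H @ m_perp))

# Reference: solve the coupled 2x2 system for the same subspace.
B = np.column_stack([d, m_perp])
coef_joint = np.linalg.solve(B.T @ H @ B, -(B.T @ g))

# The model value of the combined step is the sum of the two 1D model values.
joint_value = quad_model(H, g, alpha * d + beta * m_perp)
split_value = quad_model(H, g, alpha * d) + quad_model(H, g, beta * m_perp)
```

This decoupling is exact only for the smooth quadratic term. The nonsmooth term in general still couples the two coefficients, which is consistent with the abstract describing the single pass of one-dimensional updates as an approximation to the two-dimensional subproblem.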

Paper Structure (25 sections, 5 theorems, 129 equations, 3 figures, 3 algorithms)

Key Result

lemma 1

Let $h:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ be a proper convex and lower semicontinuous function, which is not necessarily differentiable. Assume that $x^{*}$ is the minimizer of where $P\succ0$. Then

Figures (3)

  • Figure 1: Objective gaps w.r.t. iterations and CPU time for problem (\ref{lasso}) with $\lambda=10^{-4}$.
  • Figure 2: Objective gaps w.r.t. iterations and CPU time for problem (\ref{qp}).
  • Figure 3: Objective gaps w.r.t. iterations and CPU time for problem (\ref{tv}) with $\lambda=1/16$.

Theorems & Definitions (13)

  • remark 1
  • remark 2
  • lemma 1
  • proof
  • proposition 1
  • proof
  • remark 3
  • lemma 2
  • proof
  • theorem 1
  • ...and 3 more