Preconditioned subgradient method for composite optimization: overparameterization and fast convergence

Mateo Díaz, Liwei Jiang, Abdel Ghani Labassi

TL;DR

This paper addresses the slow convergence of subgradient methods for composite optimization when the outer function is well-conditioned but the inner map is ill-conditioned or overparameterized. It introduces a Levenberg–Morrison–Marquardt type subgradient algorithm with a regularized Gauss–Newton preconditioner, and provides two practical parameter configurations that yield linear convergence rates depending only on the outer convex function h. The authors develop general regularity conditions on the parameterization and outer loss, covering both nonsmooth and smooth outer functions, and show that these conditions hold in important problems such as squared-variable formulations, matrix sensing, and CP tensor factorization. They also demonstrate that the theory extends to local regularity regimes and validate the approach with extensive numerical experiments, including robustness to outliers and dimension-independent convergence. Overall, the work offers a practical, theory-backed preconditioned method for fast convergence in a broad class of overparameterized and ill-conditioned composite optimization problems with significant implications for data science and signal processing.

Abstract

Composite optimization problems involve minimizing the composition of a smooth map with a convex function. Such objectives arise in numerous data science and signal processing applications, including phase retrieval, blind deconvolution, and collaborative filtering. The subgradient method achieves local linear convergence when the composite loss is well-conditioned. However, if the smooth map is, in a certain sense, ill-conditioned or overparameterized, the subgradient method exhibits much slower sublinear convergence even when the convex function is well-conditioned. To overcome this limitation, we introduce a Levenberg-Morrison-Marquardt subgradient method that converges linearly under mild regularity conditions at a rate determined solely by the convex function. Further, we demonstrate that these regularity conditions hold for several problems of practical interest, including square-variable formulations, matrix sensing, and tensor factorization. Numerical experiments illustrate the benefits of our method.
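The exact update rule is not reproduced in this summary, so the following is only a minimal sketch of what a Levenberg-Morrison-Marquardt style subgradient step could look like, not the authors' algorithm. It assumes the generic form $x^{+} = x - \gamma\,(\nabla F(x)^{\top}\nabla F(x) + \lambda I)^{-1}\nabla F(x)^{\top} v$ with $v \in \partial h(F(x))$, and a Polyak-type stepsize measured in the preconditioned metric. The function names `lm_direction` and `lm_subgradient_step`, the argument `h_min` (the assumed known minimal value of $h$), and the toy squared-variable problem are all hypothetical choices made for illustration.

```python
import numpy as np


def lm_direction(J, v, lam):
    """Solve (J^T J + lam * I) d = J^T v for the preconditioned direction d."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ v)


def lm_subgradient_step(x, F, jac, h, subgrad, lam, h_min=0.0):
    """One preconditioned subgradient step with a Polyak-type stepsize.

    Assumed form (not taken from the paper):
        d     = (J^T J + lam I)^{-1} J^T v,   v in the subdifferential of h at F(x)
        gamma = (h(F(x)) - h_min) / <J^T v, d>
        x_new = x - gamma * d
    """
    z = F(x)
    gap = h(z) - h_min
    if gap <= 0.0:
        return x  # already at the assumed optimal value
    J = jac(x)
    v = subgrad(z)
    d = lm_direction(J, v, lam)
    denom = float(v @ (J @ d))  # equals <J^T v, d>
    if denom <= 0.0:
        return x  # degenerate direction; take no step in this sketch
    return x - (gap / denom) * d


# Toy squared-variable problem: F(x) = x * x (elementwise), h(z) = ||z - z_star||_1.
z_star = np.array([1.0])
F = lambda x: x * x
jac = lambda x: np.diag(2.0 * x)
h = lambda z: float(np.abs(z - z_star).sum())
subgrad = lambda z: np.sign(z - z_star)

x0 = np.array([2.0])
x1 = lm_subgradient_step(x0, F, jac, h, subgrad, lam=1.0)
print(x1, h(F(x0)), h(F(x1)))
```

In this one-dimensional toy case the Polyak-type stepsize cancels the damping parameter, so the step coincides with a Newton step for $x^2 = 1$. In higher dimensions the choice of $\lambda$ changes the direction, and that is where the paper's parameter configurations would matter.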

Paper Structure

This paper contains 98 sections, 71 theorems, 279 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Lemma 3.1

Let $x_k$ and $x_{k+1}$ be iterates from Algorithm \ref{alg:LM}. Let $z_{k} = F(x_{k})$ and $z_{k+1} = F(x_{k+1})$. Then the following three statements hold.

Figures (8)

  • Figure 1: Relative distance to the solution against iteration count for Algorithm \ref{eq:main-update} applied to an overparameterized nonsmooth matrix factorization problem with $F(U) = UU^{\top}$, $h(M) = \|M - M^{\star}\|_F$, $M^{\star} \in {\mathcal{S}}^{50}_{+}$, and $U \in {\mathbf{R}}^{50 \times 3}$, with $\operatorname{rank}(M^{\star}) = 2 < r = 3$ and $\kappa(M^\star) = 1$. All algorithms use the Polyak stepsize.
  • Figure 2: Relative distance against iteration count for nonnegative least squares losses \ref{nnls-nonsmooth}.
  • Figure 3: Smooth matrix sensing with the $\ell_2$-norm squared. We use $m =4dr$ ($m=2dr$ for symmetric), with $r^\star = 2$, $r\in\{2,5\}$.
  • Figure 4: Matrix sensing with the $\ell_1$-norm. We use $m =4dr$ ($m=2dr$ for symmetric) with $r^\star = 2$, $r\in \{2,5\}$. OPSA \cite{giampouras2024guarantees} applies only to the asymmetric setting.
  • Figure 5: Median number of iterations to achieve convergence ($100$ draws) versus hyperparameter $\gamma$. We declare that a method has converged when it reaches a relative error of $10^{-8}$, and we cap the number of iterations at $1000$. The shaded area spans the $5$th to $95$th percentiles.
  • ...and 3 more figures

Theorems & Definitions (112)

  • Lemma 3.1
  • Lemma 3.2: Aiming towards solution
  • Lemma 3.3: Aiming towards solution
  • Lemma 3.4: Linearization progress
  • proof
  • Theorem 4.1: Convergence under weak alignment and nonsmoothness
  • Proposition 4.2: One-step progress
  • proof: Proof of Proposition \ref{prop:master_polyak}
  • Claim 4.3
  • proof: Proof of Claim \ref{claim:aligned-projected-subgradients}
  • ...and 102 more