On the complexity of proximal gradient and proximal gradient-Newton-CG methods for \(\ell_1\)-regularized Optimization

Hong Zhu

TL;DR

The proximal gradient-Newton-CG method is shown to achieve the best-known iteration complexity for finding the proposed weak approximate second-order stationary point, matching the known results for finding an approximate second-order stationary point in unconstrained optimization.

Abstract

In this paper, we propose two second-order methods for solving the \(\ell_1\)-regularized composite optimization problem, which are developed based on two distinct definitions of approximate second-order stationary points. We introduce a hybrid proximal gradient and negative curvature method, as well as an adaptive hybrid proximal gradient-Newton-CG method with negative curvature directions, to find a strong* approximate second-order stationary point and a weak approximate second-order stationary point for \(\ell_1\)-regularized optimization problems, respectively. Comprehensive analyses are provided regarding the iteration complexity, computational complexity, and the local superlinear convergence rates of the first phases of these two methods under specific error bound conditions. We demonstrate that the proximal gradient-Newton-CG method achieves the best-known iteration complexity for attaining the proposed weak approximate second-order stationary point, which is consistent with the results for finding an approximate second-order stationary point in unconstrained optimization. Through a toy example, we show that our proposed methods can effectively escape the first-order approximate solution. Numerical experiments implemented on the \(\ell_1\)-regularized Student's t-regression problem validate the effectiveness of both methods.
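For orientation, the first-order building block behind both methods can be sketched as a plain proximal gradient iteration with soft-thresholding. This is only an illustrative sketch, not the paper's hybrid algorithms: it has no negative curvature or Newton-CG phase. The loss form $\sum_i \log(1 + r_i^2/\nu)$ with $r = Ax - b$, the fixed step size $t = 1/L$, and all function and parameter names (`soft_threshold`, `prox_grad`, `lam`, `nu`) are assumptions made for this example.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (componentwise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def student_t_loss(A, b, x, nu):
    # Assumed smooth loss: sum_i log(1 + r_i^2 / nu), with r = A x - b.
    r = A @ x - b
    return np.sum(np.log1p(r**2 / nu))

def student_t_grad(A, b, x, nu):
    # Gradient of the assumed loss: A^T (2 r / (nu + r^2)).
    r = A @ x - b
    return A.T @ (2.0 * r / (nu + r**2))

def objective(A, b, x, lam, nu):
    # Composite objective f(x) + lam * ||x||_1.
    return student_t_loss(A, b, x, nu) + lam * np.sum(np.abs(x))

def prox_grad(A, b, lam, nu=1.0, iters=500, tol=1e-8):
    # The scalar loss has curvature bounded by 2 / nu, so the gradient
    # is Lipschitz with constant L <= (2 / nu) * ||A||_2^2.
    L = 2.0 / nu * np.linalg.norm(A, 2) ** 2
    t = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x_new = soft_threshold(x - t * student_t_grad(A, b, x, nu), t * lam)
        # Proximal gradient mapping; its norm measures first-order stationarity.
        G = (x - x_new) / t
        x = x_new
        if np.linalg.norm(G) <= tol:
            break
    return x
```

With the step size $1/L$, each iteration does not increase the composite objective, but the iterates may stall at a first-order stationary point that is not second-order stationary. That is the situation the second-order methods in the paper are designed to escape.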

Paper Structure

This paper contains 21 sections, 30 theorems, 194 equations, 4 figures, 4 tables, 4 algorithms.

Key Result

Lemma 2.1

Suppose there exist sequences $\{\varepsilon_g^k\} \downarrow 0$ and $\{\varepsilon_h^k\} \downarrow 0$ and a sequence $\{w^k\} \to w^*$ such that for each $k$, $w^k$ is a strong $(\varepsilon_g^k, \varepsilon_h^k)$-2o point of Problem \eqref{eq:l1normcom} with respect to the sequence $\{t_k\}\subseteq [t_{\

Figures (4)

  • Figure 1: Venn diagram illustrating the relationships among the various stationary points.
  • Figure 2: Left: the trial points generated by Algorithm \ref{alg:hpgnc}; middle: the trial points generated by Algorithm \ref{alg:pncg}; right: the objective function value plotted against iteration.
  • Figure 3: Left: the performance of $\|g(x^k)\|$ and $\|\mathcal{G}_{t_k}(x^k)\|$ across iterations generated by Algorithm \ref{alg:hpgnc}; middle: the performance of $\|g(x^k)\|$ and $\|g^{\varepsilon}(x^k)\|$ across iterations generated by Algorithm \ref{alg:pncg}; right: the performance of $\|x^k - \bar{x}\|$ plotted against iteration.
  • Figure 4: Left: the performance of $\|g(x)\|$ and $\|\mathcal{G}_{t_k}(x^k)\|$ across iterations generated by Algorithm \ref{alg:hpgnc}; middle: the performance of $\|g(x)\|$ and $\|g^{\varepsilon}(x)\|$ across iterations generated by Algorithm \ref{alg:pncg}; right: the performance of $\|x^k - \bar{x}\|$ plotted against iteration.

Theorems & Definitions (63)

  • Definition 2.1: strong $(\varepsilon_g, \varepsilon_h)$-2o point
  • Lemma 2.1
  • proof
  • Remark 2.1
  • Definition 2.2: strong* $(\varepsilon_g, \varepsilon_h)$-2o point
  • Lemma 2.2
  • Definition 2.3: weak $\varepsilon_g$-1o point
  • Definition 2.4: weak $(\varepsilon_g, \varepsilon_h)$-2o point
  • Lemma 2.3
  • Remark 2.2
  • ...and 53 more