On the adaptive deterministic block coordinate descent methods with momentum for solving large linear least-squares problems

Long-Ze Tan, Ming-Yu Deng, Jia-Li Qiu, Xue-Ping Guo

TL;DR

An adaptive deterministic block coordinate descent method with momentum (mADBCD) is proposed for efficiently solving large-scale linear least-squares problems, together with a novel column selection criterion based on the Euclidean norm of the residual vector of the normal equation.

Abstract

In this work, we first present an adaptive deterministic block coordinate descent method with momentum (mADBCD) for solving the linear least-squares problem. The method is based on Polyak's heavy ball method and a new column selection criterion for a set of block-controlled indices, defined by the Euclidean norm of the residual vector of the normal equation. The mADBCD method eliminates the need to pre-partition the column indices of the coefficient matrix, and it also avoids computing the Moore-Penrose pseudoinverse of a column sub-matrix at each iteration. Moreover, we demonstrate the adaptability and flexibility of the automatic selection and updating of the block control index set. When the coefficient matrix has full column rank, the theoretical analysis shows that the mADBCD method converges linearly to the unique solution of the linear least-squares problem. Furthermore, by integrating the count sketch technique with the mADBCD method, we also propose a novel count sketch adaptive block coordinate descent method with momentum (CS-mADBCD) for solving highly overdetermined linear least-squares problems and analyze its convergence. Finally, numerical experiments illustrate the advantages of the two proposed methods in terms of both CPU time and iteration counts compared with recent block coordinate descent methods.
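
The count sketch step mentioned above can be illustrated with a minimal sketch, assuming the standard construction in which each row of $\mathbf{A}$ is hashed to one of $s$ buckets and multiplied by a random sign. The function name `count_sketch`, the bucket count `s`, and the test sizes are hypothetical choices for illustration; the exact sketching step used inside CS-mADBCD is not reproduced here.

```python
import numpy as np


def count_sketch(A, s, rng):
    """Apply a count sketch S (s x m) to A (m x n) without forming S.

    Assumption: standard count sketch, i.e. S has exactly one nonzero
    entry (+1 or -1) per column, placed in a uniformly random row.
    """
    m, n = A.shape
    rows = rng.integers(0, s, size=m)        # bucket index for each row of A
    signs = rng.choice([-1.0, 1.0], size=m)  # random sign for each row of A
    SA = np.zeros((s, n))
    # Accumulate signed rows of A into their buckets (handles repeated indices).
    np.add.at(SA, rows, signs[:, None] * A)
    return SA, rows, signs


# Illustrative usage on a highly overdetermined random problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 10))
b = rng.standard_normal(2000)
SA, rows, signs = count_sketch(A, 200, rng)
# The same sketch applied to the right-hand side b.
Sb = np.zeros(200)
np.add.at(Sb, rows, signs * b)
# Solve the smaller sketched least-squares problem min ||SA x - Sb||_2.
x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0]
```

Forming $\mathbf{S}\mathbf{A}$ this way costs one pass over the rows of $\mathbf{A}$, which is why the technique suits highly overdetermined systems; the sketched problem has only $s$ rows.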

Paper Structure

This paper contains 10 sections, 6 theorems, 65 equations, 21 figures, 13 tables, 3 algorithms.

Key Result

Lemma 2.1

If $\mathbf{G} \in \mathbb{R}^{m \times n}$ has full column rank, we have $\sigma_{\min}^2(\mathbf{G})\|\mathbf{x}\|_2^2\leq\|\mathbf{G x}\|_2^2 \leq \sigma_{\max}^2(\mathbf{G})\|\mathbf{x}\|_2^2$, $\forall \mathbf{x} \in \mathbb{R}^n$.
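
The two-sided bound in Lemma 2.1 can be checked numerically. The following is a minimal sketch, assuming a random Gaussian matrix (which has full column rank with probability one); the helper name `singular_value_bounds_hold` and the tolerance are hypothetical.

```python
import numpy as np


def singular_value_bounds_hold(G, x, rtol=1e-10):
    """Check sigma_min^2 ||x||^2 <= ||G x||^2 <= sigma_max^2 ||x||^2."""
    sigma = np.linalg.svd(G, compute_uv=False)  # singular values, descending
    x_norm_sq = float(x @ x)
    lower = sigma[-1] ** 2 * x_norm_sq
    upper = sigma[0] ** 2 * x_norm_sq
    middle = float(np.linalg.norm(G @ x) ** 2)
    slack = rtol * max(1.0, upper)  # allow for floating-point rounding
    return lower - slack <= middle <= upper + slack


rng = np.random.default_rng(0)
G = rng.standard_normal((50, 8))  # tall matrix, full column rank almost surely
checks = [singular_value_bounds_hold(G, rng.standard_normal(8)) for _ in range(100)]
```

Full column rank guarantees $\sigma_{\min}(\mathbf{G}) > 0$, so the lower bound is strictly positive for every nonzero $\mathbf{x}$.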

Figures (21)

  • Figure 1: Pictures of $\beta$ versus IT (left) and CPU (right) for mADBCD when $\mathbf{A}$=randn$(7500,750)$.
  • Figure 2: Pictures of $\beta$ versus IT (left) and CPU (right) for mADBCD when $\mathbf{A}$=randn$(7500,1500)$.
  • Figure 3: Pictures of $\beta$ versus IT (left) and CPU (right) for mADBCD when $\mathbf{A}$=randn$(6000,3000)$.
  • Figure 4: Pictures of $\beta$ versus IT (left) and CPU (right) for mADBCD when $\mathbf{A}$=randn$(8000,5000)$.
  • Figure 5: Pictures of $\beta$ versus IT (left) and CPU (right) for mADBCD when $\mathbf{A}$=randn$(21000,12500)$.
  • ...and 16 more figures

Theorems & Definitions (18)

  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • Theorem 3.1
  • proof
  • Remark 3.1
  • Remark 3.2
  • Definition 4.1
  • Lemma 4.1
  • ...and 8 more