Iterative Pre-Conditioning for Expediting the Gradient-Descent Method: The Distributed Linear Least-Squares Problem

Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra

TL;DR

An iterative pre-conditioning technique is proposed that mitigates the deleterious effect of the conditioning of data points on the rate of convergence of the gradient-descent method and achieves superlinear convergence when the least-squares problem has a unique solution.

Abstract

This paper considers the multi-agent linear least-squares problem in a server-agent network. In this problem, the system comprises multiple agents, each having a set of local data points, that are connected to a server. The goal for the agents is to compute a linear mathematical model that optimally fits the collective data points held by all the agents, without sharing their individual local data points. This goal can be achieved, in principle, using the server-agent variant of the traditional iterative gradient-descent method. The gradient-descent method converges linearly to a solution, and its rate of convergence is lower bounded by the conditioning of the agents' collective data points. If the data points are ill-conditioned, the gradient-descent method may require a large number of iterations to converge. We propose an iterative pre-conditioning technique that mitigates the deleterious effect of the conditioning of data points on the rate of convergence of the gradient-descent method. We rigorously show that the resulting pre-conditioned gradient-descent method, with the proposed iterative pre-conditioning, achieves superlinear convergence when the least-squares problem has a unique solution. In general, the convergence is linear, with an improved rate of convergence in comparison to the traditional gradient-descent method and the state-of-the-art accelerated gradient-descent methods. We further illustrate the improved rate of convergence of our proposed algorithm through experiments on different real-world least-squares problems in both noise-free and noisy computation environments.
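To make the idea in the abstract concrete, the following is a minimal Python sketch, not the paper's Algorithm 1 verbatim. It assumes one plausible form of the update, inferred from the parameter names $\alpha$, $\delta$, $\beta$ and the initialization $K(0) = O_{d \times d}$ that appear in the figure captions: the server keeps a pre-conditioner matrix $K$ that is itself refined by gradient-type steps toward $(A^T A + \beta I)^{-1}$, and the estimate $x$ is updated with the pre-conditioned aggregate gradient. The function names, the synthetic data, and the step-size choices are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gradient_descent(agents, step, iters):
    """Plain server-agent gradient descent on sum_i 0.5 * ||A_i x - b_i||^2."""
    x = np.zeros(agents[0][0].shape[1])
    for _ in range(iters):
        # Each agent reports only its local gradient, never its raw data.
        grad = sum(A_i.T @ (A_i @ x - b_i) for A_i, b_i in agents)
        x = x - step * grad
    return x


def preconditioned_gradient_descent(agents, alpha, delta, beta, iters):
    """Assumed form of an iteratively pre-conditioned gradient descent.

    K is driven toward (A^T A + beta * I)^{-1} while x is updated with the
    pre-conditioned gradient. This update rule is an illustrative assumption.
    """
    m = len(agents)
    d = agents[0][0].shape[1]
    identity = np.eye(d)
    x = np.zeros(d)
    K = np.zeros((d, d))  # K(0) = O, as in the figure captions
    for _ in range(iters):
        grad = sum(A_i.T @ (A_i @ x - b_i) for A_i, b_i in agents)
        # Aggregated residual of (A^T A + beta * I) K = I, built from agent terms.
        residual = sum(
            (A_i.T @ A_i + (beta / m) * identity) @ K - identity / m
            for A_i, _ in agents
        )
        K = K - alpha * residual
        x = x - delta * K @ grad
    return x


# Synthetic, deliberately ill-conditioned problem (hypothetical data).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((30, 5)))  # orthonormal columns
singular_values = np.array([1.0, 0.5, 0.2, 0.1, 0.05])
A = Q * singular_values  # condition number of A^T A is 400
x_star = np.ones(5)
b = A @ x_star  # consistent system with a unique solution

# Split the rows among three agents.
agents = [(A[i:i + 10], b[i:i + 10]) for i in range(0, 30, 10)]

lam_max = np.linalg.eigvalsh(A.T @ A).max()
x_gd = gradient_descent(agents, step=1.0 / lam_max, iters=200)
x_pre = preconditioned_gradient_descent(
    agents, alpha=1.0 / lam_max, delta=1.0, beta=0.0, iters=200
)

err_gd = np.linalg.norm(x_gd - x_star)
err_pre = np.linalg.norm(x_pre - x_star)
print(f"gradient descent error:        {err_gd:.3e}")
print(f"pre-conditioned descent error: {err_pre:.3e}")
```

Under these assumptions, with $\delta = 1$ and $\beta = 0$ the error contraction at iteration $t$ is $(I - \alpha A^T A)^{t+1}$ rather than a fixed factor, so the per-step contraction keeps improving as $K$ approaches the inverse. On this toy problem the plain gradient-descent error remains of the same order as the initial error after 200 iterations, while the pre-conditioned variant is many orders of magnitude closer to $x^*$.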

Paper Structure

This paper contains 26 sections, 168 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: System architecture.
  • Figure 2: Temporal evolution of the estimation error norm $\left\lVert x(t)-x^*\right\rVert$ under Algorithm 1 with different initializations, for the datasets (a) "ash608" and (b) "gr_30_30". Parameters: (a) $\alpha = 0.1, \, \delta = 1, \, \beta = 0$; (b) $\alpha = 3 \times 10^{-3}, \, \delta = 0.4, \, \beta = 0$.
  • Figure 3: Temporal evolution of the estimation error norm $\left\lVert x(t)-x^*\right\rVert$ under Algorithm 1, GD, APC, NAG, and HBM with optimal parameter choices, and BFGS. Initialization for (a)-(d): (Algorithm 1) $x(0) = [0,\ldots,0]^T$, $K(0) = O_{d \times d}$; (GD, NAG, HBM) $x(0) = [0,\ldots,0]^T$; (APC) according to the algorithm; (BFGS) $x(0) = [0,\ldots,0]^T$, $M(0) = I$. The algorithms GD, NAG, HBM, and BFGS are described in Section \ref{sec:comp}. The APC algorithm can be found in \cite{azizan2019distributed}.
  • Figure 4: Temporal evolution of the estimation error norm $\left\lVert x(t)-x^*\right\rVert$ in the presence of system noise, under Algorithm 1, GD, APC, NAG, and HBM with optimal parameter choices, and BFGS, for the datasets (a) "ash608" and (b) "gr_30_30". Initialization for both (a) and (b): (Algorithm 1) $x(0) = [0,\ldots,0]^T$, $K(0) = O_{d \times d}$; (GD, NAG, HBM) $x(0) = [0,\ldots,0]^T$; (APC) according to the algorithm; (BFGS) $x(0) = [0,\ldots,0]^T$, $M(0) = I$. The algorithms GD, NAG, HBM, and BFGS are described in Section \ref{sec:comp}. The APC algorithm can be found in \cite{azizan2019distributed}.

Theorems & Definitions (2)

  • proof: Proof of Lemma \ref{lem:tau}
  • proof: Proof of Theorem \ref{thm:noise}