
High-Dimensional Sparse Data Low-rank Representation via Accelerated Asynchronous Parallel Stochastic Gradient Descent

Qicong Hu, Hao Wu

TL;DR

This work tackles the optimization of low-rank representations for high-dimensional sparse data by introducing Accelerated Asynchronous Parallel SGD (A2PSGD). The method combines a lock-free asynchronous scheduler, greedy load balancing, and Nesterov's accelerated gradient to achieve faster convergence and higher accuracy on large-scale LR tasks. Empirical results on MovieLens 1M and Epinion 665K show that A2PSGD outperforms Hogwild!, DSGD, ASGD, and FPSGD in both prediction quality (lower RMSE/MAE) and training time. The approach offers practical advantages for scalable LR learning in systems with massive sparse interaction matrices, with potential extension to time-varying data and higher-order tensor representations.

Abstract

Data characterized by high dimensionality and sparsity are commonly used to describe real-world node interactions. Low-rank representation (LR) can map high-dimensional sparse (HDS) data to low-dimensional feature spaces and infer node interactions by modeling latent associations in the data. Unfortunately, existing optimization algorithms for LR models are computationally inefficient and converge slowly on large-scale datasets. To address this issue, this paper proposes an Accelerated Asynchronous Parallel Stochastic Gradient Descent (A2PSGD) algorithm for HDS data LR, built on three ideas: a) establishing a lock-free scheduler that responds to scheduling requests from multiple threads simultaneously; b) introducing a greedy load-balancing strategy that evens out the computational load among threads; c) incorporating Nesterov's accelerated gradient into the learning scheme to accelerate model convergence. Empirical studies show that A2PSGD outperforms existing optimization algorithms for HDS data LR in both accuracy and training time.
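Idea (c) can be made concrete with a small sketch. The abstract does not state the update rule, so the code below uses the textbook Nesterov look-ahead form applied to a regularized squared-error loss on a single observed entry. The function name `nag_sgd_step` and the hyperparameters `lr`, `reg`, and `gamma` are illustrative assumptions, not the paper's notation.

```python
import numpy as np


def nag_sgd_step(p_u, q_i, v_p, v_q, r_ui, lr=0.01, reg=0.05, gamma=0.9):
    """One Nesterov-accelerated SGD update for a single observed entry r_ui.

    p_u, q_i : latent feature vectors of the row node and the column node
    v_p, v_q : their velocity (momentum) vectors
    All names and defaults are hypothetical; this is a generic NAG step.
    """
    # Look-ahead point: where the current momentum is about to carry us.
    p_look = p_u + gamma * v_p
    q_look = q_i + gamma * v_q

    # Prediction error evaluated at the look-ahead point.
    err = r_ui - p_look @ q_look

    # Gradient of 0.5*err^2 + 0.5*reg*(|p|^2 + |q|^2) at the look-ahead point.
    g_p = -err * q_look + reg * p_look
    g_q = -err * p_look + reg * q_look

    # Update the velocities, then move the parameters along them.
    v_p_new = gamma * v_p - lr * g_p
    v_q_new = gamma * v_q - lr * g_q
    return p_u + v_p_new, q_i + v_q_new, v_p_new, v_q_new
```

Setting `gamma=0` reduces the step to plain regularized SGD, which makes the look-ahead term easy to isolate when comparing the two.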

Paper Structure (19 sections, 6 equations, 4 figures, 4 tables, 1 algorithm)


Figures (4)

  • Figure 1: An illustration of the FPSGD scheduler with 3 threads. The matrix is divided into $4\times4$ sub-blocks. Threads 1 and 3 send scheduling requests to the scheduler simultaneously, but the scheduler can handle only one request at a time because of its global lock.
  • Figure 2: An illustration of the A$^{2}$PSGD scheduler with 3 threads. The matrix is divided into $4\times4$ sub-blocks. Threads 1 and 3 send scheduling requests to the scheduler simultaneously; because the A$^{2}$PSGD scheduler is lock-free, it can handle requests from multiple threads at the same time.
  • Figure 3: RMSE convergence curves for all models at 32 threads. All subfigures share the same legend.
  • Figure 4: MAE convergence curves for all models at 32 threads. All subfigures share the same legend.
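
The contrast between Figures 1 and 2 can be sketched in a few lines. The captions do not describe the scheduler's internals, so the code below is only an assumption-laden illustration: instead of one global lock around the whole scheduler, each row band and column band of the block grid carries its own busy flag, and a thread claims a block by non-blocking test-and-set on those flags. Python's `threading.Lock.acquire(blocking=False)` stands in for the atomic compare-and-swap a native implementation would likely use; `BlockScheduler`, `try_acquire`, and `run_workers` are hypothetical names.

```python
import threading


class BlockScheduler:
    """Hands out sub-blocks so that no two active blocks share a row or column band."""

    def __init__(self, n_bands):
        self.n = n_bands
        # One busy flag per row band and per column band; there is no global lock.
        self.row_busy = [threading.Lock() for _ in range(n_bands)]
        self.col_busy = [threading.Lock() for _ in range(n_bands)]

    def try_acquire(self):
        """Return a free (row, col) block, or None if none is free right now."""
        for r in range(self.n):
            # Non-blocking test-and-set: skip the row band if another thread holds it.
            if not self.row_busy[r].acquire(blocking=False):
                continue
            for c in range(self.n):
                if self.col_busy[c].acquire(blocking=False):
                    return r, c
            # No free column band: give the row band back and keep scanning.
            self.row_busy[r].release()
        return None

    def release(self, r, c):
        """Mark block (r, c) as free again."""
        self.col_busy[c].release()
        self.row_busy[r].release()


def run_workers(n_threads=3, n_bands=4, steps=100):
    """Run n_threads workers that each process `steps` blocks; return per-block counts."""
    sched = BlockScheduler(n_bands)
    counts = [[0] * n_bands for _ in range(n_bands)]

    def worker():
        done = 0
        while done < steps:
            block = sched.try_acquire()
            if block is None:
                continue  # nothing free at this instant; retry without waiting on a lock
            r, c = block
            counts[r][c] += 1  # stand-in for SGD over the entries of block (r, c)
            sched.release(r, c)
            done += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```

The fixed scan order keeps the sketch deterministic but concentrates work on a few blocks; a practical scheduler would presumably randomize or prioritize the choice so that every sub-block is visited.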