Parallelization Strategies for the Randomized Kaczmarz Algorithm on Large-Scale Dense Systems

Inês Ferreira, Juan A. Acebrón, José Monteiro

TL;DR

This work addresses the solution of large, dense, overdetermined linear systems with Kaczmarz-type iterative methods. It analyzes shared-memory and distributed-memory parallelization of both the original and the randomized variants, and finds that Randomized Kaczmarz with Averaging (RKA) cannot be parallelized efficiently because of synchronization overhead. To overcome this, it introduces Randomized Kaczmarz with Averaging with Blocks (RKAB), which processes blocks of rows and so reduces communication. RKAB can match or exceed RKA performance when unit weights are used, while also reducing the convergence horizon for inconsistent systems. The results give practical guidance on algorithm choice and parameter tuning for dense systems: RKAB is a robust alternative when the goal is horizon reduction or regularization rather than the fastest possible runtime.
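
For orientation, the updates behind these acronyms can be written out. The forms below are the ones commonly used in the randomized Kaczmarz literature; the notation is an assumption made here, not a quotation from the paper: $A^{(i)}$ is row $i$ of $A$, $b_i$ the matching entry of $b$, $\tau_k$ the set of rows sampled at iteration $k$, $q$ the size of that set, and $\alpha$ the row weight.

$$
\text{RK:}\qquad x^{(k+1)} = x^{(k)} + \frac{b_i - \langle A^{(i)}, x^{(k)} \rangle}{\| A^{(i)} \|^2}\, A^{(i)T}
$$

$$
\text{RKA:}\qquad x^{(k+1)} = x^{(k)} + \frac{1}{q} \sum_{i \in \tau_k} \alpha\, \frac{b_i - \langle A^{(i)}, x^{(k)} \rangle}{\| A^{(i)} \|^2}\, A^{(i)T}
$$

With $q = 1$ and $\alpha = 1$ the RKA update reduces to RK. Every RKA iteration needs the $q$ contributions to be combined before the next one can start, which is where the synchronization cost arises. A block variant, as described above, lets each worker apply several row updates before that averaging step, so results are exchanged less often.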

Abstract

The Kaczmarz algorithm is an iterative technique designed to solve consistent linear systems of equations. It belongs to the category of row-action methods, which handle one equation per iteration. This characteristic makes it especially useful for solving very large systems. The recent introduction of a randomized version, the Randomized Kaczmarz method, renewed interest in the algorithm and led to the development of numerous variations. Parallel implementations of both the original and the Randomized Kaczmarz method have since been proposed. However, previous work has addressed sparse linear systems, whereas we focus on solving dense systems. In this paper, we explore in detail approaches to parallelizing the Kaczmarz method on both shared and distributed memory for large dense systems. In particular, we implemented the Randomized Kaczmarz with Averaging (RKA) method, which, for inconsistent systems and unlike the standard Randomized Kaczmarz algorithm, reduces the final error of the solution. While efficient parallelization of this algorithm is not achievable, we introduce a block version of the averaging method that can outperform the RKA method.
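
To make the row-action idea concrete, here is a minimal sequential sketch in plain Python. It is illustrative only and is not the authors' implementation. The function names, the sampling of rows with probability proportional to their squared norm, and the reading of the block variant (each simulated worker applies `block_size` row updates starting from the shared estimate, then the workers' results are averaged with unit weights) are assumptions made for this example. The workers are simulated in a loop, so the sketch shows the arithmetic, not the parallel performance.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def project(x, row, rhs):
    """One Kaczmarz step: project x onto the hyperplane <row, x> = rhs.

    Assumes the row is non-zero.
    """
    scale = (rhs - dot(row, x)) / dot(row, row)
    return [xi + scale * ri for xi, ri in zip(x, row)]


def randomized_kaczmarz(A, b, iters, seed=0):
    """Randomized Kaczmarz: one randomly chosen equation per iteration."""
    rng = random.Random(seed)
    # Assumption: row i is sampled with probability proportional to ||A_i||^2.
    weights = [dot(row, row) for row in A]
    indices = range(len(A))
    x = [0.0] * len(A[0])
    for _ in range(iters):
        i = rng.choices(indices, weights=weights)[0]
        x = project(x, A[i], b[i])
    return x


def averaged_block_kaczmarz(A, b, outer_iters, workers, block_size, seed=0):
    """Hypothetical block-averaging variant with unit weights.

    Each simulated worker starts from the shared estimate, applies
    block_size row updates in sequence, and the results are averaged.
    """
    rng = random.Random(seed)
    weights = [dot(row, row) for row in A]
    indices = range(len(A))
    n = len(A[0])
    x = [0.0] * n
    for _ in range(outer_iters):
        total = [0.0] * n
        for _ in range(workers):
            local = x
            for i in rng.choices(indices, weights=weights, k=block_size):
                local = project(local, A[i], b[i])
            total = [t + v for t, v in zip(total, local)]
        # The averaging step is the only point where workers must synchronize.
        x = [t / workers for t in total]
    return x


# Small consistent overdetermined system whose exact solution is (1, -2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0], [3.0, 2.0]]
b = [0.0, -5.0, 3.0, -1.0]
x_rk = randomized_kaczmarz(A, b, iters=500)
x_blk = averaged_block_kaczmarz(A, b, outer_iters=300, workers=3, block_size=2)
```

On this small consistent system both calls should return estimates close to (1, -2). With `block_size=1` the second function performs one row update per worker between averaging steps, and a larger `block_size` means fewer averaging steps for the same number of row updates.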

Paper Structure

This paper contains 20 sections, 9 equations, 14 figures, 2 tables, and 4 algorithms.

Figures (14)

  • Figure 1: Convergence of the Kaczmarz method for a consistent system in 2 dimensions using two different row selection criteria.
  • Figure 2: Speedups for the parallel implementation of RK using a block-sequential approach with a fixed number of columns.
  • Figure 3: Averaging of the solution estimate using a matrix. Example for 3 threads and $n=9$.
  • Figure 4: Results for RKA using 2, 4, 8, 16, and 64 threads and row weights $\alpha = 1$ for several overdetermined systems with $n = 4000$ and a varying number of rows.
  • Figure 5: Results for RKA using 2, 4, 8, 16, and 64 threads and row weights $\alpha = \alpha^*$ for several overdetermined systems with $n = 4000$ and a varying number of rows.
  • ...and 9 more figures