Parallelization Strategies for the Randomized Kaczmarz Algorithm on Large-Scale Dense Systems
Inês Ferreira, Juan A. Acebrón, José Monteiro
TL;DR
This work addresses the solution of large, dense, overdetermined linear systems with Kaczmarz-type iterative methods. It analyzes parallelization on shared and distributed memory for both the original and the randomized variants, and finds that Randomized Kaczmarz with Averaging (RKA) cannot be parallelized efficiently because of synchronization overhead. To overcome this, it introduces Randomized Kaczmarz with Averaging with Blocks (RKAB), which processes blocks of rows and thereby reduces communication. It shows that RKAB can match or exceed the performance of RKA when unit weights are used, while also reducing the convergence horizon for inconsistent systems. The results provide practical guidance on algorithm choice and parameter tuning for dense systems, and indicate that RKAB is a robust alternative when the goal is horizon reduction or regularization rather than the fastest possible runtime.
Abstract
The Kaczmarz algorithm is an iterative technique designed to solve consistent linear systems of equations. It belongs to the class of row-action methods, which process one equation per iteration. This characteristic makes it especially useful for solving very large systems. The recent introduction of a randomized version, the Randomized Kaczmarz method, renewed interest in the algorithm and led to the development of numerous variants. Parallel implementations of both the original and the Randomized Kaczmarz method have since been proposed. However, previous work has addressed sparse linear systems, whereas we focus on dense systems. In this paper, we explore in detail approaches to parallelizing the Kaczmarz method on both shared- and distributed-memory architectures for large dense systems. In particular, we implement the Randomized Kaczmarz with Averaging (RKA) method, which, unlike the standard Randomized Kaczmarz algorithm, reduces the final error of the solution for inconsistent systems. Since this algorithm cannot be parallelized efficiently, we introduce a block version of the averaging method that can outperform RKA.
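The following is a minimal, sequential pure-Python sketch that may help contrast the three update rules mentioned above; it is not the authors' implementation. It assumes rows are sampled with probability proportional to their squared norms (as in the Randomized Kaczmarz method of Strohmer and Vershynin) and that RKA averages the projections of q sampled rows taken from the same iterate, with a weight w. The `rkab` function encodes an assumption about RKAB, namely that each of q workers performs `block_size` consecutive RK steps from the shared iterate before the results are averaged. All function names are hypothetical, and the inner "worker" loops only stand in for threads or processes; no parallelism is implemented.

```python
import random


def row_norms_sq(A):
    """Squared Euclidean norm of every row of A (list of lists)."""
    return [sum(v * v for v in row) for row in A]


def project(A, b, norms, x, i, w=1.0):
    """Kaczmarz step: move x toward the hyperplane <a_i, x> = b_i, scaled by w."""
    residual = b[i] - sum(aij * xj for aij, xj in zip(A[i], x))
    scale = w * residual / norms[i]
    return [xj + scale * aij for xj, aij in zip(x, A[i])]


def rk(A, b, iters, seed=0):
    """Randomized Kaczmarz: one norm-weighted random row per iteration."""
    rng = random.Random(seed)
    norms = row_norms_sq(A)
    x = [0.0] * len(A[0])
    rows = range(len(A))
    for _ in range(iters):
        i = rng.choices(rows, weights=norms)[0]
        x = project(A, b, norms, x, i)
    return x


def rka(A, b, iters, q, w=1.0, seed=0):
    """RKA sketch: q rows per iteration, all projected from the SAME x, and the
    corrections averaged. The q projections are independent (parallelizable),
    but they must be combined at every iteration (one sync per update)."""
    rng = random.Random(seed)
    norms = row_norms_sq(A)
    n = len(A[0])
    x = [0.0] * n
    rows = range(len(A))
    for _ in range(iters):
        correction = [0.0] * n
        for i in rng.choices(rows, weights=norms, k=q):
            xi = project(A, b, norms, x, i, w)
            correction = [c + (p - xj) for c, p, xj in zip(correction, xi, x)]
        x = [xj + c / q for xj, c in zip(x, correction)]
    return x


def rkab(A, b, iters, q, block_size, seed=0):
    """Hypothetical RKAB sketch (assumption): each of q workers runs block_size
    sequential RK steps from the shared x, then the q iterates are averaged,
    so workers synchronize once per block instead of once per row update."""
    rng = random.Random(seed)
    norms = row_norms_sq(A)
    n = len(A[0])
    x = [0.0] * n
    rows = range(len(A))
    for _ in range(iters):
        total = [0.0] * n
        for _worker in range(q):
            xw = x
            for i in rng.choices(rows, weights=norms, k=block_size):
                xw = project(A, b, norms, xw, i)
            total = [t + v for t, v in zip(total, xw)]
        x = [t / q for t in total]
    return x


if __name__ == "__main__":
    # Small consistent system with exact solution x = [1, 2].
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [1.0, 2.0, 3.0]
    print(rk(A, b, 500))
    print(rka(A, b, 500, q=2))
    print(rkab(A, b, 100, q=2, block_size=3))
```

On this small consistent system all three variants should approach [1.0, 2.0]. The sketch only illustrates where synchronization points fall (after every averaged update in `rka`, after every block in `rkab`); it says nothing about the convergence horizon on inconsistent systems or about performance.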
