Distributed convergence detection based on global residual error under asynchronous iterations

Frédéric Magoulès; Guillaume Gbikpi-Benissan

TL;DR

The paper addresses convergence detection for asynchronous iterations by computing a global residual $\|f(\bar{x}) - \bar{x}\|$ with a single reduction operation and without blocking communication. The authors introduce asynchronous iterations snapshot (AIS) protocols based on the Chandy–Lamport snapshot and extend them to arbitrary non-FIFO contexts, with several variants (AIS1, AIS2, AIS3, AIS4, AIS5). They give a formal bound relating approximate residuals to the true residual, prove consistency and error bounds in non-FIFO settings, and show that a single reduction suffices for convergence detection under many practical communication models. Experiments on up to 5600 cores show effective convergence detection, with execution times competitive with or better than established two-reduction approaches. Removing the second reduction lowers both the termination delay and the overhead of convergence checks in asynchronous parallel computations.

Abstract

Convergence of classical parallel iterations is detected by performing a reduction operation at each iteration in order to compute a residual error relative to a potential solution vector. To efficiently run asynchronous iterations, blocking communication requests are avoided, which makes it hard to isolate and handle any global vector. While some termination protocols were proposed for asynchronous iterations, only very few of them are based on global residual computation and guarantee effective convergence. However, the most effective and efficient existing solutions feature two reduction operations, which is a significant source of termination delay. In this paper, we present new, non-intrusive protocols to compute a residual error under asynchronous iterations, requiring only one reduction operation. Under various communication models, we show that some heuristics can also be introduced and formally evaluated. Extensive experiments with up to 5600 processor cores confirm the practical effectiveness and efficiency of our approach.
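To make the "residual by one reduction" idea concrete, here is a minimal single-machine sketch. It is not the authors' protocol. It assumes a hypothetical affine fixed-point map $f(x) = Bx + c$, the Euclidean norm, and a global vector $\bar{x}$ that has already been assembled consistently (which is precisely the hard part under asynchronous iterations). The names `local_residual_sq`, `global_residual`, `B`, `c` and `blocks` are illustrative, and the Python `sum` stands in for a sum-reduction across processes.

```python
import math

def local_residual_sq(block, x_bar, B, c):
    """Squared residual contribution of one process.

    Sums (f_i(x_bar) - x_bar[i])^2 over the rows i owned by the process,
    with the assumed map f(x) = B x + c.
    """
    total = 0.0
    for i in block:
        f_i = sum(B[i][j] * x_bar[j] for j in range(len(x_bar))) + c[i]
        total += (f_i - x_bar[i]) ** 2
    return total

def global_residual(blocks, x_bar, B, c):
    # Stand-in for ONE sum-reduction over all processes.
    contributions = [local_residual_sq(b, x_bar, B, c) for b in blocks]
    return math.sqrt(sum(contributions))

# Hypothetical 4-unknown contraction, split over 2 "processes".
B = [[0.0, 0.25, 0.0, 0.0],
     [0.25, 0.0, 0.25, 0.0],
     [0.0, 0.25, 0.0, 0.25],
     [0.0, 0.0, 0.25, 0.0]]
c = [1.0, 1.0, 1.0, 1.0]
blocks = [[0, 1], [2, 3]]

# Plain synchronous fixed-point sweeps, only to obtain a candidate x_bar.
x = [0.0] * 4
for _ in range(100):
    x = [sum(B[i][j] * x[j] for j in range(4)) + c[i] for i in range(4)]

residual = global_residual(blocks, x, B, c)
converged = residual < 1e-8
```

Each process only needs its own rows of $f$ and the components of $\bar{x}$ those rows depend on, so the only collective step is the final sum.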

Paper Structure (17 sections, 6 theorems, 57 equations, 4 figures, 3 tables, 8 algorithms)

Introduction
Related works
Problem formulation
Asynchronous iterations
Convergence detection
Determining a global solution vector
The Chandy--Lamport snapshot (CLS)
New asynchronous iterations snapshots (AIS)
New non-FIFO asynchronous iterations snapshots
Arbitrary non-FIFO communication
Inter-protocol non-FIFO communication
Non-FIFO communication with bounded number of cross messages
Numerical results
Problem and experimental settings
Effectiveness
...and 2 more sections

Key Result

Theorem 1

Let $\mathcal{S(C)} = \{s^{t}\}_{t \in \mathbb{N}}$ denote the global states sequence generated by a computation $\mathcal{C}$. Let $\bar{s}$ be the global state recorded by an execution of the CLS protocol on $\mathcal{C}$. Then there exists an equivalent permutation $\mathcal{P(C)}$ of $\mathcal{C}$ ...
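For readers unfamiliar with the CLS protocol named in the theorem, the following is a small simulation sketch of the classic marker-based Chandy–Lamport snapshot. It assumes FIFO channels and two processes exchanging integer "transfers". The `Process` class, the `net` dictionary of channels and the delivery order are all illustrative assumptions and are not taken from the paper. The recorded state is the recorded balances plus the messages recorded in transit.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance
        self.peers = peers        # ids of the other processes
        self.recorded = None      # recorded local state (None = not yet)
        self.chan_state = {}      # src -> messages recorded as in transit
        self.recording = set()    # incoming channels still being recorded

    def start_snapshot(self, net):
        # Record local state, then send a marker on every outgoing channel.
        self.recorded = self.balance
        self.recording = set(self.peers)
        self.chan_state = {p: [] for p in self.peers}
        for p in self.peers:
            net[(self.pid, p)].append(MARKER)

    def send(self, dst, amount, net):
        self.balance -= amount
        net[(self.pid, dst)].append(amount)

    def receive(self, src, msg, net):
        if msg == MARKER:
            if self.recorded is None:
                self.start_snapshot(net)
            # A marker closes the recording of the channel it arrived on.
            self.recording.discard(src)
        else:
            if src in self.recording:
                self.chan_state[src].append(msg)
            self.balance += msg

def deliver_all(net, procs, src, dst):
    # FIFO delivery of everything currently queued on channel src -> dst.
    chan = net[(src, dst)]
    while chan:
        procs[dst].receive(src, chan.popleft(), net)

procs = [Process(0, 100, [1]), Process(1, 50, [0])]
net = {(0, 1): deque(), (1, 0): deque()}

procs[0].send(1, 10, net)
procs[1].send(0, 5, net)
procs[0].start_snapshot(net)   # process 0 initiates the snapshot
procs[1].send(0, 7, net)
deliver_all(net, procs, 0, 1)  # process 1 gets 10, then the marker
deliver_all(net, procs, 1, 0)  # process 0 gets 5, 7, then the marker

snapshot_total = sum(p.recorded for p in procs) + sum(
    sum(msgs) for p in procs for msgs in p.chan_state.values()
)
```

In this run the recorded balances (90 and 48) plus the two transfers recorded in transit (5 and 7) add up to the conserved total of 150, although no single instant of the execution had exactly that state. With non-FIFO channels a message could overtake a marker and this bookkeeping would no longer be reliable.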

Figures (4)

Figure 1: Example of a CLS protocol execution with two processes.
Figure 2: Non-FIFO snapshot issues.
Figure 3: Examples of issues handled by non-FIFO AIS protocol 4.
Figure 4: Domain discretization and partitioning (16 sub-domains).

Theorems & Definitions (12)

Theorem 1: Chandy & Lamport, 1985
proof
Proposition 1
proof
Proposition 2
proof
Proposition 3
proof
Proposition 4
proof
...and 2 more
