A Fault-Tolerant Distributed Termination Method for Distributed Optimization Algorithms

Mohannad Alkhraijah; Daniel K. Molzahn

TL;DR

This work addresses the challenge of terminating distributed optimization algorithms without a central coordinator by introducing a fully distributed termination framework that relies only on local computations and neighbor communications. The method uses a termination vector $V_i \in \{0,1\}^{|\mathcal{A}|}$ and a termination time $T_i$, with simple rules that guarantee termination once the global criterion $B^t_G$ is satisfied; the number of additional iterations this requires is bounded by the network diameter $D$. A fault-tolerant extension adds per-agent timing $U_i$ and a correction mechanism $C_i$ to detect and neutralize faulty termination statuses, with proofs (P6–P8) showing that faults cannot cause premature termination and are cleared within a bounded number of iterations. The approach is validated on a DC-OPF problem solved via ADMM on a 240-bus network, where termination occurs after $D$ plus additional resilience iterations, and the fault-tolerant scheme maintains correct termination under fault injections. Overall, the method enables scalable, reliable distributed termination for optimization tasks in power systems and other networked domains, without centralized control or topology restrictions.

Abstract

This paper proposes a fully distributed termination method for distributed optimization algorithms solved by multiple agents. Using only local computations and information from neighboring agents, the proposed method guarantees that a distributed optimization algorithm terminates after the global termination criterion is satisfied. The method requires additional iterations after the global termination criterion is satisfied in order to communicate the termination status, and the number of these additional iterations is bounded by the diameter of the communication network. This paper also proposes a fault-tolerant extension of this termination method that prevents early termination due to faulty agents or communication errors. We prove the method's correctness and demonstrate it by solving the optimal power flow problem for electric power grids using the alternating direction method of multipliers.
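The diameter bound can be illustrated with a small simulation. The sketch below is an illustrative assumption rather than the paper's update rules: it supposes that the first agent to acknowledge the global termination criterion stamps a termination iteration t_ack + D, and that agents forward the smallest stamp they have heard to their neighbors once per iteration. Because no agent is more than D hops away, every agent learns the stamp by that iteration and all agents stop together, similar to the simultaneous termination described for Figure 3. The function names (bfs_distances, diameter, termination_iterations) and the example network are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every agent reachable in the network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Communication-network diameter D: the largest shortest-path hop count."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

def termination_iterations(adj, first_ack_agent, t_ack, D):
    """Simulate a HYPOTHETICAL rule: the first agent to acknowledge the global
    criterion at iteration t_ack stamps T = t_ack + D, every agent adopts the
    smallest stamp heard from its neighbors (one hop per iteration), and each
    agent stops once the iteration counter reaches its stamp.
    Assumes a connected network; returns {agent: stopping iteration}."""
    T = {i: float("inf") for i in adj}
    T[first_ack_agent] = t_ack + D
    stop, t = {}, t_ack
    while len(stop) < len(adj):
        for i in adj:
            if i not in stop and t >= T[i]:
                stop[i] = t
        # Synchronous neighbor exchange at the end of iteration t.
        T = {i: min([T[i]] + [T[k] for k in adj[i]]) for i in adj}
        t += 1
    return stop

# Hypothetical 5-agent network (not the paper's 22-agent system).
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
D = diameter(adj)  # 3
stops = termination_iterations(adj, first_ack_agent=3, t_ack=100, D=D)
# Every agent stops at iteration 103 = t_ack + D.
```

Stamping an absolute iteration, instead of letting each agent stop as soon as it hears the status, is what keeps agents close to the acknowledging agent from stopping before distant agents have been informed.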

Paper Structure

This paper contains 20 sections, 8 theorems, 4 figures, and 1 table.

Introduction
Related Work
Contributions
Organization
Distributed Termination Problem
Notation
Problem Statement and Definitions
Distributed Termination Method
Method Procedure
Method Correctness
Fault-Tolerant Distributed Termination Method
Fault-Tolerant Method Procedure
Fault-Tolerant Method Correctness
Simulation Results
Remarks and Extensions
...and 5 more sections

Key Result

Proposition 1

For any agents $i$ and $j \in \mathcal{A}$, the $j$-th entry of the termination vector of agent $i$ is one (i.e., $V^{t}_i[j] = 1$) if and only if agent $j$ satisfies its local termination criterion (i.e., $B^{t}_j= 1$).
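One consequence of Proposition 1 is that each agent can evaluate the global criterion from its own termination vector. This short derivation assumes, as the Figure 3 description suggests, that the global criterion holds exactly when every agent satisfies its local criterion, i.e. $B^t_G = \prod_{j \in \mathcal{A}} B^t_j$; the second equality applies Proposition 1 entry by entry, since both $B^t_j$ and $V^t_i[j]$ are binary.

```latex
B^{t}_G \;=\; \prod_{j \in \mathcal{A}} B^{t}_j \;=\; \prod_{j \in \mathcal{A}} V^{t}_i[j]
\qquad \text{for every } i \in \mathcal{A}.
```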

Figures (4)

Figure 1: An example of a faulty status for which the bound in Proposition P6 is tight. The number of agents is $|\mathcal{A}| = 7$ and the communication network diameter is $D = 3$. The fault originates from agent $j$ (in red) at iteration $t_j$, and the faulted agent is $k$. At iteration $t_j+3$, all agents have received the faulty status (in red). Agent $i_5$ is the first agent to clear the faulty status since agent $i_5$ is a neighbor of the faulted agent $k$. Agents clear the faulty status at iteration $t^{\prime}_i$ (in green). At iteration $t_j + 9$, agent $i_1$ is the last agent to clear the faulty status. Thus, the number of iterations agent $i_1$ needs to clear the faulty status is $D + |\mathcal{A}| - 2 = 8$.
Figure 2: Communication network of the 240-bus power system with 22 agents. The nodes are the agents and the edges are the communication links.
Figure 3: Termination status for the last nine iterations of the 240-bus optimal power flow test case solved using the alternating direction method of multipliers and the proposed termination method. At iteration $t=861$, all agents except agent 8 (cyan node) satisfy their local termination criteria (green nodes). Agent 8 satisfies its local termination criterion at iteration $t=862$ and acknowledges the global termination criterion (becomes yellow). During iterations $t=863~\text{--}~868$, the global termination status then propagates to the other agents (they become yellow). In iterations $t=867~\text{--}~868$, all agents have received the global termination status (all nodes are yellow), but they do not terminate the computation since they are not yet assured that the global termination status has reached all agents. This assurance is obtained at iteration $t=869$, at which point all agents terminate the computation (red nodes).
Figure 4: Number of agents that satisfy the local (green) and global (yellow) termination criteria and terminate computation (red) when five faulty agents inject faulty statuses for the first 20 of every 100 iterations, stopping at iteration 820. The algorithm then appropriately terminates at iteration 897.
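The fault-injection schedule described for Figure 4 can be written as a one-line predicate. This is a minimal sketch that assumes iterations are counted from 0 (the caption does not say); the function name faults_active and its parameter names are hypothetical.

```python
def faults_active(t, period=100, burst=20, stop_at=820):
    """True if the faulty agents inject faulty statuses at iteration t:
    the first `burst` iterations of every `period`-iteration window,
    with no injections from iteration `stop_at` onward.
    Assumes iteration numbering starts at 0."""
    return t < stop_at and t % period < burst

# Iterations with injected faults before the reported termination at 897:
# nine windows (starting at 0, 100, ..., 800) of 20 iterations each.
injected = sum(faults_active(t) for t in range(897))  # 180
```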

Theorems & Definitions (21)

Definition 1: D1
Definition 2: D2
Definition 3: D3
Proposition 1: P1
proof
Proposition 2: P2
proof
Proposition 3: P3
proof
Definition 4: D4
...and 11 more