
A Fault-Tolerant Distributed Termination Method for Distributed Optimization Algorithms

Mohannad Alkhraijah, Daniel K. Molzahn

TL;DR

This work addresses the challenge of terminating distributed optimization algorithms without a central coordinator by introducing a fully distributed termination framework that relies only on local computations and neighbor communications. The method uses a termination vector $V_i \in \{0,1\}^{|\mathcal{A}|}$ and a termination time $T_i$, with simple rules that guarantee termination once the global criterion $B^t_G$ is satisfied, after a number of additional iterations bounded by the network diameter $D$. A fault-tolerant extension adds per-agent timing $U_i$ and a correction mechanism $C_i$ to detect and neutralize faulty termination statuses, with proofs (Propositions P6–P8) showing that faults cannot cause premature termination and are cleared within a finite bound. The approach is validated on a DC-OPF problem solved via ADMM on a 240-bus network, demonstrating termination after $D$ plus additional resilience iterations, and the fault-tolerant scheme maintains correct termination under fault injections. Overall, the method enables scalable, reliable distributed termination for optimization tasks in power systems and other networked domains, without centralized control or topology restrictions.

Abstract

This paper proposes a fully distributed termination method for distributed optimization algorithms solved by multiple agents. The proposed method guarantees that a distributed optimization algorithm terminates after the global termination criterion is satisfied, using only information from local computations and neighboring agents. The proposed method requires additional iterations after the global termination criterion is satisfied to communicate the termination status. The number of additional iterations is bounded by the diameter of the communication network. This paper also proposes a fault-tolerant extension of this termination method that prevents early termination due to faulty agents or communication errors. We provide a proof of the method's correctness and demonstrate the proposed method by solving the optimal power flow problem for electric power grids using the alternating direction method of multipliers.
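The diameter bound can be made concrete with a small sketch. Assuming an undirected, connected communication network stored as an adjacency dictionary (a representation chosen here for illustration, not taken from the paper), the hypothetical helpers below compute the diameter $D$ by breadth-first search and count the synchronous neighbor-exchange rounds needed for one agent's status to reach every other agent. That round count is the source agent's eccentricity, which never exceeds $D$.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every agent reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest shortest-path hop count between any two agents."""
    best = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) != len(adj):
            raise ValueError("communication network is not connected")
        best = max(best, max(dist.values()))
    return best


def flood_rounds(adj, source):
    """Synchronous neighbor-exchange rounds until every agent has heard from `source`."""
    informed = {source}
    rounds = 0
    while len(informed) < len(adj):
        # Each round, every informed agent passes the status to its neighbors
        newly = {v for u in informed for v in adj[u]} - informed
        if not newly:
            raise ValueError("communication network is not connected")
        informed |= newly
        rounds += 1
    return rounds


# Hypothetical example network: four agents on a path 0-1-2-3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(diameter(adj))                        # 3
print([flood_rounds(adj, s) for s in adj])  # [3, 2, 2, 3]
```

On the path, a status originating at an end agent needs all $D = 3$ rounds, while one originating at an interior agent needs only 2, which is why the number of additional iterations is bounded by, rather than always equal to, the diameter.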

Paper Structure

This paper contains 20 sections, 8 theorems, 4 figures, 1 table.

Key Result

Proposition 1

For any agents $i$ and $j \in \mathcal{A}$, the $j$-th entry of the termination vector of agent $i$ is one (i.e., $V^{t}_i[j] = 1$) if and only if agent $j$ satisfies its local termination criterion (i.e., $B^{t}_j= 1$).
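To illustrate how termination-vector entries travel between neighbors, here is a simplified synchronous simulation. It is a sketch under stated assumptions, not the paper's update rule: the network is an adjacency dictionary, each local criterion $B_j$ stays satisfied once met (the `satisfied_at` input is hypothetical), and an agent merges its neighbors' vectors by elementwise OR before writing its own entry. Under these assumptions, entry $j$ of agent $i$'s vector becomes one exactly $\operatorname{dist}(i,j)$ iterations after $B_j$ does, so every vector is all ones at most $D$ iterations after the last agent meets its local criterion. The termination time $T_i$, which lets agents wait until the status has reached everyone, is not modeled.

```python
def simulate_termination_vectors(adj, satisfied_at, max_iters=1000):
    """Return {agent: first iteration at which its termination vector is all ones}.

    adj: agent -> list of neighbors (undirected, connected network).
    satisfied_at: agent -> iteration at which its local criterion B_j becomes 1;
        this sketch ASSUMES B_j stays 1 afterwards (monotone criteria).
    """
    agents = sorted(adj)
    index = {a: k for k, a in enumerate(agents)}
    V = {a: [0] * len(agents) for a in agents}  # termination vectors V_i
    all_ones_at = {}
    for t in range(max_iters):
        # Vectors exchanged at iteration t are those held at the end of t - 1
        prev = {a: list(v) for a, v in V.items()}
        for i in agents:
            # Merge neighbors' vectors (elementwise OR is an illustrative choice)
            for k in adj[i]:
                V[i] = [max(x, y) for x, y in zip(V[i], prev[k])]
            # Write this agent's own local termination status
            if t >= satisfied_at[i]:
                V[i][index[i]] = 1
            if i not in all_ones_at and all(V[i]):
                all_ones_at[i] = t
        if len(all_ones_at) == len(agents):
            return all_ones_at
    raise RuntimeError("termination vectors did not fill within max_iters")


# Hypothetical four-agent path 0-1-2-3; every local criterion met at iteration 0
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
result = simulate_termination_vectors(adj, {a: 0 for a in adj})
print(result)  # agents 1 and 2 fill at iteration 2, agents 0 and 3 at iteration 3
```

If agent 3 instead meets its criterion only at iteration 5, the vectors fill at iterations 8, 7, 6, and 5 for agents 0 through 3, i.e., no later than $5 + D = 8$.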

Figures (4)

  • Figure 1: An example of a faulty status for which the bound in Proposition P6 is tight. The number of agents $|\mathcal{A}| = 7$ and the communication network diameter $D = 3$. The fault originates from agent $j$ (in red) at iteration $t_j$ and the faulted agent is $k$. At iteration $t_j+3$, all agents receive the faulty status (in red). Agent $i_5$ is the first agent that clears the faulty status since agent $i_5$ is a neighbor of the faulted agent $k$. Agents clear the faulty status at iteration $t^{\prime}_i$ (in green). At iteration $t_j + 9$, agent $i_1$ is the last agent to clear the faulty status. Thus, the number of iterations agent $i_1$ needs to clear the faulty status is $D + |\mathcal{A}| - 2 = 8$ iterations.
  • Figure 2: Communication network of the 240-bus power system with 22 agents. The nodes are the agents and the edges are the communication links.
  • Figure 3: Termination status for the last nine iterations of the 240-bus optimal power flow test case solved using the alternating direction method of multipliers and the proposed termination method. At iteration $t=861$, all agents satisfy their local termination criterion (green nodes) except agent 8 (cyan node). Agent 8 satisfies its local termination criterion at iteration $t=862$ and detects that the global termination criterion is satisfied (becomes yellow). During iterations $t=863~\text{--}~868$, this global termination status then propagates to the other agents (they become yellow). In iterations $t=867~\text{--}~868$, all agents have received the global termination status (all nodes are yellow), but the agents do not terminate the computation since they are not yet assured that the global termination status has reached all agents. This assurance is reached at iteration $t=869$, at which point all agents terminate the computation appropriately (red nodes).
  • Figure 4: Number of agents that satisfy the local (green) and global (yellow) termination criteria and terminate computation (red) when five faulty agents inject faulty statuses for the first 20 of every 100 iterations, stopping at iteration 820. The algorithm then appropriately terminates at iteration 897.

Theorems & Definitions (21)

  • Definition 1: D1
  • Definition 2: D2
  • Definition 3: D3
  • Proposition 1: P1
  • proof
  • Proposition 2: P2
  • proof
  • Proposition 3: P3
  • proof
  • Definition 4: D4
  • ...and 11 more