Approximate Byzantine Fault-Tolerance in Distributed Optimization

Shuo Liu, Nirupam Gupta, Nitin H. Vaidya

TL;DR

This work tackles Byzantine fault-tolerance in distributed optimization by introducing $(f,\epsilon)$-resilience, a relaxation of exact fault-tolerance that seeks an approximate minimum of the non-faulty agents’ aggregate cost. It establishes a formal link between resilience and a weaker redundancy notion, $(2f,\epsilon)$-redundancy, proving both necessary and sufficient conditions for feasibility and achievability. The authors develop a distributed gradient-descent framework with robust gradient-aggregation (gradient-filters) and provide generic convergence guarantees, plus concrete results for two widely used filters, CGE and CWTM, under standard smoothness and convexity assumptions. Numerical experiments on distributed linear regression corroborate the theory, showing that the proposed approach achieves outputs within $\epsilon$ of the non-faulty optimum despite up to $f$ Byzantine faulty agents. The results offer a practical, provable pathway to resilient distributed optimization in noisy, adversarial environments, with implications for distributed sensing, learning, and state estimation.

Abstract

This paper considers the problem of Byzantine fault-tolerance in distributed multi-agent optimization. In this problem, each agent has a local cost function, and in the fault-free case, the goal is to design a distributed algorithm that allows all the agents to find a minimum point of all the agents' aggregate cost function. We consider a scenario where some agents might be Byzantine faulty, which renders the original goal of computing a minimum point of all the agents' aggregate cost vacuous. A more reasonable objective for an algorithm in this scenario is to allow all the non-faulty agents to compute a minimum point of only the non-faulty agents' aggregate cost. Prior work shows that if there are up to $f$ (out of $n$) Byzantine agents, then a minimum point of the non-faulty agents' aggregate cost can be computed exactly if and only if the non-faulty agents' costs satisfy a certain redundancy property called $2f$-redundancy. However, $2f$-redundancy is an ideal property that can be satisfied only in systems free from noise or uncertainties, which can make the goal of exact fault-tolerance unachievable in some applications. Thus, we introduce the notion of $(f,\epsilon)$-resilience, a generalization of exact fault-tolerance wherein the objective is to find an approximate minimum point of the non-faulty aggregate cost, with $\epsilon$ accuracy. This approximate fault-tolerance can be achieved under a weaker condition that is easier to satisfy in practice, compared to $2f$-redundancy. We obtain necessary and sufficient conditions for achieving $(f,\epsilon)$-resilience, characterizing the correlation between relaxation in redundancy and approximation in resilience. When the agents' cost functions are differentiable, we obtain conditions for $(f,\epsilon)$-resilience of the distributed gradient-descent method when equipped with robust gradient aggregation.
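The TL;DR names two gradient-filters, CGE and CWTM. As a rough illustration of what robust gradient aggregation can look like, the sketch below implements both under their commonly stated definitions, which are an assumption here because this page does not spell them out: CGE (comparative gradient elimination) sums the $n-f$ received gradients with the smallest Euclidean norms, and CWTM (coordinate-wise trimmed mean) drops the $f$ largest and $f$ smallest values in each coordinate and averages the rest. The function names and the sample gradients are hypothetical.

```python
import math


def cge(gradients, f):
    """Comparative gradient elimination (assumed definition):
    sum the n - f gradients that have the smallest Euclidean norms."""
    n = len(gradients)
    if not 0 <= f < n / 2:
        raise ValueError("need 0 <= f < n/2")
    # Sort the received gradients by Euclidean norm, smallest first.
    by_norm = sorted(gradients, key=lambda g: math.sqrt(sum(c * c for c in g)))
    kept = by_norm[: n - f]
    dim = len(gradients[0])
    return [sum(g[k] for g in kept) for k in range(dim)]


def cwtm(gradients, f):
    """Coordinate-wise trimmed mean (assumed definition):
    per coordinate, drop the f smallest and f largest values, average the rest."""
    n = len(gradients)
    if not 0 <= f < n / 2:
        raise ValueError("need 0 <= f < n/2")
    dim = len(gradients[0])
    out = []
    for k in range(dim):
        column = sorted(g[k] for g in gradients)
        trimmed = column[f : n - f]  # n - 2f values remain
        out.append(sum(trimmed) / len(trimmed))
    return out


# Hypothetical gradients: three similar honest vectors and one outlier.
grads = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [-50.0, 80.0]]
cge_out = cge(grads, 1)
cwtm_out = cwtm(grads, 1)
print("CGE :", cge_out)
print("CWTM:", cwtm_out)
```

In this toy input the outlier has by far the largest norm, so CGE discards it and returns the sum of the three similar vectors, while CWTM removes the extreme value in each coordinate before averaging. Note that CGE returns a sum and CWTM a mean, so the two outputs differ in scale.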

Paper Structure

This paper contains 33 sections, 1 theorem, 212 equations, 5 figures, 1 table.

Key Result

Lemma 1

If $f \geq n/2$ then there cannot exist a deterministic $(f, \, \epsilon)$-resilient algorithm for any $\epsilon \geq 0$.

Figures (5)

  • Figure 1: System architecture.
  • Figure 2: The losses, i.e., $\sum_{i \in \mathcal{H}}Q_i(x^t)$, and distances, i.e., $\left\lVert x^t - x_{\mathcal{H}}\right\rVert$, versus the number of iterations in the algorithm. The final approximation errors, i.e., $\left\lVert x^{5000} - x_{\mathcal{H}}\right\rVert$, are annotated in the same colors as their corresponding plots. For the executions shown, agent $1$ is assumed to be Byzantine faulty. The different columns show the results when the faulty agent exhibits the different types of faults: (a) gradient-reverse, and (b) random. Apart from the plots with CGE (in green) and CWTM (in yellow) gradient-filters, we also plot the fault-free DGD method where the faulty agent is omitted (in blue), and the DGD method without any gradient-filters when agent $1$ is Byzantine faulty (in red), both using averaging for aggregation.
  • Figure 3: The losses, i.e., $\sum_{i \in \mathcal{H}}Q_i(x^t)$, and distances, i.e., $\left\lVert x^t - x_{\mathcal{H}}\right\rVert$, versus the number of iterations in the algorithm, magnified for the initial 80 iterations in the training process. The interpretation of the plots is the same as that in Figure 2.
  • Figure 4: The cross-entropy loss and model accuracy, versus the number of iterations in the algorithm, using our algorithm with D-SGD on MNIST with $n=10$ and $f=3$. The two experiments using CWTM are in solid plots, and the two using CGE are in dashed plots. The two experiments against LF faults are plotted in yellow, while the two against RG are plotted in green. We also show the performance of fault-free D-SGD method where the faulty agent is omitted in blue solid plots.
  • Figure 5: The cross-entropy loss and model accuracy, versus the number of iterations in the algorithm, using our algorithm with D-SGD on Fashion-MNIST with $n=10$ and $f=3$. The two experiments using CWTM are in solid plots, and the two using CGE are in dashed plots. The two experiments against LF faults are plotted in yellow, while the two against RG are plotted in green. We also show the performance of fault-free D-SGD method where the faulty agent is omitted in blue solid plots.
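The comparison described in the caption of Figure 2 (filtered versus unfiltered distributed gradient descent with one gradient-reverse faulty agent) can be imitated on a deliberately tiny scalar least-squares problem. This is a sketch under stated assumptions, not the paper's experimental setup: each agent $i$ holds a hypothetical cost $Q_i(x) = \tfrac{1}{2}(x - b_i)^2$, the faulty agent sends its true gradient negated and multiplied by a hypothetical scale factor, and the targets, step size, and iteration count are all made up for illustration.

```python
def trimmed_mean(values, f):
    """One-dimensional CWTM: drop the f smallest and f largest values, average the rest."""
    ordered = sorted(values)
    kept = ordered[f : len(ordered) - f]
    return sum(kept) / len(kept)


def plain_mean(values, f):
    """Unfiltered aggregation by averaging; f is ignored (kept for a uniform signature)."""
    return sum(values) / len(values)


def run_dgd(targets, faulty, aggregate, f=1, step=0.1, iters=200, scale=10.0):
    """Gradient descent on sum_i 0.5 * (x - b_i)^2 using aggregated gradients.

    Agents whose index is in `faulty` send a reversed, scaled gradient
    (an assumed model of the gradient-reverse fault).
    """
    x = 0.0
    for _ in range(iters):
        grads = []
        for i, b in enumerate(targets):
            g = x - b  # true gradient of 0.5 * (x - b)^2
            if i in faulty:
                g = -scale * g  # gradient-reverse fault
            grads.append(g)
        x -= step * aggregate(grads, f)
    return x


# Hypothetical data: agent 0 is faulty, agents 1..5 are honest.
targets = [1.0, 0.9, 1.0, 1.1, 0.95, 1.05]
faulty = {0}
honest_min = sum(targets[1:]) / len(targets[1:])  # minimizer of the honest aggregate cost

x_filtered = run_dgd(targets, faulty, trimmed_mean)
x_unfiltered = run_dgd(targets, faulty, plain_mean)
print("honest minimizer:", honest_min)
print("with trimmed mean:", x_filtered)
print("with plain averaging:", x_unfiltered)
```

With six agents and one fault, the trimmed mean always lies between the smallest and largest honest gradient, so the filtered iterate settles within about 0.1 of the honest minimizer in this toy setup. Under plain averaging the reversed gradient outweighs the honest ones and the iterate moves steadily away.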
