Nearly-Optimal Consensus Tolerating Adaptive Omissions: Why is a Lot of Randomness Needed?

Mohammad T. Hajiaghayi; Dariusz R. Kowalski; Jan Olkowski

TL;DR

The paper tackles synchronous consensus with adaptive omission faults and a computationally unbounded adversary. It presents a near-optimal randomized algorithm that runs in $O\left(\sqrt{n}\log^{2}{n}\right)$ rounds and uses $O\left(n^{2}\log^{3}{n}\right)$ bits of communication when $t=\Theta(n)$. The algorithm partitions processes into operative and inoperative ones and runs a biased-majority voting scheme among the operative processes. This is supported by a $\sqrt{n}$-group decomposition and by fast inter-group communication over a shallow random graph, and it reaches consensus with high probability (whp). A new randomness-time lower bound shows that pushing the number of rounds below certain thresholds necessarily requires more calls to a random source; it is proved via a one-round coin-flipping game analyzed with Talagrand's inequalities. The paper also gives a randomness-time trade-off, tight up to polylogarithmic factors, together with a broader interpolation between randomized and deterministic solutions: more randomness can substantially reduce time while the total communication cost stays nearly optimal. Finally, it outlines several open directions for extending these ideas to other fault models and problems.
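A minimal Python sketch of the two structures named above: the $\sqrt{n}$-group decomposition and a random overlay graph whose small depth lets information spread in few rounds. Everything concrete here is an assumption for illustration, not the paper's construction: groups are consecutive process ids, each process links to `degree` uniformly random others (independently of the groups, as the Figure 1 caption describes), and `rounds_to_reach_all` measures fault-free flooding depth, ignoring omissions entirely. All function names and the choice `degree = log2(n)` are hypothetical.

```python
import math
import random
from collections import deque
from typing import List, Optional, Set


def sqrt_decomposition(n: int) -> List[List[int]]:
    """Split process ids 0..n-1 into consecutive groups of floor(sqrt(n)) ids
    (the last group may be smaller)."""
    size = max(1, math.isqrt(n))
    return [list(range(start, min(start + size, n))) for start in range(0, n, size)]


def random_overlay(n: int, degree: int, seed: int = 0) -> List[Set[int]]:
    """Undirected random overlay: every process links to `degree` distinct,
    uniformly random other processes. Links ignore the group structure."""
    rng = random.Random(seed)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u in range(n):
        others = [v for v in range(n) if v != u]
        for v in rng.sample(others, degree):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def rounds_to_reach_all(adj: List[Set[int]], source: int) -> Optional[int]:
    """Rounds of fault-free flooding from `source` until every process is
    reached (BFS eccentricity of `source`); None if the overlay is disconnected."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values()) if len(dist) == len(adj) else None


if __name__ == "__main__":
    n = 1024
    groups = sqrt_decomposition(n)
    print(len(groups), "groups of size", len(groups[0]))  # 32 groups of size 32
    overlay = random_overlay(n, degree=int(math.log2(n)))  # hypothetical degree
    print("flooding depth from process 0:", rounds_to_reach_all(overlay, 0))
```

The flooding depth depends on the seed and the assumed degree, so no specific value is claimed here; the point is only that a random graph of modest degree has small depth.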

Abstract

We study the problem of reaching agreement in a synchronous distributed system among $n$ autonomous parties, when the communication links from/to faulty parties can omit messages. The faulty parties are selected and controlled by an adaptive, full-information, computationally unbounded adversary. We design a randomized algorithm that works in $O(\sqrt{n}\log^2 n)$ rounds and sends $O(n^2\log^3 n)$ communication bits, where the number of faulty parties is $\Theta(n)$. Our result is simultaneously tight for both measures within polylogarithmic factors, due to the $\Omega(n^2)$ lower bound on communication by Abraham et al. (PODC'19) and the $\Omega(\sqrt{n/\log n})$ lower bound on the number of rounds by Bar-Joseph and Ben-Or (PODC'98). We also quantify how much randomness is necessary and sufficient to reduce time complexity to a certain value while keeping the communication complexity (nearly) optimal. We prove that no Monte Carlo (MC) algorithm can work in fewer than $\Omega\left(\frac{n^2}{\max\{R,n\}\log n}\right)$ rounds if it uses fewer than $O(R)$ calls to a random source, assuming a constant fraction of faulty parties. This can be contrasted with a long line of work on consensus against an adversary limited to polynomial computation time, and thus unable to break cryptographic primitives, culminating in the work of Ghinea et al. (EUROCRYPT'22), which gives an optimal $O(r)$-round solution that succeeds with probability $1-(cr)^{-r}$. Our lower bound strictly separates these two regimes by excluding such results when the adversary is computationally unbounded. On the upper bound side, we show that for $R\in\tilde{O}(n^{3/2})$ there exists an algorithm solving consensus in $\tilde{O}\left(\frac{n^2}{R}\right)$ rounds with high probability, where the tilde notation hides polylogarithmic factors. The communication complexity of the algorithm does not depend on the amount of randomness $R$ and stays optimal within a polylogarithmic factor.
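The lower and upper bounds stated in the abstract can be compared by plugging in specific values of $R$. The derivation below is only arithmetic on those formulas, writing $T$ for the number of rounds (our notation) and ignoring constants and exact polylogarithmic factors:

```latex
\begin{aligned}
R = n^{3/2}:\quad & T \in \Omega\!\left(\frac{n^2}{n^{3/2}\log n}\right) = \Omega\!\left(\frac{\sqrt{n}}{\log n}\right),
  && T \in \tilde{O}\!\left(\frac{n^2}{n^{3/2}}\right) = \tilde{O}\!\left(\sqrt{n}\right);\\
R = n:\quad & T \in \Omega\!\left(\frac{n^2}{n\log n}\right) = \Omega\!\left(\frac{n}{\log n}\right),
  && T \in \tilde{O}\!\left(\frac{n^2}{n}\right) = \tilde{O}(n).
\end{aligned}
```

Since $\max\{R,n\} = n$ for every $R \le n$, algorithms making at most about $n$ random calls cannot beat $\Omega(n/\log n)$ rounds. Conversely, rearranging the lower bound as $\max\{R,n\} \in \Omega\left(\frac{n^2}{T\log n}\right)$ and setting $T = \sqrt{n}\log^2 n$ gives $R \in \Omega\left(\frac{n^{3/2}}{\log^3 n}\right)$: any algorithm matching the $O(\sqrt{n}\log^2 n)$ round bound must make roughly $n^{3/2}$ calls to its random source, up to polylogarithmic factors.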

Paper Structure (23 sections, 34 theorems, 35 equations, 3 figures, 1 table, 4 algorithms)

Introduction
Summary of results and the paper structure.
Model Details and Definitions
Processes' omission failures and adversaries.
Consensus problem.
Main consensus algorithm
Universal idea: Local and dynamic partitioning of processes into operative / inoperative and implementing time- and communication-efficient biased-majority-voting only by the operative ones.
An implementation of a single biased-majority-vote subroutine (epoch).
Technical advancement 1: $\sqrt{n}$-decomposition into groups and binary-tree-like intra-group calculations of operative processes for communication saving.
Technical advancement 2: Fast inter-group communication and status maintenance between operative processes.
Putting them all together: consensus protocol based on biased-majority-voting adjusted to the new efficient voting implementation in an epoch.
Lower bound
Lower bound's technical novelty and overview of its analysis.
Interpolation between random and deterministic solutions
Future Directions
...and 8 more sections

Key Result

Theorem 1

There is a randomized algorithm solving consensus with probability $1$ against the adaptive omission adversary that can control $t < \frac{n}{30}$ processes, which terminates in $O\left(\sqrt{n}\log^{2}{n}\right)$ rounds and uses $O(n^2\log^{3}n)$ bits of communication and $O\left(n \cdot \sqrt{n}\l

Figures (3)

Figure 1: A schematic picture of two different techniques used for communication between processes. Different colors represent different groups in the $\sqrt{n}$-decomposition of the processes. The links represent the overlay communication network, resembling a sparse random graph, used for exchanging the operative counts of different groups. The choice of links is independent of the $\sqrt{n}$-decomposition.
Figure 2: Visualization of the decomposition of the blue group from Figure 1 (one group of the $\sqrt{n}$-decomposition). The processes $a,b,c,d,e$ in the group are logically arranged into a binary tree. The pink arrows visualize the three-round process of relaying the operative counts of the two children of the root to the root itself. First, the counts are relayed to all processes in the group (arrow #1); then the processes send a confirmation if they received the counts (arrow #2); finally, all processes in the group transmit the received counts to the next layer up, here the root (arrow #3). Some processes can be faulty (here process $c$ does not communicate; only $a,b,d,e$ do), and their values are not guaranteed to be accumulated accurately.
Figure 3: A picture explaining the thresholds in a single execution of the biased-majority-voting subroutine; see lines \ref{line:if-1}-\ref{line:if-random} in Algorithm \ref{alg:opt-omissions}. Different colors represent different outcomes (each obtained in a different epoch) of counting the candidate values in the preceding lines \ref{line:group-bits-aggr} and \ref{line:sum_ones_zeros}.
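The bottom-up accumulation of operative counts pictured in Figure 2 can be sketched as a heap-ordered binary tree over one group's members. This is a hypothetical simplification, not the paper's procedure: the three relay rounds are collapsed into a single step per tree node, the placement of $a,b,c,d,e$ in the tree (with $a$ as the root) is assumed, and a silent process is assumed to lose only its own contribution while its children's counts still move up through the rest of the group.

```python
from typing import Dict, List, Set


def tree_children(i: int, size: int) -> List[int]:
    """Children of position i in a heap-ordered binary tree over `size` positions."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < size]


def aggregate_operative_counts(operative: List[bool], silent: Set[int]) -> Dict[int, int]:
    """Operative count of every subtree in one group's binary tree.

    Hypothetical simplification: a silent (omission-faulty) process loses only
    its own contribution; its children's counts still reach its parent because
    counts are relayed through the whole group before moving up a layer.
    """
    size = len(operative)
    count: Dict[int, int] = {}
    # In heap order every child index exceeds its parent's, so a reverse scan
    # finishes both children before their parent.
    for i in reversed(range(size)):
        own = 1 if operative[i] and i not in silent else 0
        count[i] = own + sum(count[c] for c in tree_children(i, size))
    return count


if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]  # hypothetical layout: a is the root
    all_operative = [True] * len(names)
    counts = aggregate_operative_counts(all_operative, silent={names.index("c")})
    print(counts[0])  # 4: the silent process c's own contribution is lost
```

With all five processes operative and $c$ silent, the root accumulates 4 instead of 5, which mirrors the caption's remark that values of faulty processes are not guaranteed to be counted.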

Theorems & Definitions (56)

Theorem 1
Lemma: Lemmas \ref{lem:bits-agg-contr} and \ref{lem:msg-aggr} in Appendix \ref{subsec:analysis-main}
Lemma: Lemmas \ref{lem:spreading-reaching}, \ref{lem:operative-contribution} and Theorem \ref{thm:main-algo} in Appendix \ref{subsec:analysis-main}
Lemma: Lemma \ref{lem:good-proc-are-large} in Appendix \ref{subsec:analysis-main}
Theorem 2
Lemma: Lemma \ref{lem:initial-exe} in Appendix \ref{sec:proof-lower}
Lemma: Lemma \ref{lem:null-valent} in Appendix \ref{sec:proof-lower}
Lemma: Lemma \ref{lem:bi-valent} in Appendix \ref{sec:proof-lower}
Proof: Proof of Theorem \ref{thm:lower-randomness-res}
Theorem 3
...and 46 more
