Distributed Download from an External Data Source in Faulty Majority Settings

John Augustine, Soumyottam Chatterjee, Valerie King, Manish Kumar, Shachar Meir, David Peleg

TL;DR

This work analyzes the Download problem in the Data Retrieval (DR) model, where k peers in a clique query an external data source storing an n-bit array X, with up to βk Byzantine faults and a nonfaulty fraction γ = 1 − β. It develops randomized and deterministic schemes across synchronous, asynchronous, broadcast, and crash-fault settings that minimize the per-peer query load while ensuring correct reconstruction of X. The results include a randomized protocol for arbitrary β that is query-optimal up to a logarithmic factor, with Q = Õ(n/(γk)) and near-linear time, as well as a lower bound showing that single-round solutions must effectively query all bits. The authors also give time-optimized variants, including a two-round algorithm and a multi-round, logarithmic-time approach that uses partitioned intervals, frequent-string learning, and decision-tree reconstruction, with near-optimal query costs against dynamic adversaries. In crash-fault models, they provide deterministic protocols that are query-optimal in synchronous settings and near-optimal in asynchronous settings, and they detail the time and message-size tradeoffs. Collectively, the results advance fault-tolerant data retrieval from external sources in distributed networks, with implications for blockchain oracles and decentralized data feeds, where minimizing per-peer read costs is crucial.

Abstract

We extend the study of retrieval problems in distributed networks, focusing on improving the efficiency and resilience of protocols in the Data Retrieval (DR) Model. The DR Model consists of a complete network (i.e., a clique) with $k$ peers, up to $\beta k$ of which may be Byzantine (for $\beta \in [0, 1)$), and a trusted External Data Source comprising an array $X$ of $n$ bits ($n \gg k$) that the peers can query. The peers can also send messages to each other. In this work, we focus on the Download problem, which requires all peers to learn $X$. Our primary goal is to minimize the maximum number of queries made by any honest peer and, additionally, to optimize time. We begin with a randomized algorithm for the Download problem that achieves optimal query complexity up to a logarithmic factor. For the stronger dynamic adversary, which can change the set of Byzantine peers from one round to the next, we achieve the optimal time complexity in peer-to-peer communication, but with larger messages. In broadcast communication, where all peers (including Byzantine peers) are required to send the same message to all peers, we achieve almost optimal time and query complexities for a dynamic adversary, again with larger messages. Finally, in a more relaxed crash-fault model, where peers stop responding after crashing, we address the Download problem in both synchronous and asynchronous settings. Using a deterministic protocol, we obtain nearly optimal results for both query complexity and message sizes in these scenarios.
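
To make the cost measure concrete, the following minimal sketch simulates a fault-free baseline in the DR model: each of the $k$ peers queries a disjoint slice of roughly $n/k$ bits and shares it with everyone, so all peers learn $X$. This is not one of the paper's protocols. It assumes no Byzantine or crashed peers at all, and the names `ExternalSource` and `fault_free_download` are hypothetical, introduced only for illustration. It shows the quantity being minimized (the maximum number of queries made by any peer) and hints at why faults make the problem hard: a slice reported by a faulty peer cannot simply be trusted.

```python
import random


class ExternalSource:
    """Trusted array X of n bits that counts queries per peer (hypothetical helper)."""

    def __init__(self, bits):
        self._bits = list(bits)
        self.queries = {}  # peer id -> number of bits queried so far

    def query(self, peer, index):
        self.queries[peer] = self.queries.get(peer, 0) + 1
        return self._bits[index]

    def __len__(self):
        return len(self._bits)


def fault_free_download(source, k):
    """Baseline assuming NO faulty peers: split the indices evenly, then share."""
    n = len(source)
    shared = {}  # index -> bit; stands in for the all-to-all messages
    for peer in range(k):
        # Peer `peer` is responsible for indices peer, peer + k, peer + 2k, ...
        for index in range(peer, n, k):
            shared[index] = source.query(peer, index)
    # Every peer assembles the same copy of X from the shared slices.
    return [shared[i] for i in range(n)]


if __name__ == "__main__":
    random.seed(0)
    n, k = 1000, 10
    x = [random.randint(0, 1) for _ in range(n)]
    source = ExternalSource(x)
    learned = fault_free_download(source, k)
    assert learned == x
    print(max(source.queries.values()))  # n / k = 100 when k divides n
```

In this baseline the per-peer load is $\lceil n/k \rceil$ queries. The abstract's setting differs in that up to $\beta k$ peers may misreport or withhold their slices, which is why the query bound quoted in the TL;DR scales with the nonfaulty fraction $\gamma$ rather than with $k$ alone.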

Paper Structure (20 sections, 9 theorems, 17 equations, 2 tables, 3 algorithms)

Key Result

Lemma 2.1

For $j\in[f,\lg(\gamma k)-\lg\lg n]$, $P_j\in\left(\frac{2^j}{\gamma k}\left(1-\frac{1}{2\lg n}\right),\frac{2^j}{\gamma k}\right)$.

Theorems & Definitions (21)

  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • Corollary 2.3
  • Lemma 2.4
  • proof
  • Theorem 2.5
  • Definition 3.1
  • Definition 3.2
  • ...and 11 more