Asynchronous Latency and Fast Atomic Snapshot
João Paulo Bezerra, Luciano Freitas, Petr Kuznetsov, Matthieu Rambaud
TL;DR
The paper addresses how to measure latency for asynchronous, long-lived distributed abstractions, namely lattice agreement (LA) and atomic snapshot objects (ASO). It exposes deficiencies in existing time metrics and proposes a unifying, operation-level latency framework. It develops a fast, fault-tolerant LA protocol built on buffering, helping, and full data relaying, and shows how to implement a single-writer multi-reader (SWMR) ASO on top of LA through a join-semilattice construction. The resulting LA/ASO scheme achieves optimal two-round latency in fault-free runs without contention, eight rounds under contention, and amortized constant latency in long executions, with $O(n^2)$ total message complexity; its worst-case latency scales as $O(k)$, where $k$ is the number of active faulty processes. By introducing Iterative Round Assignment and a generalized CR/NTR framework, the paper enables fair, rigorous comparisons with prior work and clarifies how holes in executions affect latency assessments, giving a foundation for evaluating long-lived asynchronous protocols.
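To make the join-semilattice idea concrete, here is a minimal Python sketch of one common way to order snapshot views. It is an illustration under stated assumptions, not the construction from the paper: the `View` representation (process id mapped to a sequence number and a value), and the names `join` and `leq`, are hypothetical. It assumes each process is the single writer of its own entry and tags its writes with increasing sequence numbers, so two entries with the same sequence number carry the same value.

```python
from typing import Dict, Tuple

# Hypothetical representation: process id -> (sequence number, value).
View = Dict[int, Tuple[int, str]]


def join(a: View, b: View) -> View:
    """Least upper bound: for each process, keep the entry with the larger sequence number."""
    out = dict(a)
    for pid, entry in b.items():
        if pid not in out or entry[0] > out[pid][0]:
            out[pid] = entry
    return out


def leq(a: View, b: View) -> bool:
    """a precedes b in the lattice order exactly when joining a into b changes nothing."""
    return join(a, b) == b


# Two incomparable views and their join.
u: View = {1: (1, "x"), 2: (3, "y")}
v: View = {1: (2, "z"), 3: (1, "w")}
w: View = join(u, v)
```

Under these assumptions `join` is commutative, associative, and idempotent, so views form a join semilattice. If every value returned by LA is a join of proposed views and returned values are pairwise comparable under `leq`, the returned views can be totally ordered, which is the property an atomic snapshot needs.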
Abstract
This paper introduces a novel, fast atomic-snapshot protocol for asynchronous message-passing systems. In the process of defining what "fast" means exactly, we spot a few interesting issues that arise when conventional time metrics are applied to long-lived asynchronous algorithms. We reveal some gaps in latency claims made in earlier work on snapshot algorithms, which hamper their comparative time-complexity analysis. We then come up with a new unifying time-complexity metric that captures the latency of an operation in an asynchronous, long-lived implementation. This allows us to formally grasp the latency improvements of our atomic-snapshot algorithm with respect to state-of-the-art protocols: optimal latency in fault-free runs without contention, short constant latency in fault-free runs with contention, worst-case latency proportional to the number of active concurrent failures, and constant amortized latency.
