Circumventing the CAP Theorem with Open Atomic Ethernet

Paul Borrill

TL;DR

This paper argues that Open Atomic Ethernet (OAE) shifts the engineering regime in which CAP tradeoffs become application-visible by replacing fire-and-forget link semantics with bounded-time bilateral reconciliation of endpoint state -- the property the authors call bisynchrony.

Abstract

The CAP theorem is routinely treated as a systems law: under a network partition, a replicated service must sacrifice either consistency or availability. The theorem is correct within its standard asynchronous network model, but operational practice depends on where partition-like phenomena become observable and on how lower layers discard or preserve semantic information about message fate. This paper argues that Open Atomic Ethernet (OAE) shifts the engineering regime in which CAP tradeoffs become application-visible by (i) replacing fire-and-forget link semantics with bounded-time bilateral reconciliation of endpoint state -- the property we call bisynchrony -- and (ii) avoiding Clos funnel points via an octavalent mesh in which each node can act as the root of a locally repaired spanning tree. The result is not the elimination of hard graph cuts, but a drastic reduction in the frequency and duration of application-visible "soft partitions", achieved by detecting and healing dominant fabric faults within hundreds of nanoseconds. We connect this view to Brewer's original CAP framing; the formalization by Gilbert and Lynch; the CAL theorem of Lee et al., which replaces binary partition tolerance with a quantitative measure of apparent latency; and Abadi's PACELC extension.

Paper Structure

This paper contains 34 sections, 6 theorems, 19 equations, 7 figures, 1 table.

Key Result

Lemma 8.1

If $G$ is a tree, then $\tau(G)=1$.
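
To make the lemma concrete, the sketch below counts spanning trees with the Kirchhoff / Matrix-Tree theorem listed under Theorem 8.2, taking $\tau(G)$ to mean the number of spanning trees of $G$ (as the title of Lemma 8.1 suggests). The function names `count_spanning_trees` and `grid_edges` are illustrative and do not come from the paper, and the grid helper assumes a plain 4-neighbour grid graph, which may differ from the graph used in Proposition 8.3.

```python
from fractions import Fraction


def count_spanning_trees(n, edges):
    """Count spanning trees of an undirected graph on vertices 0..n-1.

    Uses the Matrix-Tree theorem: tau(G) equals the determinant of the
    graph Laplacian with one row and the matching column deleted.
    """
    if n == 1:
        return 1
    # Build the Laplacian L = D - A with exact rational arithmetic.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column, then eliminate to upper-triangular form.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


def grid_edges(rows, cols):
    """Edges of a 4-neighbour grid graph (hypothetical helper)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))
            if r + 1 < rows:
                edges.append((v, v + cols))
    return edges


path = [(0, 1), (1, 2), (2, 3)]   # a tree on 4 vertices
cycle = path + [(3, 0)]           # one extra edge closes a 4-cycle

print(count_spanning_trees(4, path))               # 1
print(count_spanning_trees(4, cycle))              # 4
print(count_spanning_trees(9, grid_edges(3, 3)))   # 192
```

The path has exactly one spanning tree, as the lemma states. Adding a single edge already gives four, and a 3-by-3 grid gives 192, which illustrates why a mesh offers many alternate trees for local repair.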

Figures (7)

  • Figure 1: Three OAE cells ($C_1$, $C_2$, $C_3$) connected by bilateral links. Each link endpoint contains an EPI register pair: Alice (green) and Bob (blue). Cell $C_2$ is an octavalent cell with all eight ports visible. The link between two cells is not a one-way channel but a paired register reconciliation: both endpoints participate symmetrically in each slot, and the outcome---$(M,M)$ or $(\emptyset,\emptyset)$---is known to both parties at round boundary.
  • Figure 2: Back-to-back Shannon channels forming a bilateral OAE link. Each direction comprises a classical Source--Encoder--Channel--Decoder--Destination chain with SerDes and 64b/66b encoding. The memoryless (PHI) region spans the physical channel; the Alternating Bit Protocol (ABP / Stop-and-Wait) region spans the EPI register reconciliation at each endpoint. The two channels share no state---each is an independent noisy channel in the Shannon sense---but the bilateral register swap at the endpoints converts the pair into a single atomic reconciliation with common-knowledge outcome.
  • Figure 3: Conventional three-tier Clos fabric (logical view). Traffic between compute nodes (orange) traverses ToR, Leaf, and Spine tiers via 100G transit links. A single switch or link failure triggers global control-plane reconvergence---the window during which higher layers observe a partition.
  • Figure 4: The same Clos fabric (physical wiring). Individual server NICs are cabled point-to-point to Top-of-Rack switches via SFP+/QSFP transceivers over copper or fiber. Uplinks (blue cables) fan out to leaf and spine tiers above. Every link---NIC to ToR, ToR to leaf, leaf to spine---is a unilateral fire-and-forget channel: the sender transmits a frame and receives no link-layer acknowledgment of delivery.
  • Figure 5: OAE Cellular Fabrix: a $10\times 20$ octavalent mesh with valency color-coding. Yellow cells (corners) have valency 3; orange cells (edges) have valency 5; dark red cells (core) have valency 8. Each functional link operates at 100G. Every cell is both compute and forwarding---there are no proprietary switch tiers. A single link failure is healed locally by parent reselection, not by global control-plane reconvergence. Note that this is a scale-independent architecture: the sea of nodes/XPUs can extend in the planar direction (north, east, west, south) via connections to the orange cells, eliminating the need for hierarchical switches.
  • ...and 2 more figures
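
The valency counts in the Figure 5 caption can be checked with a short sketch. It assumes that "octavalent" means each cell links to its eight surrounding cells (horizontal, vertical, and diagonal neighbours), which is consistent with the stated valencies of 3, 5, and 8 but is not spelled out in this summary. The function name `valency_histogram` is illustrative.

```python
from collections import Counter


def valency_histogram(rows, cols):
    """Map each valency to the number of cells that have it, for a
    rows x cols mesh with assumed 8-neighbour connectivity."""
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            degree = sum(
                1
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
            )
            counts[degree] += 1
    return counts


hist = valency_histogram(10, 20)
print(sorted(hist.items()))   # [(3, 4), (5, 52), (8, 144)]

# Each link is counted once from each endpoint, so halve the degree sum.
links = sum(degree * n for degree, n in hist.items()) // 2
print(links)                  # 712
```

Under that assumption, the $10\times 20$ mesh has 4 corner cells of valency 3, 52 edge cells of valency 5, and 144 core cells of valency 8, joined by 712 links in total.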

Theorems & Definitions (13)

  • Lemma 8.1: A tree has exactly one spanning tree
  • proof
  • Theorem 8.2: Kirchhoff / Matrix--Tree
  • Proposition 8.3: Spanning trees of a grid
  • proof
  • Corollary 8.4: Exponentially many alternate trees
  • proof
  • Definition 8.5
  • Proposition 8.6: Failure avoidance via disjoint trees
  • proof
  • ...and 3 more