A Tree-Structured Two-Phase Commit Framework for OceanBase: Optimizing Scalability and Consistency

Quanqing Xu, Chen Qian, Chuanhui Yang, Fanyu Kong, Guixiang Liu, Fusheng Han, Zixiang Zhai

TL;DR

Experimental evaluation demonstrates performance approaching that of single-machine transactions, with reduced latency and bandwidth consumption, validating the framework's effectiveness for modern distributed databases.

Abstract

Modern distributed databases face challenges in achieving transactional consistency across distributed partitions. Traditional two-phase commit (2PC) protocols incur high coordination overhead and latency, and require complex recovery for dynamic partition transfers. This paper introduces a novel tree-shaped 2PC framework for OceanBase that leverages single-machine log streams to address these challenges through three innovations. First, we propose log streams as atomic participants, replacing partition-level coordination. By treating each log stream as the commit unit, a transaction spanning $N$ co-located partitions interacts with one participant instead of $N$, sharply reducing coordination overhead (e.g., a 99 percent reduction in participants for $N=100$). Second, we design a tree-shaped 2PC protocol with a coordinator-rooted DAG topology that dynamically handles partition transfers by recursively constructing commit trees. When a partition migrates during a transaction, the protocol embeds migration contexts as leaf nodes, eliminating explicit participant list updates, resolving circular dependencies, and ensuring linearizable commits under topology changes. Third, we introduce prepare-unknown and trans-unknown states to prevent consistency violations when participants lose context. These states signal uncertainty during retries, avoiding erroneous aborts from so-called lying participants while isolating users from ambiguity. Experimental evaluation demonstrates performance approaching that of single-machine transactions, with reduced latency and bandwidth consumption, validating the framework's effectiveness for modern distributed databases.
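
The participant-count arithmetic behind the first innovation can be illustrated with a minimal sketch. It is not OceanBase code: the function names, the `placement` mapping, and the identifiers `p0`..`p99` and `ls_1` are hypothetical, and the sketch assumes only what the abstract states, namely that all partitions hosted on the same log stream are represented by a single 2PC participant.

```python
from collections import defaultdict


def participants_by_log_stream(partition_to_stream, touched_partitions):
    """Group the partitions a transaction touched by their hosting log stream.

    Each key of the result stands for one 2PC participant (assumption:
    one participant per log stream, as described in the abstract).
    """
    groups = defaultdict(list)
    for partition in touched_partitions:
        groups[partition_to_stream[partition]].append(partition)
    return dict(groups)


def participant_reduction(num_partitions, num_log_streams):
    """Fraction of participants saved versus partition-level coordination."""
    return 1 - num_log_streams / num_partitions


# Hypothetical layout: 100 partitions co-located on a single log stream.
placement = {f"p{i}": "ls_1" for i in range(100)}
groups = participants_by_log_stream(placement, list(placement))

print(len(groups))  # number of 2PC participants for this transaction
print(f"{participant_reduction(len(placement), len(groups)):.0%}")
```

With all 100 partitions on one log stream, the transaction has one participant rather than 100, which is the 99 percent figure quoted above. When the partitions are spread over several log streams, the saving shrinks accordingly.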

Paper Structure

This paper contains 73 sections, 5 theorems, 15 figures, 3 algorithms.

Key Result

Theorem 1

The log stream tree meets the minimum set requirement.

Figures (15)

  • Figure 1: Participant list changes during a transaction.
  • Figure 2: Log stream tree during execution.
  • Figure 3: Tree-shaped two-phase commit state machine. Solid lines represent normal processes; dotted lines represent abnormal timeouts (T) and downtime (F). $\land$ represents AND, $\lor$ represents OR. A $\rightarrow$ B means that in the current state, input A generates output B and transitions to the next state.
  • Figure 4: Transfer during tree-shaped 2PC.
  • Figure 5: Log stream tree during execution.
  • ...and 10 more figures

Theorems & Definitions (5)

  • Theorem 1
  • Theorem 2: Transfer principle
  • Theorem 3
  • Theorem 4
  • Theorem 5