GaussDB-Global: A Geographically Distributed Database System

Puya Memarzia, Huaxin Zhang, Kelvin Ho, Ronen Grosman, Jiang Wang

TL;DR

The paper tackles latency and coordination bottlenecks in geo-distributed OLTP by introducing GaussDB-Global, a sharded system that combines asynchronous replication with a decentralized, clock-based transaction manager (GClock). A bi-directional online transition (DUAL mode) lets a cluster move between centralized (GTM) and GClock-based transaction management. The system also supports fast, consistent reads from asynchronous replicas through a Replica Consistency Point (RCP) and a dynamic Read-On-Replica (ROR) node selection strategy. Key contributions are the GClock timestamping mechanism, the DUAL-mode online migration protocol, RCP-based replica reads, and multi-city experiments showing up to 14x higher read throughput and 50% more TPC-C throughput than the baseline. Together, these techniques target high availability and performance for geo-distributed workloads, and the transition between transaction-management modes happens online rather than requiring downtime.

Abstract

Geographically distributed database systems use remote replication to protect against regional failures. These systems are sensitive to severe latency penalties caused by centralized transaction management, remote access to sharded data, and log shipping over long distances. To tackle these issues, we present GaussDB-Global, a sharded geographically distributed database system with asynchronous replication, for OLTP applications. To tackle the transaction management bottleneck, we take a decentralized approach using synchronized clocks. Our system can seamlessly transition between centralized and decentralized transaction management, providing efficient fault tolerance and streamlining deployment. To alleviate the remote read and log shipping issues, we support reads on asynchronous replicas with strong consistency, tunable freshness guarantees, and dynamic load balancing. Our experimental results on a geographically distributed cluster show that our approach provides up to 14x higher read throughput, and 50% more TPC-C throughput compared to our baseline.
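
As a rough illustration of how reads on asynchronous replicas could combine consistency with a tunable freshness bound, the sketch below computes a consistency point as the minimum replayed commit timestamp across replicas, and admits a replica read only if that point is fresh enough. The `ReplicaProgress` type, the function names, and the min-based rule are assumptions made for illustration; the abstract does not specify how the system computes its Replica Consistency Point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaProgress:
    """Hypothetical progress report from one asynchronous replica."""
    shard: str
    replayed_commit_ts: int  # highest commit timestamp fully replayed


def consistency_point(progress: list[ReplicaProgress]) -> int:
    """Largest timestamp that every reporting replica has already replayed.

    Assumption: reading all shards at this timestamp yields one consistent
    snapshot, because no replica is missing a commit at or below it.
    """
    if not progress:
        raise ValueError("no replica progress reported")
    return min(p.replayed_commit_ts for p in progress)


def can_serve_read(progress: list[ReplicaProgress],
                   now_ts: int, max_staleness: int) -> bool:
    """Admit a replica read only if the snapshot is within the freshness bound."""
    return now_ts - consistency_point(progress) <= max_staleness


reports = [
    ReplicaProgress("shard-1", 990),
    ReplicaProgress("shard-2", 950),
    ReplicaProgress("shard-3", 970),
]
```

With these hypothetical reports the slowest replica (shard-2) fixes the snapshot timestamp, so a tighter freshness bound pushes the read back to a primary instead.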

Paper Structure

This paper contains 14 sections, 3 equations, 6 figures.

Figures (6)

  • Figure 1: Geo-distributed Database Overview
  • Figure 2: GTM to GClock Transition using DUAL mode
  • Figure 3: GClock to GTM Transition using DUAL mode
  • Figure 4: Replica Consistency Point Calculation
  • Figure 5: ROR dynamic node selection using skyline
  • ...and 1 more figure
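
Figure 5 refers to skyline-based node selection. In the usual sense of the term, a skyline is the set of candidates that no other candidate beats on every criterion. A minimal sketch follows, assuming each candidate node is scored by two lower-is-better costs (network latency and replica staleness). The criteria, node names, and numbers are hypothetical and not taken from the paper.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if a is no worse than b on every cost and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(nodes: dict[str, tuple[float, ...]]) -> list[str]:
    """Return the names of nodes that no other node dominates."""
    return sorted(
        name
        for name, cost in nodes.items()
        if not any(
            dominates(other_cost, cost)
            for other_name, other_cost in nodes.items()
            if other_name != name
        )
    )


# Hypothetical candidates: (latency in ms, staleness in ms), lower is better.
candidates = {
    "replica-near": (5.0, 100.0),   # close but lagging
    "replica-far": (20.0, 10.0),    # distant but fresh
    "replica-slow": (25.0, 120.0),  # worse than replica-near on both costs
}
```

Here `skyline(candidates)` keeps `replica-near` and `replica-far`, since each is best on one cost, and drops `replica-slow`. A selection policy would then choose among the surviving nodes, for example by current load.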