Taurus Database: How to be Fast, Available, and Frugal in the Cloud

Alex Depoutovitch, Chong Chen, Jin Chen, Paul Larson, Shu Lin, Jack Ng, Wenlin Cui, Qiang Liu, Wei Huang, Yong Xiao, Yongjun He

TL;DR

Taurus is a cloud-native relational DBaaS that separates compute from storage and uses distinct Log Stores and Page Stores to optimize durability, availability, and performance. Its novel replication and recovery algorithms achieve high availability with only three replicas and separate log and page data paths, enabling fast writes and low-latency reads that take no more than one network hop. The paper details the storage-layer design, including log-structured append-only storage, per-slice Page Stores, and coordination through the Storage Abstraction Layer (SAL). It also provides an extensive experimental evaluation showing Taurus outperforming Amazon Aurora and Microsoft Socrates on multiple workloads, with low read-replica lag and scalable throughput. This work advances cloud DBaaS architectures by decoupling availability from durability and by tailoring storage interactions to log data and page data separately.

Abstract

Using cloud Database as a Service (DBaaS) offerings instead of on-premise deployments is increasingly common. Key advantages include improved availability and scalability at a lower cost than on-premise alternatives. In this paper, we describe the design of Taurus, a new multi-tenant cloud database system. Taurus separates the compute and storage layers in a similar manner to Amazon Aurora and Microsoft Socrates and provides similar benefits, such as read replica support, low network utilization, hardware sharing and scalability. However, the Taurus architecture has several unique advantages. Taurus offers novel replication and recovery algorithms providing better availability than existing approaches using the same or fewer replicas. Also, Taurus is highly optimized for performance, using no more than one network hop on critical paths and exclusively using append-only storage, delivering faster writes, reduced device wear, and constant-time snapshots. This paper describes Taurus and provides a detailed description and analysis of the storage node architecture, which has not been previously available from the published literature.

Paper Structure

This paper contains 29 sections, 2 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: MySQL with two replicas deployed in a cloud environment
  • Figure 2: Taurus components and layers
  • Figure 3: Taurus write path
  • Figure 4: Page Store recovery
  • Figure 5: Read replica workflow
  • ...and 7 more figures