HTAP Databases: A Survey

Chao Zhang, Guoliang Li, Jintao Zhang, Xinning Zhang, Jianhua Feng

TL;DR

HTAP databases integrate OLTP and OLAP in a single system, balancing data freshness and performance isolation across diverse architectures. The paper categorizes tightly-coupled HTAP designs into four storage-based architectures, surveys five core HTAP techniques, and reviews eight benchmarks, offering a structured view of current capabilities and trade-offs. It identifies open problems in data organization for distributed HTAP, optimized query planning, and cloud-native HTAP, while outlining opportunities such as ML-driven storage decisions and hybrid processing accelerators. Together, the taxonomy, technique deep-dive, benchmarks, and challenges provide a practical roadmap for selecting HTAP designs aligned with specific real-time analytics requirements and for guiding future research.

Abstract

Since Gartner coined the term Hybrid Transactional and Analytical Processing (HTAP), numerous HTAP databases have been proposed to combine transactions with analytics in order to enable real-time data analytics for various data-intensive applications. HTAP databases typically process the mixed workloads of transactions and analytical queries in a unified system by leveraging both a row store and a column store. As there are different storage architectures and processing techniques to satisfy the various requirements of diverse applications, it is critical to summarize the pros and cons of these key techniques. This paper offers a comprehensive survey of HTAP databases. We mainly classify state-of-the-art HTAP databases according to four storage architectures: (a) Primary Row Store and In-Memory Column Store; (b) Distributed Row Store and Column Store Replica; (c) Primary Row Store and Distributed In-Memory Column Store; and (d) Primary Column Store and Delta Row Store. We then review the key techniques in HTAP databases, including hybrid workload processing, data organization, data synchronization, query optimization, and resource scheduling. We also discuss existing HTAP benchmarks. Finally, we provide the research challenges and opportunities for HTAP techniques.
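
As a rough illustration of the dual-store idea named in the abstract (a row store serving transactions, a column store serving analytics, and a synchronization step between them), the following toy sketch may help. It is not taken from the paper or from any surveyed system: the class `ToyDualStore`, its methods, and the `fresh` flag are all hypothetical names chosen for illustration, and the in-memory delta list is an assumed simplification of real synchronization techniques.

```python
class ToyDualStore:
    """Hypothetical toy store: row store for writes, column store for scans."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}        # row store: key -> {column: value}
        self.delta = []       # changes not yet merged into the column store
        self.col_data = {c: [] for c in self.columns}  # column store arrays
        self.col_pos = {}     # key -> position in the column arrays

    def upsert(self, key, **values):
        """OLTP path: write the row and buffer the change in the delta."""
        row = {c: values[c] for c in self.columns}
        self.rows[key] = row
        self.delta.append((key, row))

    def merge_delta(self):
        """Synchronization: apply buffered changes to the column store."""
        for key, row in self.delta:
            pos = self.col_pos.get(key)
            if pos is None:
                # New key: append one value to every column array.
                self.col_pos[key] = len(self.col_pos)
                for c in self.columns:
                    self.col_data[c].append(row[c])
            else:
                # Existing key: overwrite in place.
                for c in self.columns:
                    self.col_data[c][pos] = row[c]
        self.delta.clear()

    def column_sum(self, column, fresh=False):
        """OLAP path: scan one column; merge first if fresh data is required."""
        if fresh:
            self.merge_delta()
        return sum(self.col_data[column])


store = ToyDualStore(["qty", "price"])
store.upsert(1, qty=2, price=10.0)
store.upsert(2, qty=3, price=5.0)
store.merge_delta()
store.upsert(1, qty=7, price=10.0)           # update not yet merged

stale = store.column_sum("qty")              # scans the older column copy
fresh = store.column_sum("qty", fresh=True)  # pays the merge cost first
```

In this sketch the stale scan avoids touching the write path but misses the latest update, while the fresh scan forces a merge before reading. That is one simplified way to picture the trade-off between data freshness and performance isolation that the survey discusses.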

Paper Structure

This paper contains 50 sections, 5 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: An Overview of HTAP Architectures, Techniques, Benchmarks
  • Figure 2: A Timeline of HTAP databases that first released the HTAP functionality in the literature
  • Figure 3: A trade-off between data freshness and performance isolation
  • Figure 4: A Taxonomy of State-Of-The-Art HTAP Databases based on the Storage Architecture and Processing Paradigm
  • Figure 5: Hybrid Processing based on Copy-on-Write and Dual-Store
  • ...and 1 more figure