Building an OceanBase-based Distributed Nearly Real-time Analytical Processing Database System

Quanqing Xu, Chuanhui Yang, Ruijie Li, Dongdong Xie, Hui Cao, Yi Xiao, Junquan Chen, Yanzuo Wang, Saitong Zhao, Fusheng Han, Bin Liu, Guoping Wang, Yuzhong Zhao, Mingqiang Zhuang

TL;DR

OceanBase Mercury tackles the challenge of delivering near real-time analytics on petabyte-scale data within a unified HTAP framework. It advances a hybrid storage model that combines a columnar baseline with row-based incremental data, enabled by an adaptive compaction strategy, data-skipping indices, and a TP/AP-aware vectorized execution engine with three data formats. The system also introduces efficient materialized-view refresh through full and incremental mechanisms and demonstrates substantial performance gains, including 1.3X–3.1X latency improvements over specialized OLAP engines and favorable comparisons against StarRocks, ClickHouse, and Doris on representative workloads. These innovations collectively offer a scalable, high-availability solution that preserves OLTP performance while delivering robust, low-latency analytical processing for modern enterprise data workloads.

Abstract

Demand is growing for database systems that can manage massive datasets efficiently while delivering both real-time transaction processing and advanced analytical capabilities, and meeting it has become critical in modern data infrastructure. Traditional OLAP systems often fail to meet these dual requirements, and emerging real-time analytical processing systems still face persistent challenges such as excessive data redundancy, complex cross-system synchronization, and suboptimal temporal efficiency. This paper introduces OceanBase Mercury, an innovative OLAP system designed for petabyte-scale data. The system features a distributed, multi-tenant architecture that meets essential enterprise-grade requirements, including continuous availability and elastic scalability. Our technical contributions comprise three key components: (1) an adaptive columnar storage format with hybrid data layout optimization, (2) a differential refresh mechanism for materialized views with temporal consistency guarantees, and (3) a polymorphic vectorization engine supporting three distinct data formats. Empirical evaluations under real-world workloads demonstrate that OceanBase Mercury achieves a 1.3X to 3.1X speedup in query latency over specialized OLAP engines while maintaining sub-second latency, positioning it as a groundbreaking AP solution that effectively balances analytical depth with operational agility in big data environments.
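
The hybrid storage idea summarized above (a columnar baseline combined with row-based incremental data, periodically folded together by compaction) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not OceanBase Mercury's implementation: the class `HybridTable` and its methods `upsert`, `delete`, `scan`, and `compact` are hypothetical names chosen for illustration, everything lives in memory, and compaction here is a plain full rewrite rather than the adaptive strategy the paper describes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class HybridTable:
    """Toy hybrid store (hypothetical): columnar baseline + row-based delta."""

    key: str
    # Baseline: column name -> list of values, one entry per row.
    baseline: Dict[str, List[Any]] = field(default_factory=dict)
    # Incremental data: primary key -> latest row; None marks a delete.
    delta: Dict[Any, Optional[Row]] = field(default_factory=dict)

    def upsert(self, row: Row) -> None:
        # Writes touch only the row-based delta, never the baseline.
        self.delta[row[self.key]] = dict(row)

    def delete(self, key_value: Any) -> None:
        self.delta[key_value] = None

    def scan(self) -> List[Row]:
        # Merge-on-read: start from the baseline, then overlay the delta.
        names = list(self.baseline)
        count = len(self.baseline[self.key]) if names else 0
        merged: Dict[Any, Row] = {}
        for i in range(count):
            row = {name: self.baseline[name][i] for name in names}
            merged[row[self.key]] = row
        for key_value, row in self.delta.items():
            if row is None:
                merged.pop(key_value, None)
            else:
                merged[key_value] = row
        return [merged[k] for k in sorted(merged)]

    def compact(self) -> None:
        # Simplified compaction: rewrite the baseline from the merged view
        # and clear the delta (assumption: no concurrent writers).
        rows = self.scan()
        names = list(rows[0]) if rows else list(self.baseline)
        self.baseline = {name: [r[name] for r in rows] for name in names}
        self.delta.clear()


table = HybridTable(key="id", baseline={"id": [1, 2, 3], "amount": [10, 20, 30]})
table.upsert({"id": 2, "amount": 25})   # update an existing row
table.upsert({"id": 4, "amount": 40})   # insert a new row
table.delete(1)                         # delete a baseline row
fresh_rows = table.scan()               # reads see the delta immediately
table.compact()                         # fold the delta into the baseline
```

In this toy model, `scan()` reflects the three writes before any compaction runs, which is the property that lets analytical reads stay close to real time while transactional writes avoid rewriting columnar data.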

Paper Structure

This paper contains 40 sections, 17 figures, and 4 tables.

Figures (17)

  • Figure 1: System Architecture
  • Figure 2: Hybrid Storage Architecture
  • Figure 3: Row store and column store integration
  • Figure 4: Full refresh
  • Figure 5: The overall framework of incremental update
  • ...and 12 more figures