PolarStore: High-Performance Data Compression for Large-Scale Cloud-Native Databases
Qingda Hu, Xinjun Yang, Feifei Li, Junru Li, Ya Lin, Yuqi Zhou, Yicong Zhu, Junwei Zhang, Rongbiao Xie, Ling Zhou, Bin Wu, Wenchao Zhou
TL;DR
PolarStore tackles the high storage costs of cloud-native RDBMSs by co-designing hardware and software into a dual-layer compression system that preserves I/O performance. The software layer produces 4 KB-aligned blocks, which the in-storage hardware (PolarCSD) compresses further; this design enables byte-granularity indexing and flexible compression parameters. Database-oriented optimizations, namely redo-log bypass, adaptive page-read compression, and a per-page log mechanism that mitigates tail latency, deliver strong space savings without sacrificing latency on the critical path. Deployed across thousands of storage servers in PolarDB, PolarStore achieves a compression ratio of 3.55 and reduces storage costs by around 60% while performing on par with uncompressed baselines, demonstrating its practical viability at massive scale.
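To make the dual-layer data path concrete, the following is a minimal sketch under stated assumptions, not PolarStore's implementation. It uses Python's `zlib` at level 1 as a stand-in for the lightweight software compressor and again as a stand-in for PolarCSD's in-storage compression. The helpers `align_up`, `software_compress`, `software_decompress`, and `device_compress`, as well as the 4-byte length header used to strip padding, are hypothetical. The only parameter taken from the text above is the 4 KB alignment unit; the device layer is modeled as storing each compressed block at byte granularity.

```python
import struct
import zlib

BLOCK_SIZE = 4096  # 4 KB alignment unit from the TL;DR


def align_up(n: int, unit: int = BLOCK_SIZE) -> int:
    """Round n up to the next multiple of unit."""
    return (n + unit - 1) // unit * unit


def software_compress(page: bytes) -> bytes:
    """Software layer (hypothetical layout): a 4-byte little-endian length
    header followed by a zlib level-1 payload (stand-in for a lightweight
    compressor), zero-padded so the output occupies whole 4 KB blocks."""
    payload = zlib.compress(page, 1)
    framed = struct.pack("<I", len(payload)) + payload
    return framed.ljust(align_up(len(framed)), b"\0")


def software_decompress(blocks: bytes) -> bytes:
    """Read the length header, ignore the zero padding, and inflate."""
    (n,) = struct.unpack_from("<I", blocks, 0)
    return zlib.decompress(blocks[4:4 + n])


def device_compress(blocks: bytes) -> int:
    """Stand-in for in-storage compression: compress each 4 KB block
    independently and return the bytes it would occupy, assuming
    byte-granularity placement. Incompressible blocks are kept raw."""
    total = 0
    for i in range(0, len(blocks), BLOCK_SIZE):
        chunk = blocks[i:i + BLOCK_SIZE]
        total += min(len(zlib.compress(chunk)), len(chunk))
    return total


if __name__ == "__main__":
    # A synthetic, highly repetitive 16 KB page (illustrative data only).
    page = b"".join(b"id=%05d,status=active;" % i for i in range(800))[:16384]
    blocks = software_compress(page)
    stored = device_compress(blocks)
    print(f"raw={len(page)} aligned={len(blocks)} stored={stored} "
          f"ratio={len(page) / stored:.2f}")
```

The zero padding that 4 KB alignment introduces costs almost nothing once the device layer compresses each block, which illustrates why block-aligned software output and in-storage compression can complement each other. The ratio printed for this synthetic page says nothing about the 3.55 figure reported for real workloads.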
Abstract
In recent years, resource elasticity and cost optimization have become essential for RDBMSs. While cloud-native RDBMSs provide elastic computing resources by disaggregating compute and storage, storage costs remain a critical concern for users, and data compression is therefore an effective strategy for reducing them. However, existing compression approaches in RDBMSs present a stark trade-off: software-based approaches incur significant performance overheads, while hardware-based alternatives lack the flexibility required for diverse database workloads. In this paper, we present PolarStore, a compressed shared storage system for cloud-native RDBMSs. PolarStore employs a dual-layer compression mechanism that combines in-storage compression in PolarCSD hardware with lightweight compression in software, pairing the low overhead of hardware compression with the flexibility of software compression. PolarStore also incorporates database-oriented optimizations to maintain high performance on critical I/O paths. Drawing on experience from large-scale deployment, we also introduce hardware improvements for PolarCSD to ensure host-level stability and propose a compression-aware scheduling scheme to improve cluster-level space efficiency. PolarStore is currently deployed on thousands of storage servers within PolarDB, managing over 100 PB of data. It achieves a compression ratio of 3.55 and reduces storage costs by approximately 60%. Notably, it delivers these savings while maintaining performance comparable to uncompressed clusters.
