SynchroStore: A Cost-Based Fine-Grained Incremental Compaction for Hybrid Workloads
Yinan Zhang, Huiqi Hu, Xuan Zhou
TL;DR
SynchroStore tackles the update bottleneck in columnar storage under HTAP-like workloads by integrating an in-memory incremental row store with a post-freeze columnar storage path within an LSM-Tree framework. It introduces a four-layer architecture, a transition layer with column buckets for fine-grained compaction, a fine-grained row-to-column conversion strategy, and a cost-based scheduling model that uses per-operator costs and a correction factor to estimate execution duration via $Duration_i = Cost_i \cdot \phi_i$. Reads operate on consistent snapshots, enabled by MVCC with multi-version deletes and Bloom-filter acceleration, ensuring updates do not block analytical queries. Experimental results show substantial improvements in update performance compared with existing columnar engines (e.g., DuckDB) and meaningful gains in tail latency under mixed workloads when employing the scheduling mechanism, demonstrating the practical viability of balancing real-time updates with high-throughput analytics in hybrid environments.
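The duration estimate above can be illustrated with a minimal Python sketch. The `Task` type, its field names, and the choice to obtain $Cost_i$ by summing per-operator costs are assumptions made for illustration only; this summary does not specify how the costs are aggregated or how the correction factor is calibrated.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A hypothetical background task (e.g. a conversion or compaction job)."""
    name: str
    operator_costs: list[float]  # assumed per-operator cost estimates
    phi: float                   # correction factor phi_i for this task


def estimated_duration(task: Task) -> float:
    """Return Duration_i = Cost_i * phi_i.

    Cost_i is taken here as the sum of the per-operator costs,
    which is an assumption of this sketch.
    """
    cost = sum(task.operator_costs)
    return cost * task.phi


if __name__ == "__main__":
    task = Task(name="compact_bucket_0", operator_costs=[2.0, 3.0], phi=1.5)
    print(task.name, estimated_duration(task))
```

A scheduler could compare such estimates across pending tasks before deciding what to run in the background; the actual policy is not described in this summary.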
Abstract
This study proposes a novel storage engine, SynchroStore, designed to address the inefficiency of update operations in columnar storage systems based on Log-Structured Merge Trees (LSM-Trees) under hybrid workloads. While columnar storage formats offer significant query performance advantages on large-scale datasets, traditional columnar storage systems suffer from high update complexity and poor real-time performance in data-intensive applications. SynchroStore introduces an incremental row storage mechanism and a fine-grained row-to-column transformation and compaction strategy, balancing update efficiency against query performance. The storage system uses an in-memory row storage structure to support efficient updates, and converts the data to a columnar format after it is frozen to support high-performance reads. The core innovations of SynchroStore are: (1) the tight integration of incremental row storage with columnar storage; (2) a fine-grained row-to-column transformation and compaction mechanism; and (3) a cost-based scheduling strategy. Together, these features allow SynchroStore to use background computational resources for row-to-column transformation and compaction without degrading query performance, thereby addressing the update performance bottleneck of columnar storage under hybrid workloads. Experimental evaluation shows that, compared to existing columnar storage systems such as DuckDB, SynchroStore achieves significantly better update performance under hybrid workloads.
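As a rough illustration of the write path described in the abstract (updates land in an in-memory row structure, which is frozen and then converted to a columnar layout), consider the following minimal Python sketch. The `RowBuffer` class, its method names, and the `_key` column are hypothetical; the sketch omits the LSM-Tree levels, compaction, and MVCC, and is not SynchroStore's actual implementation.

```python
class RowBuffer:
    """Hypothetical in-memory row store that can be frozen and pivoted to columns."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}        # primary key -> row tuple (latest version only)
        self.frozen = False

    def upsert(self, key, row):
        """Insert or overwrite a row; rejected once the buffer is frozen."""
        if self.frozen:
            raise RuntimeError("buffer is frozen; writes go to a new buffer")
        if len(row) != len(self.columns):
            raise ValueError("row width does not match schema")
        self.rows[key] = tuple(row)

    def freeze(self):
        """Mark the buffer immutable so it can be converted in the background."""
        self.frozen = True

    def to_columns(self):
        """Pivot the frozen rows into one list per column, ordered by key."""
        if not self.frozen:
            raise RuntimeError("freeze the buffer before converting")
        keys = sorted(self.rows)
        out = {"_key": keys}
        for i, name in enumerate(self.columns):
            out[name] = [self.rows[k][i] for k in keys]
        return out


if __name__ == "__main__":
    buf = RowBuffer(["id", "price"])
    buf.upsert(2, (2, 9.5))
    buf.upsert(1, (1, 3.0))
    buf.upsert(2, (2, 10.0))  # in-place update while still in row form
    buf.freeze()
    print(buf.to_columns())
```

The point of the sketch is that updates are cheap while the data is row-oriented, and the pivot to columns happens only once the buffer can no longer change.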
