MetaHive: A Cache-Optimized Metadata Management for Heterogeneous Key-Value Stores
Alireza Heidari, Amirhossein Ahmadi, Zefeng Zhi, Wei Zhang
TL;DR
MetaHive tackles metadata management in heterogeneous cloud KV stores by disaggregating per-key metadata into separate KV entries stored adjacent to their corresponding data (a data key K and its metadata key K^+), enabling cache-friendly access and keeping metadata private across diverse nodes. The design generates metadata keys using a Start of Heading marker and builds a metadata payload from hashed components, then co-locates the KV entry and its checksum data within the same SST block to avoid extra memory reads. It introduces a one-pass error detection module that runs during compaction, together with a repair workflow, and it supports heterogeneous clusters through tombstone-based cleanup that preserves backward and forward compatibility. Experimental evaluation on RocksDB with YCSB workloads shows negligible GET overhead and minimal PUT overhead, and demonstrates robust behavior under heterogeneous deployment and fault injection. Overall, MetaHive offers scalable, privacy-preserving data integrity for heterogeneous KV clusters with minimal performance impact.
Abstract
Cloud key-value (KV) stores provide businesses with a cost-effective and adaptive alternative to traditional on-premise data management solutions. KV stores frequently consist of heterogeneous clusters, in which the deployment nodes have varying hardware specifications and each node may run a distinct version of the KV store software. This heterogeneity brings with it diverse metadata that the nodes must manage. In this study, we introduce MetaHive, a cache-optimized approach to managing metadata in heterogeneous KV store clusters. MetaHive disaggregates the original data from its associated metadata to promote independence between them, while maintaining their interconnection during usage. This makes the metadata opaque to downstream processes and to the other KV stores in the cluster. MetaHive also ensures that the KV and metadata entries are stored in the vicinity of each other in memory and storage. This allows MetaHive to use the caching mechanism effectively, without extra storage read overhead for metadata retrieval. We deploy MetaHive to ensure data integrity in RocksDB and demonstrate its rapid data validation with minimal effect on performance.
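
To make the adjacency idea concrete, the following is a minimal sketch and not MetaHive's actual implementation. It assumes that the metadata key K^+ is formed by appending the ASCII Start of Heading byte (0x01) to K, that the metadata payload is a SHA-256 digest of the value, and that keys are ordered bytewise as in a sorted KV store. The helper names (`metadata_key`, `metadata_value`, `put`, `verify`) and the in-memory dictionary are hypothetical stand-ins for illustration only.

```python
import hashlib

SOH = b"\x01"  # ASCII Start of Heading; assumed suffix for metadata keys


def metadata_key(key: bytes) -> bytes:
    """Assumed construction of K^+: the data key followed by SOH."""
    return key + SOH


def metadata_value(value: bytes) -> bytes:
    """Assumed payload: a SHA-256 digest of the data value."""
    return hashlib.sha256(value).digest()


def put(store: dict, key: bytes, value: bytes) -> None:
    """Write the data entry and its separate metadata entry."""
    store[key] = value
    store[metadata_key(key)] = metadata_value(value)


def verify(store: dict, key: bytes) -> bool:
    """Recompute the digest and compare it with the stored metadata."""
    return store.get(metadata_key(key)) == metadata_value(store[key])


store = {}
for k, v in [(b"user:1", b"alice"), (b"user:2", b"bob"), (b"user:10", b"carol")]:
    put(store, k, v)

# Bytewise sorting stands in for the key order inside a sorted table file.
ordered = sorted(store)
for k in ordered:
    print(k)

print(verify(store, b"user:2"))   # digest matches the stored value
store[b"user:2"] = b"b0b"         # simulate silent corruption of the value
print(verify(store, b"user:2"))   # digest no longer matches
```

Because 0x01 sorts below every printable byte, each metadata key in this sketch lands directly after its data key in sorted order, so a reader that has already fetched the block holding K is likely to find K^+ in the same block. The sketch relies on user keys containing only printable bytes; a real design would need to handle keys that themselves contain bytes at or below 0x01.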
