MetaHive: A Cache-Optimized Metadata Management for Heterogeneous Key-Value Stores

Alireza Heidari, Amirhossein Ahmadi, Zefeng Zhi, Wei Zhang

TL;DR

MetaHive tackles metadata management in heterogeneous cloud KV stores by disaggregating per-key metadata into separate KVs that sit adjacent to their corresponding data (K and K^+), enabling cache-friendly access and privacy across diverse nodes. The design defines metadata key generation using a Start of Heading marker and builds a metadata payload with hashed components, then co-locates KV and checksum data within the same SST block to avoid extra memory reads. It introduces a one-pass error detection module during compaction and a repair workflow, along with mechanisms to operate across heterogeneous clusters with tombstone-based cleanup to preserve backward/forward compatibility. Experimental evaluation on RocksDB with YCSB workloads shows negligible GET overhead and minimal PUT overhead, and demonstrates robust behavior under heterogeneous deployment and fault injection. Overall, MetaHive offers scalable, privacy-preserving data integrity for heterogeneous KV clusters with minimal performance impact.
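
The adjacency idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the metadata key K^+ is formed by appending the ASCII Start of Heading byte (0x01) to K, that the payload is a plain CRC32 over key and value, and that the store keeps keys in lexicographic byte order, as an SST file does. The names `SortedStore`, `metadata_key`, and `metadata_value` are hypothetical.

```python
import zlib
from bisect import insort

SOH = b"\x01"  # ASCII Start of Heading; appending it as a suffix is an assumption


def metadata_key(key: bytes) -> bytes:
    """Derive the (hypothetical) metadata key K^+ from the data key K."""
    return key + SOH


def metadata_value(key: bytes, value: bytes) -> bytes:
    """Build a checksum payload; CRC32 stands in for the paper's hashed payload."""
    return zlib.crc32(key + value).to_bytes(4, "big")


class SortedStore:
    """Toy in-memory store that keeps keys in lexicographic byte order."""

    def __init__(self):
        self._keys = []   # sorted list of keys, mimicking on-disk ordering
        self._data = {}   # key -> value

    def _insert(self, key: bytes, value: bytes) -> None:
        if key not in self._data:
            insort(self._keys, key)
        self._data[key] = value

    def put(self, key: bytes, value: bytes) -> None:
        # Write the data entry and its metadata entry together.
        self._insert(key, value)
        self._insert(metadata_key(key), metadata_value(key, value))

    def get(self, key: bytes) -> bytes:
        # Readers that do not know about metadata simply never ask for K^+.
        return self._data[key]

    def verify(self, key: bytes) -> bool:
        # Recompute the checksum and compare it with the stored metadata entry.
        stored = self._data.get(metadata_key(key))
        return stored == metadata_value(key, self._data[key])

    def keys(self):
        return list(self._keys)
```

Under these assumptions, K^+ sorts directly after K unless another key begins with K followed by a 0x00 byte, so a block that holds K is likely to hold K^+ as well. Calling `verify` after overwriting a value behind the store's back returns `False`, which is the kind of mismatch a compaction-time integrity check would flag.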

Abstract

Cloud key-value (KV) stores provide businesses with a cost-effective and adaptive alternative to traditional on-premise data management solutions. KV stores frequently run on heterogeneous clusters, in which deployment nodes differ in hardware specifications and each node may run a distinct version of the KV store software. This heterogeneity extends to the diverse metadata that the nodes need to manage. In this study, we introduce MetaHive, a cache-optimized approach to managing metadata in heterogeneous KV store clusters. MetaHive disaggregates the original data from its associated metadata to promote independence between them, while maintaining their interconnection during usage. This makes the metadata opaque to downstream processes and to the other KV stores in the cluster. MetaHive also ensures that the KV and metadata entries are stored in the vicinity of each other in memory and storage. This allows MetaHive to make effective use of the caching mechanism without extra storage read overhead for metadata retrieval. We deploy MetaHive to ensure data integrity in RocksDB and demonstrate its rapid data validation with minimal effect on performance.

Paper Structure

This paper contains 28 sections, 1 equation, 6 figures, 1 table.

Figures (6)

  • Figure 1: RocksDB Architecture
  • Figure 2: RocksDB block-based SST format
  • Figure 3: Inserting checksum metadata on PUT operation
  • Figure 4: Clusters of KV and corresponding metadata
  • Figure 5: $DataIntegrity(KV, K^+V)$, the data integrity procedure of the MetaHive error detection module
  • ...and 1 more figure