HotRAP: Hot Record Retention and Promotion for LSM-trees with Tiered Storage

Jiansheng Qiu, Fangzhou Yuan, Mingyu Gao, Huanchen Zhang

TL;DR

HotRAP tackles read-hot data in tiered LSM-trees by introducing an on-disk hotness tracker (RALT) and record-level retention and promotion across three pathways that keep hot records in fast storage. The system uses mutable and immutable promotion caches, together with on-disk hotness tracking, to promote hot keys independently of standard compactions, so that promotion and retention happen promptly. Empirical results show speedups of up to 5.4x on read-only YCSB, 3.8x on read-write-balanced YCSB, and 1.9x on Twitter traces, with overheads under 1% in uniform workloads and robustness to hotspot shifts. These contributions provide a fine-grained, low-overhead mechanism to improve read performance and reduce storage costs in large-scale KV stores using tiered LSM-trees.

Abstract

The multi-level design of Log-Structured Merge-trees (LSM-trees) naturally fits the tiered storage architecture: the upper levels (recently inserted/updated records) are kept in fast storage to guarantee performance, while the lower levels (the majority of records) are placed in slower but cheaper storage to reduce cost. However, frequently accessed records may have been compacted and reside in slow storage. Existing algorithms are inefficient in promoting these "hot" records to fast storage, leading to compromised read performance. We present HotRAP, a key-value store based on RocksDB that can promptly promote hot records individually from slow to fast storage and keep them in fast storage while they are hot. HotRAP uses an on-disk data structure (a specially made LSM-tree) to track the hotness of keys and includes three pathways to ensure that hot records reach fast storage with short delays. Our experiments show that HotRAP outperforms state-of-the-art LSM-trees on tiered storage by up to 5.4$\times$ compared to the second best under read-only and read-write-balanced YCSB workloads with common access skew patterns, and by up to 1.9$\times$ compared to the second best under Twitter production workloads.

Paper Structure (20 sections, 14 figures, 4 tables, 1 algorithm)

Figures (14)

  • Figure 1: The high-level picture of HotRAP. RALT is a small LSM-tree in FD that tracks the hotness of keys. mPC and immPC stand for the mutable and immutable promotion caches. Hot records are retained in FD during compactions. Records accessed in SD are inserted into the promotion cache and then promoted by compaction or flush if they are hot.
  • Figure 2: Overview of HotRAP. mPC and immPC stand for the mutable and immutable promotion caches. Solid arrows are data flow. Dashed arrows are control flow. The accessed keys in SD are first inserted into the mutable promotion cache (① to ②). Hot records are retained in FD during compactions (③ to ⑤). A compaction can piggyback hot records in its range to FD (⑥ to ⑨). If the mutable promotion cache becomes full, hot records in it will be flushed to Level 0 (ⓐ to ⓔ).
  • Figure 3: RALT structure. Suppose key user12345, whose value length is 200B, is accessed in HotRAP and an access record is inserted into RALT. The figure shows the RALT access record format, as well as the four supported operations (1) to (4). The current time slice sequence number for exponential smoothing is 12. The HotRAP size is len(user12345) + 200 = 209 bytes. The physical size is $(9 + 4) + (4\times 3) = 25$ bytes, where we use 4 bytes for the length of the key, and 4 bytes each for the value length, tick, and score.
  • Figure 4: Eviction in RALT. The length of records represents their size (HotRAP size or physical size). The sampled points are shown as stars in this virtual size space.
  • Figure 5: Concurrency control of promotion by flush. Processes with lock icons are protected by the DB mutex lock. ⑧ ensures that no newer versions exist in the snapshot of the LSM-tree. ⓐ to ⓒ insert all updated keys in immutable promotion caches into their updated fields. Records with updated keys are excluded in ⑨. The snapshot is taken (④) after the creation of the immutable promotion cache (③); therefore, a key updated before ⑨ is detected either by ⑧ or by ⓐ to ⓒ.
  • ...and 9 more figures
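
The size arithmetic in the Figure 3 caption can be checked with a short sketch. The two size formulas follow the caption directly. The function names, the field order, the little-endian byte order, and the choice of a 4-byte float for the score are assumptions made for illustration only; they are not taken from the HotRAP implementation.

```python
import struct

KEY_LEN_BYTES = 4    # 4-byte key-length prefix, per the Figure 3 caption
FIXED_FIELD_BYTES = 4  # value length, tick, and score use 4 bytes each


def hotrap_size(key: bytes, value_len: int) -> int:
    """Size of the record in HotRAP itself: key bytes plus value bytes."""
    return len(key) + value_len


def physical_size(key: bytes) -> int:
    """Size of the access record stored in RALT for this key."""
    return (len(key) + KEY_LEN_BYTES) + 3 * FIXED_FIELD_BYTES


def encode_access_record(key: bytes, value_len: int, tick: int, score: float) -> bytes:
    """Hypothetical encoding: layout and field types are assumptions."""
    header = struct.pack("<I", len(key))                    # key length
    fields = struct.pack("<IIf", value_len, tick, score)    # value length, tick, score
    return header + key + fields


key = b"user12345"
print(hotrap_size(key, 200))                            # 9 + 200
print(physical_size(key))                               # (9 + 4) + (4 * 3)
print(len(encode_access_record(key, 200, 12, 1.0)))     # matches physical_size(key)
```

For the caption's example, the first value is 209 bytes and the other two are 25 bytes, so an access record is much smaller than the record it describes.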