An Adaptive Hotspot-Aware Index for Oscillating Write-Heavy and Read-Heavy Workloads

Lu Xing, Ruihong Wang, Walid G. Aref

TL;DR

This work tackles oscillating HTAP workloads that alternate between write-heavy ingestion and read-heavy analytics. It introduces the Adaptive Hotspot-Aware Tree (AHA-tree), a bi-directional adaptive index that blends an LSM-tree–like component with a B+-tree–like tree, focusing adaptation on hotspot data. The index operates in multiple states (AHA-W$^0$, AHA-R, AHA-W$^+$) that adapt to workload shifts, with background processes enabling hotspot-aware transitions and guarded compaction to reduce write amplification. Empirical results show AHA-tree matches or surpasses LevelDB during writes and converges to B+-tree read performance during analytics, even under multiple hotspots and hotspot drift, while maintaining reasonable adaptation times and concurrency. The approach offers a practical path to efficient HTAP systems by delivering fast writes during ingestion and robust reads during analytics through hotspot-focused, bi-directional adaptation.

Abstract

HTAP systems are designed to handle both transactional and analytical workloads. Besides being mixed at any given time, the workload can also change over time. A common type of continuously changing workload is one that oscillates between being write-heavy at times and read-heavy at other times. Oscillating workloads can be observed in many applications. A single index, e.g., the B+-tree or the LSM-tree, cannot perform equally well in both phases. Conventional adaptive indexing does not solve this issue because it focuses on adapting in one direction only. This paper studies how to support oscillating workloads with adaptive indexes that adapt the underlying index structures in both directions. Based on the observation that real-world datasets are skewed, the focus is on optimizing the index within the hotspot regions. The Adaptive Hotspot-Aware Tree (or AHA-tree, for short) is introduced, whose adaptation is bi-directional. Experimental evaluation shows that AHA-tree is competitive with an LSM-tree for write-heavy transactional workloads. Upon switching to a read-heavy analytical workload, AHA-tree gradually adapts and can match the B+-tree in read performance.

Paper Structure (49 sections, 15 figures, 2 tables)

Figures (15)

  • Figure 1: Performance of AHA-tree (with the optimization mentioned in Section \ref{sssection:potential_opt}) and baselines under oscillating write- and read-heavy workloads.
  • Figure 2: The basic structure of AHA-tree and the life cycle of AHA-tree.
  • Figure 3: The structure of AHA-W$^0$ and the supported operations. The figure legend is at the top left. AHA-W$^0$ is a three-level tree structure covering the key range from $-\infty$ to $\infty$. Nodes S1 and S2 result from splitting N1 when key 13 is added. A range query example is shown in the shaded text box. The Node-Emptying process of N3 sends four files to its child nodes and removes them from N3's nodeLSM-tree.
  • Figure 4: Performance of AHA-W$^0$ and baselines for oscillating write- and read-heavy workloads on uniform (left) and Zipfian (right) data. AHA-W$^0$ and the LSM-tree have similar throughput on uniform data under all operations, but AHA-W$^0$ shows lower throughput on Zipfian data during read-heavy phases.
  • Figure 5: AHA-R structure. The hotspot range is 55-69. Neither the root's nor N2's nodeLSM-tree contains the hotspot data.
  • ...and 10 more figures