LSM Trees in Adversarial Environments

Hayder Tirmazi

TL;DR

The paper addresses the vulnerability of LSM-tree based KV stores to adversarial workloads that exploit Bloom Filter false positives, causing large increases in read latency for zero-result lookups. It formalizes adversarial models and introduces Smash-Lsm/Smash-Bloom games to quantify risks, then proposes a practical, provably secure mitigation based on keyed pseudorandom permutations (PRPs) that obfuscate key placement. The authors implement the protection in LevelDB and RocksDB, demonstrating substantial reductions in zero-result lookup latency (up to $60\%$–$78\%$) with manageable overhead on queries involving existing keys, and provide security proofs and a clear set of open problems. The work highlights the importance of adversarial resilience in storage systems and offers a concrete, pluggable defense that preserves correctness while mitigating adversarial impact.
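To make the idea of a keyed pseudorandom permutation concrete, here is a minimal sketch of one textbook way to build a keyed, invertible mapping: a Feistel network whose round function is HMAC-SHA256. This is an illustration only and is not taken from the paper's LevelDB or RocksDB implementations; the names `permute`, `unpermute`, and `_round`, the integer key encoding, and the choice of 4 rounds are all assumptions made for this example.

```python
import hashlib
import hmac


def _round(key: bytes, rnd: int, half: int, half_bits: int) -> int:
    """Keyed round function (hypothetical helper): HMAC-SHA256 truncated to half_bits."""
    msg = bytes([rnd]) + half.to_bytes((half_bits + 7) // 8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & ((1 << half_bits) - 1)


def permute(key: bytes, x: int, half_bits: int = 32, rounds: int = 4) -> int:
    """Map x in [0, 2^(2*half_bits)) to another value in the same range, bijectively."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for rnd in range(rounds):
        # Standard Feistel step: swap halves and mix in the keyed round output.
        left, right = right, left ^ _round(key, rnd, right, half_bits)
    return (left << half_bits) | right


def unpermute(key: bytes, y: int, half_bits: int = 32, rounds: int = 4) -> int:
    """Invert permute() by running the rounds in reverse order."""
    mask = (1 << half_bits) - 1
    left, right = y >> half_bits, y & mask
    for rnd in reversed(range(rounds)):
        left, right = right ^ _round(key, rnd, left, half_bits), left
    return (left << half_bits) | right


if __name__ == "__main__":
    secret = b"store-secret-key"  # assumption: a secret held by the store, unknown to clients
    user_key = 0xDEADBEEF
    internal_key = permute(secret, user_key)
    # The store would index internal_key; the original key stays recoverable.
    print(hex(internal_key), unpermute(secret, internal_key) == user_key)
```

Because the mapping is a bijection, distinct user keys never collide after the transformation, so correctness of lookups is preserved. Without the secret, a client cannot predict where a chosen key lands, which is the intuition behind obfuscating key placement.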

Abstract

The Log Structured Merge (LSM) Tree is a popular choice for key-value stores that focus on optimized write throughput while maintaining performant, production-ready read latencies. To optimize read performance, LSM stores rely on a probabilistic data structure called the Bloom Filter (BF). In this paper, we focus on adversarial workloads that lead to a sharp degradation in read performance by impacting the accuracy of BFs used within the LSM store. Our evaluation shows up to $800\%$ increase in the read latency of lookups for popular LSM stores. We define adversarial models and security definitions for LSM stores. We implement adversary resilience into two popular LSM stores, LevelDB and RocksDB. We use our implementations to demonstrate how performance degradation under adversarial workloads can be mitigated.

Paper Structure

This paper contains 23 sections, 3 theorems, 5 equations, 5 figures, 1 table.

Key Result

Theorem 4.1

Let $\Lambda = (C_{r}, I_{r}, Q)$ be an LSM store using $m$ bits of memory for its Bloom Filters. If pseudo-random permutations exist, then there exists a negligible function $\mathrm{negl(\cdot)}$ such that for security parameter $\lambda$, there exists an LSM engine $\Lambda^{\prime}$ that is $(n,

Figures (5)

  • Figure 1: Performance degradation of zero-result lookups on a uniformly random query workload
  • Figure 2: Bloom Filter example ($m_{\dagger} = 8$, $k_{\dagger} = 2$), adapted from hayder_2024. Each inserted $x_{i}$ is hashed $k_{\dagger}$ times, setting the mapped bits. Each queried $y_{i}$ is hashed $k_{\dagger}$ times. If any mapped bit is unset, $y_{i} \notin S$; otherwise $y_{i}$ is either in $S$ or a false positive.
  • Figure 3: Time taken by a brute force algorithm running sequentially on a local machine to saturate LevelDB's Bloom Filter implementation with various memory budgets.
  • Figure 4: Zero-result lookup performance on a uniformly random query workload for LevelDB (left) and RocksDB (right) with adversarial resilience.
  • Figure 5: Existing query lookup performance on a uniformly random query workload for LevelDB (left) and RocksDB (right) with adversarial resilience.
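
The Bloom Filter behaviour described in the Figure 2 caption can be sketched in a few lines. The following is a minimal illustration, assuming $m_{\dagger} = 8$ bits and $k_{\dagger} = 2$ hash functions as in that caption; the class name `BloomFilter`, its method names, and the use of salted SHA-256 as the hash family are assumptions for this example and do not reflect LevelDB's or RocksDB's actual filter code.

```python
import hashlib


class BloomFilter:
    """Toy Bloom Filter with m bits and k hash functions (illustrative only)."""

    def __init__(self, m: int = 8, k: int = 2) -> None:
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item: bytes):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: bytes) -> None:
        # Set every bit the item maps to.
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: bytes) -> bool:
        # Any unset mapped bit proves absence; all set means
        # "present or false positive".
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(m=8, k=2)
    for x in (b"x1", b"x2", b"x3"):
        bf.insert(x)
    print("bits set:", sum(bf.bits), "of", bf.m)
    print("x1 present?", bf.might_contain(b"x1"))
    # A key that was never inserted may still be reported as present.
    print("y1 present?", bf.might_contain(b"y1"))
```

With so few bits, a handful of inserts sets most of the filter, and lookups for absent keys increasingly return "maybe present". In an LSM store each such false positive costs an unnecessary read, which is why zero-result lookups are the ones that degrade.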

Theorems & Definitions (10)

  • Definition 3.1: Bloom Filter
  • Definition 3.2
  • Theorem 4.1
  • Definition 5.1
  • Definition 5.2
  • Definition 5.3
  • Definition 5.4
  • Theorem 5.1
  • Theorem 5.2
  • Definition A.1: Adversary Resilient LSM store