LSM Trees in Adversarial Environments
Hayder Tirmazi
TL;DR
The paper addresses the vulnerability of LSM-tree-based KV stores to adversarial workloads that exploit Bloom Filter false positives, causing large increases in read latency for zero-result lookups. It formalizes adversarial models and introduces Smash-Lsm/Smash-Bloom games to quantify risks, then proposes a practical, provably secure mitigation based on keyed pseudorandom permutations (PRPs) that obfuscate key placement. The authors implement the protection in LevelDB and RocksDB, demonstrating substantial reductions in zero-result lookup latency (up to $60\%$–$78\%$) with manageable overhead on queries involving existing keys, and provide security proofs and a clear set of open problems. The work highlights the importance of adversarial resilience in storage systems and offers a concrete, pluggable defense that preserves correctness while mitigating adversarial impact.
Abstract
The Log Structured Merge (LSM) Tree is a popular choice for key-value stores that focus on optimized write throughput while maintaining performant, production-ready read latencies. To optimize read performance, LSM stores rely on a probabilistic data structure called the Bloom Filter (BF). In this paper, we focus on adversarial workloads that lead to a sharp degradation in read performance by impacting the accuracy of BFs used within the LSM store. Our evaluation shows an increase of up to $800\%$ in the read latency of lookups for popular LSM stores. We define adversarial models and security definitions for LSM stores. We implement adversary resilience into two popular LSM stores, LevelDB and RocksDB. We use our implementations to demonstrate how performance degradation under adversarial workloads can be mitigated.
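The idea of hiding key placement behind a secret key can be illustrated with a minimal Python sketch. It is not the paper's LevelDB or RocksDB implementation: the class names, the double-hashing index scheme, and the parameters are hypothetical, and HMAC-SHA256 (a keyed pseudorandom function) is swapped in as a stand-in for the keyed PRP the paper describes. The point it shows is that an adversary who knows the public hash function of a plain filter can predict which bits a key touches, while the keyed variant makes those positions depend on a secret.

```python
import hashlib
import hmac


class BloomFilter:
    """Plain Bloom filter; bit positions depend only on public hashing."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item: bytes) -> list:
        # Double hashing: derive num_hashes positions from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))


class KeyedBloomFilter(BloomFilter):
    """Bloom filter whose bit positions depend on a secret key.

    Assumption: HMAC-SHA256 stands in for the keyed PRP; it is a PRF,
    not a permutation, and is used here only for illustration.
    """

    def __init__(self, num_bits: int, num_hashes: int, secret_key: bytes):
        super().__init__(num_bits, num_hashes)
        self._key = secret_key

    def _indexes(self, item: bytes) -> list:
        # Mask the item under the secret key before the public hashing step.
        masked = hmac.new(self._key, item, hashlib.sha256).digest()
        return super()._indexes(masked)
```

Because insertion and lookup both pass through the same keyed mapping, the filter still never reports a false negative; only the predictability of bit positions changes. In a real store the key would need to be generated and kept secret by the server, which this sketch leaves out.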
