A survey of LSM-Tree based Indexes, Data Systems and KV-stores

Supriya Mishra

TL;DR

This survey analyzes the state-of-the-art in LSM-Tree based indexes, data systems, and KV-stores, with a focus on how emerging persistent memory (PM) technologies reshape design trade-offs. It covers fundamental LSM-Tree structure (memtable, SSTables, WAL, compaction), PM characteristics and challenges, and a broad set of systems and accelerators (e.g., NV-cache, NoveLSM, TLSM) that leverage PM or disaggregated memory. The contribution lies in compiling and contrasting architectures, identifying gaps (notably secondary indexes and learned indexes), and outlining directions such as PM-enabled optimizations and new memory hierarchies. The findings highlight PM’s potential to significantly impact write performance, durability, and scalability of LSM-Tree based systems, while also underscoring the need for robust consistency and efficient IO across memory tiers.

Abstract

Modern databases typically make use of the Log-Structured Merge-Tree (LSM-Tree), a disk-based data structure, to organize data in indexes. It was proposed to efficiently handle update-intensive workloads, that is, workloads dominated by frequent update queries. In recent years, the LSM-Tree has gained popularity and has been adopted by a number of NoSQL databases and key-value stores. Since the LSM-Tree was first proposed, researchers and the database community have worked to improve its various components. Non-volatile Memory, also called Persistent Memory, has likewise gained significant popularity in recent years. This class of memory is both non-volatile and byte-addressable, and is therefore also termed Storage Class Memory; it combines the best characteristics of memory and storage. This paper provides an overview of the current state of the art in LSM-Tree-based indexes, data systems, and key-value stores.

Paper Structure (9 sections, 1 figure, 2 tables)