Aster: Enhancing LSM-structures for Scalable Graph Database
Dingheng Mo, Junfeng Liu, Fan Wang, Siqiang Luo
TL;DR
This work tackles the challenge of efficiently storing and querying large, evolving graphs on disk. It introduces Poly-LSM, a graph-oriented LSM-tree that blends vertex-based and edge-based layouts through multiple entry types (pivot and delta). An adaptive update mechanism, guided by a derived I/O cost model, decides how each edge update is applied. Space-efficient encoding with partitioned Elias-Fano and a degree sketch for degree-based decisions complement the design. Building on Poly-LSM, the authors implement Aster, a graph database with MVCC support and Gremlin/TinkerPop integration that delivers robust, scalable performance across diverse real-world and property-graph workloads. Empirical results show Aster outperforming mainstream baselines on large-scale graphs (up to 17x higher throughput than the best-performing baseline on the billion-scale Twitter graph), and its adaptive updates keep performance more stable when the workload shifts. Overall, the work demonstrates that a graph-oriented, adaptive LSM storage engine can deliver substantial gains in update and lookup efficiency for disk-resident graphs, with practical impact for contemporary graph-backed applications.
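To make the encoding idea concrete, here is a minimal Python sketch of plain Elias-Fano coding over a sorted list of neighbor IDs. It is an illustration under stated assumptions, not Aster's format: it omits the partitioning step that the partitioned variant adds, it stores bits in Python lists rather than packed words, and the names `ef_encode` and `ef_decode` are hypothetical.

```python
def ef_encode(values):
    """Encode a sorted list of non-negative ints with plain Elias-Fano.

    Assumption: this is the textbook (non-partitioned) scheme, shown only
    to illustrate the idea; it is not Aster's on-disk layout.
    """
    n = len(values)
    if n == 0:
        return {"n": 0, "low_bits": 0, "lows": [], "high": []}
    if values[0] < 0 or any(a > b for a, b in zip(values, values[1:])):
        raise ValueError("values must be non-negative and sorted")

    universe = values[-1] + 1
    # Number of low bits per element: floor(log2(universe / n)), at least 0.
    low_bits = max(0, (universe // n).bit_length() - 1)
    mask = (1 << low_bits) - 1

    # Low parts are stored verbatim, low_bits bits each.
    lows = [v & mask for v in values]

    # High parts are stored in unary: element i sets bit (high_i + i).
    high = [0] * (n + (values[-1] >> low_bits) + 1)
    for i, v in enumerate(values):
        high[(v >> low_bits) + i] = 1

    return {"n": n, "low_bits": low_bits, "lows": lows, "high": high}


def ef_decode(enc):
    """Recover the original sorted list from an ef_encode result."""
    out = []
    seen = 0  # number of 1-bits consumed so far
    for pos, bit in enumerate(enc["high"]):
        if bit:
            high_part = pos - seen
            out.append((high_part << enc["low_bits"]) | enc["lows"][seen])
            seen += 1
    return out


# Example: a small adjacency list round-trips through the encoding.
ids = [2, 3, 5, 7, 11, 13, 24]
enc = ef_encode(ids)
assert ef_decode(enc) == ids
```

The encoded size in this sketch is `n * low_bits + len(high)` bits. For the seven IDs above that is 7 * 1 + 20 = 27 bits, compared with 35 bits for a fixed 5-bit width per ID.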
Abstract
There is a proliferation of applications requiring the management of large-scale, evolving graphs under workloads with intensive graph updates and lookups. Driven by this challenge, we introduce Poly-LSM, a high-performance key-value storage engine for graphs with the following novel techniques: (1) Poly-LSM embeds a new graph-oriented LSM-tree structure that features a hybrid storage model for concisely and effectively storing graph data. (2) Poly-LSM utilizes an adaptive mechanism to handle edge insertions and deletions on graphs with optimized I/O efficiency. (3) Poly-LSM exploits the skewness of graph data to encode the key-value entries. Building upon this foundation, we further implement Aster, a robust and versatile graph database that supports the Gremlin query language, facilitating various graph applications. In our experiments, we compare Aster against several mainstream real-world graph databases. The results demonstrate that Aster outperforms all baseline graph databases, especially on large-scale graphs. Notably, on the billion-scale Twitter graph dataset, Aster achieves up to 17x throughput improvement compared to the best-performing baseline graph system.
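The hybrid storage model in (1) and the edge updates in (2) can be pictured with a toy in-memory sketch. It rests on assumptions not confirmed by this summary: a pivot entry is taken to hold a vertex's full neighbor list (vertex-based layout), a delta entry is taken to record one edge insertion or deletion (edge-based layout), and newer entries take precedence over older ones. The class and method names are hypothetical, and the I/O cost model that selects between the two update styles is not given here, so the choice is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Pivot:
    """Assumed meaning: the complete neighbor list of one vertex."""
    vertex: int
    neighbors: Tuple[int, ...]


@dataclass(frozen=True)
class Delta:
    """Assumed meaning: a single edge insertion or deletion."""
    vertex: int
    neighbor: int
    deleted: bool = False


Entry = Union[Pivot, Delta]


class ToyGraphLog:
    """Append-only list of entries, oldest first (a stand-in for LSM levels)."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.reads = 0  # counts lookups, to expose read-before-write cost

    def neighbors(self, v: int) -> Set[int]:
        """Replay entries for v: a pivot resets the set, deltas adjust it."""
        self.reads += 1
        result: Set[int] = set()
        for e in self.entries:
            if e.vertex != v:
                continue
            if isinstance(e, Pivot):
                result = set(e.neighbors)
            elif e.deleted:
                result.discard(e.neighbor)
            else:
                result.add(e.neighbor)
        return result

    def add_edge_delta(self, v: int, w: int) -> None:
        """Blind write: append a delta without reading existing data."""
        self.entries.append(Delta(v, w))

    def remove_edge_delta(self, v: int, w: int) -> None:
        """Blind delete: append a deletion marker for edge (v, w)."""
        self.entries.append(Delta(v, w, deleted=True))

    def add_edge_pivot(self, v: int, w: int) -> None:
        """Read-modify-write: look up v, then write a fresh pivot."""
        merged = self.neighbors(v) | {w}
        self.entries.append(Pivot(v, tuple(sorted(merged))))


g = ToyGraphLog()
g.add_edge_pivot(1, 2)     # costs one lookup, later reads see one entry
g.add_edge_delta(1, 3)     # no lookup, but lookups must merge this delta
g.remove_edge_delta(1, 2)  # no lookup
assert g.neighbors(1) == {3}
```

In this toy model, delta updates are cheap to write but leave more entries to merge at lookup time, while pivot updates pay a read up front and keep lookups simple. An adaptive scheme would choose between them per update.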
