Revisiting the Design of In-Memory Dynamic Graph Storage
Jixian Su, Chiyu Hao, Shixuan Sun, Hao Zhang, Sen Gao, Jiaxin Jiang, Yao Chen, Chenyi Zhang, Bingsheng He, Minyi Guo
TL;DR
The paper revisits in-memory dynamic graph storage (DGS) by proposing a common abstraction and a generic test framework for comparing diverse DGS methods fairly. It benchmarks five DGS approaches against CSR/AdjLst on real-world and synthetic graphs, revealing substantial memory overhead and contention in fine-grained methods, while coarse-grained approaches such as Aspen mitigate some concurrency costs at the expense of memory or scalability. Key findings: dynamic arrays excel for vertex access, segmented neighbor indexes boost insert performance on large neighbor sets, and adaptive indexing helps on real-world graphs, yet a persistent gap remains between DGS and static CSR in read efficiency and memory footprint. The work provides actionable design guidance and highlights hardware-aware optimizations as critical for practical real-time graph analytics.
Abstract
The effectiveness of in-memory dynamic graph storage (DGS) in supporting concurrent graph read and write queries is crucial for real-time graph analytics and updates. Various methods have been proposed, for example, LLAMA, Aspen, LiveGraph, Teseo, and Sortledton. These approaches differ significantly in their support for read and write operations, space overhead, and concurrency control. However, there has been no systematic study of the trade-offs among these dimensions. In this paper, we propose a common abstraction for DGS design and implement a generic test framework based on this abstraction, which we use to evaluate the effectiveness of individual techniques and to identify the performance factors affecting these storage methods. Our findings highlight several key insights: 1) Existing DGS methods exhibit substantial space overhead. For example, Aspen consumes 3.3-10.8x more memory than CSR, while the optimal fine-grained methods consume 4.1-8.9x more memory than CSR. 2) Existing methods often overlook the impact of memory access patterns on modern architectures, leading to performance degradation compared to continuous storage methods. 3) Fine-grained concurrency control methods, in particular, suffer from severe efficiency and space issues because they maintain versions and perform checks for each neighbor. These methods also experience significant contention on high-degree vertices. Our systematic study reveals these performance bottlenecks and outlines future directions to improve DGS for real-time graph analytics.
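To make the contrast in insights 1) and 3) concrete, the following minimal Python sketch compares a static CSR layout with a toy store that keeps a version interval on every neighbor entry. It is an illustration under stated assumptions, not the layout of any of the evaluated systems: the names `build_csr`, `csr_scan`, and `VersionedAdjList`, the single global logical clock, and the `[begin_ts, end_ts)` visibility rule are all hypothetical choices made for this example.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, neighbors) from a directed edge list."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    # offsets[v] .. offsets[v + 1] delimits the neighbors of vertex v.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    neighbors = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each vertex
    for src, dst in sorted(edges):  # sorting keeps each neighbor list ordered
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


def csr_scan(offsets, neighbors, v):
    """Read the neighbors of v: one contiguous slice, no per-neighbor check."""
    return neighbors[offsets[v]:offsets[v + 1]]


class VersionedAdjList:
    """Toy fine-grained store (assumption): each neighbor entry carries
    a [begin_ts, end_ts) interval that readers must check individually."""

    INF = float("inf")

    def __init__(self, num_vertices):
        self.adj = [[] for _ in range(num_vertices)]
        self.clock = 0  # hypothetical global logical clock

    def insert_edge(self, src, dst):
        self.clock += 1
        self.adj[src].append([dst, self.clock, self.INF])
        return self.clock

    def delete_edge(self, src, dst):
        self.clock += 1
        for entry in self.adj[src]:
            if entry[0] == dst and entry[2] == self.INF:
                entry[2] = self.clock  # close the visibility interval
                break
        return self.clock

    def scan(self, v, read_ts):
        """Return the neighbors of v visible at read_ts (one check per entry)."""
        return [dst for dst, begin, end in self.adj[v] if begin <= read_ts < end]
```

In this sketch, CSR stores a single slot per edge in one contiguous array, whereas the versioned store keeps three fields per edge and evaluates a visibility predicate for every neighbor on every scan. That is the kind of per-neighbor space and check cost the abstract attributes to fine-grained concurrency control, although real systems use far more compact and elaborate representations.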
