CuckooGraph: A Scalable and Space-Time Efficient Data Structure for Large-Scale Dynamic Graphs
Zhuochen Fan, Yalun Cai, Zirui Liu, Jiarui Guo, Xin Fan, Tong Yang, Bin Cui
TL;DR
CuckooGraph addresses the core challenges of large-scale dynamic graphs (rapid updates, massive scale, and complex queries) with a transformable, hash-based storage design built on Large and Small Cuckoo Hash Tables (L-CHT/S-CHT) and a Denylist mechanism. The transformable data structures enable adaptive space reuse as the number of stored items grows or shrinks, while the Denylist absorbs insertion failures so the tables need not be reconfigured as often. The authors provide theoretical analyses of time and memory cost and report substantial empirical gains over state-of-the-art baselines on insertion, query, and graph analytics tasks, and they demonstrate practicality through integrations with Redis and Neo4j. The work offers a practical, scalable approach to dynamic graph storage, backed by theoretical analysis and broadly applicable to real-world data systems.
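To make the two-level organization concrete, here is a minimal Python sketch. It assumes, as the table names suggest, that the large table indexes source vertices and each source owns a small table of its out-neighbors. Python's built-in `dict` and `set` stand in for the L-CHT and S-CHT; the class name `TwoLevelGraph`, the threshold `INLINE_LIMIT`, and the list-to-set conversion are hypothetical illustrations of a transformable container, not the paper's design.

```python
class TwoLevelGraph:
    """Hypothetical sketch of a two-level adjacency store.

    outer: source vertex -> neighbor container (dict stands in for an L-CHT).
    A neighbor container starts as a compact list and is transformed into a
    set (stand-in for an S-CHT) once the degree exceeds INLINE_LIMIT.
    """

    INLINE_LIMIT = 4  # hypothetical threshold, not taken from the paper

    def __init__(self):
        self.outer = {}

    def insert_edge(self, u, v):
        nbrs = self.outer.setdefault(u, [])
        if v in nbrs:
            return False  # edge already present
        if isinstance(nbrs, list):
            nbrs.append(v)
            if len(nbrs) > self.INLINE_LIMIT:
                self.outer[u] = set(nbrs)  # expand: list -> hash set
        else:
            nbrs.add(v)
        return True

    def has_edge(self, u, v):
        return v in self.outer.get(u, ())

    def successors(self, u):
        return list(self.outer.get(u, ()))

    def delete_edge(self, u, v):
        nbrs = self.outer.get(u)
        if nbrs is None or v not in nbrs:
            return False
        nbrs.remove(v)  # works for both list and set
        if not nbrs:
            del self.outer[u]  # drop vertices with no remaining out-edges
        elif isinstance(nbrs, set) and len(nbrs) <= self.INLINE_LIMIT // 2:
            self.outer[u] = list(nbrs)  # shrink: hash set -> list
        return True
```

The gap between the expansion threshold (more than 4 neighbors) and the shrink threshold (2 or fewer) adds hysteresis, so a vertex whose degree hovers near the boundary does not convert back and forth on every update.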
Abstract
Graphs play an increasingly important role in various big data applications. However, existing graph data structures cannot simultaneously address the performance bottlenecks caused by the dynamic updates, large scale, and high query complexity of current graphs. This paper proposes a novel data structure for large-scale dynamic graphs called CuckooGraph. It requires no prior knowledge of the incoming graph and can adaptively resize to the most memory-efficient form while requiring few memory accesses, enabling very fast graph data processing. The key techniques of CuckooGraph are TRANSFORMATION and DENYLIST. TRANSFORMATION makes full use of limited memory through a set of related data structures that support flexible space transformations, smoothly expanding or shrinking the allocated space according to the number of incoming items. DENYLIST efficiently handles item insertion failures and further improves processing speed. Our experimental results show that, compared with Spruce, the most competitive solution, CuckooGraph achieves about $33\times$ higher insertion throughput while requiring only about $68\%$ of the memory space.
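The interplay between cuckoo insertion, a denylist, and expanding or shrinking the table can be illustrated with a single cuckoo hash set. The Python sketch below is a generic two-table cuckoo hash under stated assumptions: an item that cannot be placed within `max_kicks` displacements is parked in a small overflow set (our stand-in for DENYLIST), the table doubles only when that set exceeds `denylist_limit`, and it halves when the item count falls below a quarter of the per-table capacity. The class name, parameters, and the doubling/halving policy are hypothetical; the paper describes TRANSFORMATION as moving between related data structures rather than simply rehashing, and its details are not given in this section.

```python
class CuckooSetWithDenylist:
    """Two-table cuckoo hash set with a bounded overflow 'denylist'.

    Hypothetical sketch: names, thresholds, and the doubling/halving policy
    are illustrative assumptions, not CuckooGraph's actual parameters.
    Keys must be hashable and not None (None marks an empty slot).
    """

    def __init__(self, capacity=8, max_kicks=16, denylist_limit=4):
        self.min_capacity = capacity
        self.capacity = capacity            # slots per table
        self.max_kicks = max_kicks          # bound on one displacement chain
        self.denylist_limit = denylist_limit
        self.tables = [[None] * capacity, [None] * capacity]
        self.denylist = set()               # items that failed to find a slot
        self.size = 0

    def _slot(self, which, key):
        # Two hash functions derived by salting the built-in hash.
        return hash((which, key)) % self.capacity

    def __contains__(self, key):
        # At most two table probes plus one denylist lookup.
        return (self.tables[0][self._slot(0, key)] == key
                or self.tables[1][self._slot(1, key)] == key
                or key in self.denylist)

    def __len__(self):
        return self.size

    def _place(self, key):
        """Cuckoo insertion; returns the item left without a slot, or None."""
        for which in (0, 1):
            i = self._slot(which, key)
            if self.tables[which][i] is None:
                self.tables[which][i] = key
                return None
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, key)
            # Evict the occupant; it must move to its slot in the other table.
            key, self.tables[which][i] = self.tables[which][i], key
            if key is None:
                return None
            which ^= 1
        return key

    def insert(self, key):
        if key in self:
            return False
        self.size += 1
        homeless = self._place(key)
        if homeless is not None:
            # Park the failed item instead of rebuilding right away.
            self.denylist.add(homeless)
            if len(self.denylist) > self.denylist_limit:
                self._resize(self.capacity * 2)      # expand
        return True

    def delete(self, key):
        for which in (0, 1):
            i = self._slot(which, key)
            if self.tables[which][i] == key:
                self.tables[which][i] = None
                break
        else:
            if key not in self.denylist:
                return False
            self.denylist.remove(key)
        self.size -= 1
        if self.capacity > self.min_capacity and self.size < self.capacity // 4:
            self._resize(self.capacity // 2)         # shrink
        return True

    def _resize(self, new_capacity):
        # Collect every stored item, rebuild empty tables, and reinsert.
        items = [k for t in self.tables for k in t if k is not None]
        items.extend(self.denylist)
        self.capacity = new_capacity
        self.tables = [[None] * new_capacity, [None] * new_capacity]
        self.denylist = set()
        self.size = 0
        for k in items:
            self.insert(k)
```

A lookup touches at most two table slots plus the denylist, which mirrors the "few memory accesses" property claimed above, and the denylist lets an occasional failed insertion be absorbed without an immediate full rebuild.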
