GastCoCo: Graph Storage and Coroutine-Based Prefetch Co-Design for Dynamic Graph Processing

Hongfu Li, Qian Tao, Song Yu, Shufeng Gong, Yanfeng Zhang, Feng Yao, Wenyuan Yu, Ge Yu, Jingren Zhou

TL;DR

GastCoCo addresses the dual need for fast graph computation and fast dynamic updates by co-designing a prefetch-aware storage structure (CBList) with coroutine-based software prefetching that hides memory latency. It introduces GTChain to enable hardware prefetching during sequential traversals, and an adaptation layer whose strategies auto-tune partitioning, scheduling, and prefetching. The key contributions are CBList, GTChain, interleaved execution with stackless C++20 coroutines, and a family of hybrid prefetching strategies. Empirical results show GastCoCo outperforming state-of-the-art systems by up to 180x in graph updates and up to 41.1x in graph computation on real-world workloads, with gains that hold across diverse graphs and hardware.

Abstract

An efficient data structure is fundamental to meeting the growing demands in dynamic graph processing. However, the dual requirements for graph computation efficiency (with contiguous structures) and graph update efficiency (with linked list-like structures) present a conflict in the design principles of graph structures. After experimental studies of existing state-of-the-art dynamic graph structures, we observe that the overhead of cache misses accounts for a major portion of the graph computation time. This paper presents GastCoCo, a system with graph storage and coroutine-based prefetch co-design. By employing software prefetching via stackless coroutines and introducing a prefetch-friendly data structure CBList, GastCoCo significantly alleviates the performance degradation caused by cache misses. Our results show that GastCoCo outperforms state-of-the-art graph storage systems by 1.3x - 180x in graph updates and 1.4x - 41.1x in graph computation.

Paper Structure (26 sections, 16 figures, 3 tables, 3 algorithms)

Figures (16)

  • Figure 1: The execution time for graph algorithms (and graph updates) and the CPU cache stall count on different data structures (T.O.: graph updates cannot finish in 24 hours).
  • Figure 2: Hardware prefetching.
  • Figure 3: Overview of GastCoCo.
  • Figure 4: Prefetch-aware structure CBList. (The capacities of small chunks and B+ tree nodes are set as 2 and 3 edges.)
  • Figure 5: (a) CSR-like physically contiguous store; (b) prefetch-friendly logically contiguous store via pointers; (c) ADJ-like mutually independent store (E.S.: Edge Storage).
  • ...and 11 more figures