GastCoCo: Graph Storage and Coroutine-Based Prefetch Co-Design for Dynamic Graph Processing

Hongfu Li; Qian Tao; Song Yu; Shufeng Gong; Yanfeng Zhang; Feng Yao; Wenyuan Yu; Ge Yu; Jingren Zhou

TL;DR

GastCoCo targets two conflicting needs in dynamic graph processing, fast graph computation and fast graph updates, by co-designing a prefetch-aware storage structure (CBList) with coroutine-based software prefetching that hides memory latency. It introduces GTChain to enable hardware prefetching during sequential traversals, and an adaptation layer whose strategies auto-tune partitioning, scheduling, and prefetching. The key contributions are CBList, GTChain, interleaved execution with stackless C++20 coroutines, and a family of hybrid prefetching strategies covering queries, analytics, and updates. Compared with state-of-the-art systems on real-world workloads, GastCoCo reports speedups of up to 180x in graph updates and up to 41.1x in graph computation, across diverse graphs and hardware.

Abstract

An efficient data structure is fundamental to meeting the growing demands in dynamic graph processing. However, the dual requirements for graph computation efficiency (with contiguous structures) and graph update efficiency (with linked list-like structures) present a conflict in the design principles of graph structures. After experimental studies of existing state-of-the-art dynamic graph structures, we observe that the overhead of cache misses accounts for a major portion of the graph computation time. This paper presents GastCoCo, a system with graph storage and coroutine-based prefetch co-design. By employing software prefetching via stackless coroutines and introducing a prefetch-friendly data structure CBList, GastCoCo significantly alleviates the performance degradation caused by cache misses. Our results show that GastCoCo outperforms state-of-the-art graph storage systems by 1.3x - 180x in graph updates and 1.4x - 41.1x in graph computation.

Paper Structure (26 sections, 16 figures, 3 tables, 3 algorithms)

Introduction
Preliminaries
Graph Operations and Data Access Patterns
Hardware Prefetching in Graph Processing
Software Prefetching via Coroutines
Overview of GastCoCo
Prefetch-Aware Structure CBList
Overview
Update-Read Balanced Edge Storage
Prefetch-Friendly Global Traversal Chain
Interleaved Execution with Coroutines
Coroutine with Software Prefetching
Load Balancing of Coroutines
Adaptation Layer
Execution Strategy Tuner
...and 11 more sections

Figures (16)

Figure 1: The execution time for graph algorithms (and graph updates) and the CPU cache stall count on different data structures (T.O.: graph updates cannot finish in 24 hours).
Figure 2: Hardware prefetching.
Figure 3: Overview of GastCoCo.
Figure 4: Prefetch-aware structure CBList. (The capacities of small chunks and B+ tree nodes are set to 2 and 3 edges, respectively.)
Figure 5: (a) CSR-like physically contiguous store; (b) prefetch-friendly logically contiguous store via pointers; (c) ADJ-like mutually independent store (E.S.: Edge Storage).
...and 11 more figures
