A Graph-native Optimization Framework for Complex Graph Queries
Bingqing Lyu, Xiaoli Zhou, Longbin Lai, Yufan Yang, Yunkai Lou, Wenyuan Yu, Jingren Zhou
TL;DR
GOpt addresses the challenge of efficiently processing complex graph patterns (CGPs) by introducing a graph-native, language-agnostic intermediate representation (GIR) and a dual-strategy optimization engine: RuleBasedStrategy for heuristic, rule-driven rewrites and PatternStrategy for cost-based ordering of the graph operators inside a pattern. Its architecture decouples parsing, optimization, and execution, so that backends such as Neo4j and GraphScope can be integrated through a physical-converter interface and an optional protobuf-based plan format. The framework formalizes semantic variations in pattern matching, supports an extensive set of optimization rules (e.g., FilterIntoPattern, EVFusion, ComSubPattern), and demonstrates substantial performance gains across a spectrum of queries on SNB-like workloads, including single-pattern, multi-pattern, path, cyclic, and CGP queries. The work provides practical guidance for industrial graph analytics pipelines, showing how to apply hierarchical optimizations and cost-aware plan selection to achieve scalable, graph-native query processing.
Abstract
This technical report extends the SIGMOD 2025 paper "A Modular Graph-Native Query Optimization Framework" with a comprehensive exposition of GOpt's advanced technical mechanisms, implementation strategies, and extended evaluations. While the original paper introduced GOpt's unified intermediate representation (GIR) and demonstrated its performance benefits, this report covers the implementation in depth: (1) the full specification of GOpt's optimization rules; (2) a systematic treatment of semantic variations (e.g., homomorphism vs. edge-distinct matching) across query languages and their implications for optimization; (3) the design of GOpt's physical integration interface, which enables integration with transactional (Neo4j) and distributed (GraphScope) backends via engine-specific operator customization; and (4) a detailed analysis of plan transformations for LDBC benchmark queries.
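To make the semantic variation in item (2) concrete, the following minimal sketch enumerates matches of the pattern (a)-[x]->(b)<-[y]-(c) on a toy graph under both semantics. It is an illustration only and not GOpt code: the graph data, the function name `match_shared_target`, and the `edge_distinct` flag are all hypothetical.

```python
from itertools import product

# Toy directed graph (hypothetical data): edge id -> (source, target).
EDGES = {"e0": (0, 1), "e1": (2, 1)}


def match_shared_target(edges, edge_distinct):
    """Enumerate matches of the pattern (a)-[x]->(b)<-[y]-(c).

    With edge_distinct=False (homomorphism), x and y may bind to the
    same data edge. With edge_distinct=True, they must differ.
    """
    matches = []
    for x, y in product(edges, repeat=2):
        (a, b_from_x), (c, b_from_y) = edges[x], edges[y]
        if b_from_x != b_from_y:
            continue  # both pattern edges must point at the same vertex b
        if edge_distinct and x == y:
            continue  # reject matches that reuse one data edge
        matches.append({"a": a, "b": b_from_x, "c": c, "x": x, "y": y})
    return matches


homomorphic = match_shared_target(EDGES, edge_distinct=False)
distinct = match_shared_target(EDGES, edge_distinct=True)
print(len(homomorphic), len(distinct))  # prints "4 2"
```

The two extra homomorphic matches bind `x` and `y` to the same edge, so `a` and `c` coincide. Because the result sets differ, a rewrite that is safe under one semantics may change results under the other, which is why an optimizer has to track which semantics a query uses.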
