A Modular Graph-Native Query Optimization Framework
Bingqing Lyu, Xiaoli Zhou, Longbin Lai, Yufan Yang, Yunkai Lou, Wenyuan Yu, Jingren Zhou
TL;DR
GOpt is a modular, graph-native optimization framework for complex graph patterns (CGPs) that unifies multi-language query processing with backend-agnostic optimization. By translating diverse graph queries into a unified Graph Intermediate Representation (GIR) via GraphIrBuilder and enabling backend-specific operators through PhysicalSpec, GOpt harmonizes pattern matching with relational operations. It combines rule-based optimization, automatic type inference, and a cost-based optimization core that leverages high-order statistics (GLogue) and a top-down search with branch-and-bound to produce efficient CGP execution plans. Experimental results on Neo4j and GraphScope show substantial performance gains and scalability, validating the practicality of integrating GOpt into existing graph analytics ecosystems.
Abstract
Complex Graph Patterns (CGPs), which combine pattern matching with relational operations, are widely used in real-world applications. Existing systems rely on monolithic architectures for CGPs, which restrict their ability to integrate multiple query languages and leave them without certain advanced optimization techniques. To address these issues, we introduce GOpt, a modular graph-native query optimization framework with the following features: (1) support for queries in multiple query languages, (2) decoupling of execution from specific graph systems, and (3) integration of advanced optimization techniques. Specifically, GOpt offers a high-level interface, GraphIrBuilder, for converting queries from various graph query languages into a unified intermediate representation (GIR), thereby streamlining the optimization process. It also provides a low-level interface, PhysicalSpec, enabling backends to register backend-specific physical operators and cost models. Moreover, GOpt employs a graph-native optimizer that encompasses extensive heuristic rules, an automatic type inference approach, and cost-based optimization techniques tailored for CGPs. Comprehensive experiments on real-world datasets show that integrating GOpt significantly boosts performance, with Neo4j achieving an average speedup of 9.2 times (up to 48.6 times) and GraphScope achieving an average speedup of 33.4 times (up to 78.7 times).
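To make the idea of a top-down, cost-based search with branch-and-bound more concrete, the following is a minimal Python sketch. It is not GOpt's optimizer: the pattern, the vertex counts, the edge selectivities, the cost model (sum of estimated intermediate result sizes), and all names (`VERTEX_COUNT`, `EDGE_SELECTIVITY`, `card`, `best_order`) are assumptions made for illustration, with the toy `card` function standing in for the statistics that a component such as GLogue would provide.

```python
from fractions import Fraction
from math import inf

# Hypothetical statistics for a 3-vertex pattern (illustrative values only).
VERTEX_COUNT = {"a": 1000, "b": 1000, "c": 10}
EDGE_SELECTIVITY = {
    ("a", "b"): Fraction(1, 100),
    ("b", "c"): Fraction(1, 10),
    ("a", "c"): Fraction(1, 10),
}


def card(sub):
    """Toy cardinality estimate: vertex counts times induced edge selectivities."""
    size = Fraction(1)
    for v in sub:
        size *= VERTEX_COUNT[v]
    for (u, w), sel in EDGE_SELECTIVITY.items():
        if u in sub and w in sub:
            size *= sel
    return size


def is_connected(sub):
    """Check that the sub-pattern induced by `sub` is connected."""
    start = next(iter(sub))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for u, w in EDGE_SELECTIVITY:
            for src, dst in ((u, w), (w, u)):
                if src == x and dst in sub and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
    return seen == set(sub)


def best_order(pattern):
    """Return (cost, vertex order) of the cheapest expansion order found top-down."""
    memo = {}  # sub-pattern -> exact (cost, order)

    def search(sub, bound):
        # Returns the optimal plan for `sub` if its cost is below `bound`, else None.
        if sub in memo:
            return memo[sub] if memo[sub][0] < bound else None
        own = card(sub)
        if own >= bound:  # every plan for `sub` pays at least card(sub): prune
            return None
        if len(sub) == 1:
            memo[sub] = (own, list(sub))
            return memo[sub]
        best, limit = None, bound
        for v in sorted(sub):  # try removing each vertex as the last expansion
            rest = sub - {v}
            if not is_connected(rest):
                continue
            child = search(rest, limit - own)  # tighten the bound for the child
            if child is not None:
                best = (child[0] + own, child[1] + [v])
                limit = best[0]
        if best is not None:
            memo[sub] = best
        return best

    return search(frozenset(pattern), inf)
```

Under these assumed statistics, calling `best_order(VERTEX_COUNT)` starts from the small vertex `c`, and the branch that would first join `a` and `b` is pruned as soon as its estimated intermediate size exceeds the best cost found so far. Only exact results are memoized; a sub-pattern that was pruned under one bound is searched again if it is later reached with a looser bound.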
