Graph Neural Networks for Databases: A Survey
Ziming Li, Youhuan Li, Yuyu Luo, Guoliang Li, Chuxu Zhang
TL;DR
Provides a taxonomy and comprehensive review of Graph Neural Networks (GNNs) for Databases, distinguishing Relational Databases and Graph Databases as the two primary application domains. For relational DBs, it surveys methods spanning performance prediction, join order selection, cardinality estimation, materialized view management, and text-to-SQL; for graph DBs, it covers graph similarity computation (including graph edit distance, GED), subgraph matching, and subgraph counting, detailing representative models and data representations. The paper discusses design choices, including graph construction, feature design, and integration with DB optimizers, and highlights challenges in scalability and practical deployment. It concludes with future directions such as real-world database integration, cloud deployment, and combining GNNs with large language models to handle natural language interfaces and unstructured data.
Abstract
Graph neural networks (GNNs) are powerful deep learning models for graph-structured data, demonstrating remarkable success across diverse domains. Recently, the database (DB) community has increasingly recognized the potential of GNNs, prompting a surge of research on improving database systems through GNN-based approaches. However, despite notable advances, there is no comprehensive review of how GNNs can improve DB systems. This survey aims to bridge that gap by providing a structured and in-depth overview of GNNs for DB systems. Specifically, we propose a new taxonomy that classifies existing methods into two key categories: (1) Relational Databases, which covers tasks such as performance prediction, query optimization, and text-to-SQL, and (2) Graph Databases, which addresses challenges such as efficient graph query processing and graph similarity computation. We systematically review key methods in each category, highlighting their contributions and practical implications. Finally, we suggest promising avenues for integrating GNNs into database systems.
