RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases

Dongwon Choi; Sunwoo Kim; Juyeon Kim; Kyungho Kim; Geon Lee; Shinhwan Kang; Myunghwan Kim; Kijung Shin

TL;DR

RDB2G-Bench is the first benchmark framework for evaluating automatic graph-modeling strategies that convert relational databases (RDBs) into graphs for downstream predictive tasks. By precomputing around 50,000 graph models across 5 real-world RDBs and 12 tasks, it enables reproducible, rapid evaluation of 10 modeling methods, including heuristic, search-based, and LLM-based approaches, with reported speedups of up to 389x versus on-the-fly evaluation. The study finds that which tables are included and how they are modeled (e.g., Row2Edge vs. Row2Node) significantly affect performance, and that no single modeling rule works best across all tasks. It also shows that effective graph models generalize across GNN architectures, identifies substructures shared by the top-performing models, and finds LLM-based approaches promising despite their current limitations. The publicly available datasets and code aim to accelerate progress in RDB-to-graph modeling by enabling efficient, fair comparisons and broader applicability across predictive GNNs.

Abstract

Recent advances have demonstrated the effectiveness of graph-based learning on relational databases (RDBs) for predictive tasks. Such approaches require transforming RDBs into graphs, a process we refer to as RDB-to-graph modeling, where rows of tables are represented as nodes and foreign-key relationships as edges. Yet, effective modeling of RDBs into graphs remains challenging. Specifically, there exist numerous ways to model RDBs into graphs, and performance on predictive tasks varies significantly depending on the chosen graph model. In our analysis, we find that the best-performing graph model can yield up to 10% higher performance than the common heuristic rule for graph modeling, and that identifying this best model remains non-trivial. To foster research on intelligent RDB-to-graph modeling, we introduce RDB2G-Bench, the first benchmark framework for evaluating such methods. We construct extensive datasets covering 5 real-world RDBs and 12 predictive tasks, resulting in around 50k graph model-performance pairs for efficient and reproducible evaluations. Thanks to our precomputed datasets, we were able to benchmark 10 automatic RDB-to-graph modeling methods on the 12 tasks about 380x faster than on-the-fly evaluation, which requires repeated GNN training. Our analysis of the datasets and benchmark results reveals key structural patterns affecting graph model effectiveness, along with practical implications for effective graph modeling. Our datasets and code are available at https://github.com/chlehdwon/RDB2G-Bench.
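
The basic modeling rule mentioned in the abstract (rows become nodes, foreign-key references become edges) can be illustrated with a minimal sketch. This is not the RDB2G-Bench code or API: the function name `rdb_to_graph`, the toy tables, and the assumption that every table has a primary-key column named `id` are all hypothetical and chosen only for illustration.

```python
def rdb_to_graph(tables, foreign_keys):
    """Toy RDB-to-graph conversion (illustrative assumption, not the paper's code).

    tables: {table_name: [row dicts]}; each row is assumed to have a primary key "id".
    foreign_keys: list of (child_table, fk_column, parent_table) triples.
    Returns (nodes, edges), where a node is (table_name, primary_key).
    """
    # Every row of every table becomes one node.
    nodes = [(name, row["id"]) for name, rows in tables.items() for row in rows]
    node_set = set(nodes)

    # Every foreign-key reference becomes one edge from the child row to the parent row.
    edges = []
    for child, fk_column, parent in foreign_keys:
        for row in tables[child]:
            target = (parent, row.get(fk_column))
            if target in node_set:  # skip null or dangling references
                edges.append(((child, row["id"]), target))
    return nodes, edges


# Hypothetical three-table database: reviews reference users and items.
tables = {
    "users": [{"id": 1}, {"id": 2}],
    "items": [{"id": 10}],
    "reviews": [
        {"id": 100, "user_id": 1, "item_id": 10},
        {"id": 101, "user_id": 2, "item_id": 10},
    ],
}
foreign_keys = [("reviews", "user_id", "users"), ("reviews", "item_id", "items")]

nodes, edges = rdb_to_graph(tables, foreign_keys)
```

Dropping a table from `tables` (together with the foreign keys that mention it) gives a different graph model of the same database, which is the kind of modeling choice the benchmark compares.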
