Boosting Relational Deep Learning with Pretrained Tabular Models

Veronica Lachi, Antonio Longa, Beatrice Bevilacqua, Bruno Lepri, Andrea Passerini, Bruno Ribeiro

TL;DR

This paper tackles the challenge of making efficient, accurate predictions over temporal-relational data in relational databases. It introduces LightRDL, a hybrid framework that combines a reduced, timestamped relational graph with embeddings distilled from a pretrained tabular model, and feeds both to a Relational Graph Neural Network (R-GNN). By using a Snapshotted Relational Graph and a tabular-model distillation pipeline, LightRDL achieves substantial speedups (up to $526\times$ faster inference and $72\times$ faster training) while maintaining or improving predictive accuracy on RelBench tasks. The approach reuses existing, strong engineered tabular features, which makes it practical for real-time relational-database applications, and it allows different tabular backends to be swapped in. Overall, LightRDL offers a scalable, efficient alternative to fully end-to-end R-GNNs for temporal-relational prediction in industry-scale databases.
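The distillation step mentioned above can be pictured with a minimal sketch. It assumes one plausible reading of the pipeline: a small neural student is trained to mimic the predictions of the pretrained tabular model, and the student's hidden layer is then used as the embedding. The teacher here is a fixed stand-in function, not a real LightGBM model, and every name (`distill`, `hidden_dim`, `node_embeddings`) is hypothetical rather than taken from the paper.

```python
import numpy as np

def distill(features, teacher_preds, hidden_dim=4, lr=0.05, epochs=300, seed=0):
    """Fit a one-hidden-layer MLP student to mimic teacher predictions.

    Returns a function mapping features to hidden-layer embeddings,
    plus the list of per-epoch mean squared errors.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden_dim))
    b1 = np.zeros(hidden_dim)
    w2 = rng.normal(0.0, 0.5, hidden_dim)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        h = np.tanh(features @ W1 + b1)        # hidden activations, shape (n, hidden_dim)
        pred = h @ w2 + b2                     # student predictions, shape (n,)
        err = pred - teacher_preds
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through the two layers.
        g_pred = 2.0 * err / n
        g_w2 = h.T @ g_pred
        g_b2 = g_pred.sum()
        g_z = np.outer(g_pred, w2) * (1.0 - h ** 2)
        g_W1 = features.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Full-batch gradient descent update.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    def embed(x):
        # The hidden layer serves as the distilled embedding.
        return np.tanh(x @ W1 + b1)

    return embed, losses

# Synthetic engineered features for 200 entities (assumption: 3 features each).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
# Stand-in for the pretrained tabular model's predictions (NOT a real LightGBM).
teacher_preds = np.sin(X[:, 0]) + 0.5 * X[:, 1]

embed, losses = distill(X, teacher_preds)
node_embeddings = embed(X)  # would be appended to the GNN's node features
```

In this sketch the student never sees the ground-truth labels, only the teacher's outputs, which is what makes it distillation rather than ordinary supervised training.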

Abstract

Relational databases, organized into tables connected by primary-foreign key relationships, are a common format for organizing data. Making predictions on relational data often involves transforming them into a flat tabular format through table joins and feature engineering; the resulting features serve as input to tabular methods. However, designing features that fully capture complex relational patterns remains challenging. Graph Neural Networks (GNNs) offer a compelling alternative by inherently modeling these relationships, but their time overhead during inference limits their applicability for real-time scenarios. In this work, we aim to bridge this gap by leveraging existing feature engineering efforts to enhance the efficiency of GNNs in relational databases. Specifically, we use GNNs to capture complex relationships within relational databases, patterns that are difficult to featurize, while employing engineered features to encode temporal information, thereby avoiding the need to retain the entire historical graph and enabling the use of smaller, more efficient graphs. Our LightRDL approach not only improves efficiency, but also outperforms existing models. Experimental results on the RelBench benchmark demonstrate that our framework achieves up to $33\%$ performance improvement and a $526\times$ inference speedup compared to GNNs, making it highly suitable for real-time inference.

Paper Structure

This paper contains 37 sections, 5 equations, 4 figures, 22 tables.

Figures (4)

  • Figure 1: Overview of our proposed hybrid modeling framework LightRDL. The pipeline begins with feature-engineered tabular data processed by a tree-based model (e.g., LightGBM). Knowledge distillation then generates embeddings summarizing temporal information, which are used as additional node features for the static GNN responsible for the predictions.
  • Figure 2: Example of relational graph $G({\mathcal{V}}_{\leq t})$ used in RDL, where nodes represent users, products, and transactions up to time $t$ (here, "Sunday"). Each transaction is timestamped and linked to the corresponding user and product. The model predicts, for instance, the number of items a user will purchase the next day by aggregating interactions from previous time steps.
  • Figure 3: Example of Snapshotted Relational Graph $G({\mathcal{V}}_{t})$ used in LightRDL, where nodes represent users, products, and transactions that occurred at time $t$ (here, "Sunday"). Unlike in RDL, this graph includes only interactions from the current day, resulting in a significantly smaller and more efficient structure for training and inference.
  • Figure 4: Comparison of inference and training times for LightRDL and RDL across tasks in the RelBench benchmark (logarithmic scale). LightRDL achieves inference speedups ranging from 28$\times$ to 526$\times$ compared to RDL. Similarly, LightRDL is up to 72$\times$ faster than RDL in training. Total inference time for LightRDL is calculated by summing the inference time of the GNN model (red bar) and the inference time of the distilled model (yellow bar). Total training time is calculated by summing the training time of the GNN model (red bar), the time required to train the distilled model (yellow bar), and the training time of LightGBM (brown bar).
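
The difference between the two graphs in Figures 2 and 3 can be illustrated with a small sketch. It assumes timestamps are discretized to integer days and that a user or product node is kept only if it is linked to a retained transaction; both are simplifying assumptions, and the `Transaction` type and function names are hypothetical, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    tx_id: int
    user_id: int
    product_id: int
    day: int  # timestamp, discretized to days (assumption)

def _build(transactions):
    """Build typed nodes and transaction-to-entity edges from a list of transactions."""
    nodes, edges = set(), []
    for tx in transactions:
        user = ("user", tx.user_id)
        product = ("product", tx.product_id)
        tx_node = ("transaction", tx.tx_id)
        nodes.update([user, product, tx_node])
        # Each transaction links to its user and its product (foreign keys).
        edges.append((tx_node, user))
        edges.append((tx_node, product))
    return nodes, edges

def graph_up_to(transactions, t):
    """Relational graph over everything with timestamp <= t, i.e. G(V_{<=t})."""
    return _build([tx for tx in transactions if tx.day <= t])

def snapshot_graph(transactions, t):
    """Snapshotted graph over transactions that occurred exactly at t, i.e. G(V_t)."""
    return _build([tx for tx in transactions if tx.day == t])

# Toy history: day 7 plays the role of "Sunday".
history = [
    Transaction(0, 1, 10, 5),
    Transaction(1, 1, 11, 6),
    Transaction(2, 2, 10, 6),
    Transaction(3, 1, 10, 7),
    Transaction(4, 3, 12, 7),
]

full_nodes, full_edges = graph_up_to(history, 7)
snap_nodes, snap_edges = snapshot_graph(history, 7)
```

On this toy history the snapshot keeps only the two day-7 transactions and the entities they touch, while the full graph grows with every past day. The historical signal that the snapshot drops is what the distilled tabular embeddings are meant to carry.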

Theorems & Definitions (3)

  • Definition 1: Relational Database
  • Definition 2: Relational Graph up to time $t$
  • Definition 3: Snapshotted Relational Graph