Graph-based Tabular Deep Learning Should Learn Feature Interactions, Not Just Make Predictions

Elias Dubbeldam, Reza Mohammadi, Marit Schoonhoven, S. Ilker Birbil

TL;DR

The paper argues that graph-based tabular deep learning (GTDL) should learn and validate the explicit feature-interaction graph $G=(V,E)$ with weighted adjacency $A \in \mathbb{R}^{p \times p}$ (where $A_{ii}=0$ and $0\le A_{ij}\le1$), rather than only predict targets. It proposes synthetic benchmarks with ground-truth graphs and a quantitative ROC-AUC metric to evaluate structure recovery, showing that many GTDL methods fail to recover meaningful interactions (ROC-AUC near 0.5) while explicit baselines like BDgraph can recover the structure well. Importantly, pruning models to the true edges often improves predictive $R^2$, especially with limited data, underscoring the practical value of structure-aware learning. The authors call for a new generation of GTDL models that incorporate structure-aware inductive biases, leverage ground-truth benchmarks, and extend to richer modalities and categorical features, aiming for interpretable and trustworthy tabular deep learning.
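
As an illustration of the structure-recovery metric described above, the following is a minimal sketch of how a ROC-AUC between a learned weighted adjacency matrix and a binary ground-truth graph could be computed. The function names `offdiag` and `graph_roc_auc` are hypothetical, and scoring every off-diagonal entry (rather than, say, only the upper triangle) is an assumption, not necessarily the authors' exact protocol.

```python
import numpy as np

def offdiag(M):
    """Return the off-diagonal entries of a square matrix as a flat vector."""
    p = M.shape[0]
    mask = ~np.eye(p, dtype=bool)  # the diagonal is excluded because A_ii = 0
    return M[mask]

def graph_roc_auc(A_learned, A_true):
    """ROC-AUC of learned edge weights against a binary ground-truth graph.

    Assumption: every off-diagonal entry is treated as one scored edge candidate.
    """
    scores = offdiag(np.asarray(A_learned, dtype=float))
    labels = offdiag(np.asarray(A_true)).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one edge and one non-edge")
    # Probability that a true edge outscores a non-edge; ties count one half.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Toy ground truth: a chain over four features (1-2, 2-3, 3-4).
A_true = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])

auc_perfect = graph_roc_auc(0.9 * A_true, A_true)              # true edges ranked on top
auc_uninformative = graph_roc_auc(np.full((4, 4), 0.5), A_true)  # all weights tied
```

In this toy case, a learned matrix that puts all its weight on the true edges scores 1.0, while a matrix with identical weights everywhere scores 0.5, which is the chance level the TL;DR refers to.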

Abstract

Despite recent progress, deep learning methods for tabular data still struggle to compete with traditional tree-based models. A key challenge lies in modeling complex, dataset-specific feature interactions that are central to tabular data. Graph-based tabular deep learning (GTDL) methods aim to address this by representing features and their interactions as graphs. However, existing methods predominantly optimize predictive accuracy, neglecting accurate modeling of the graph structure. This position paper argues that GTDL should move beyond prediction-centric objectives and prioritize the explicit learning and evaluation of feature interactions. Using synthetic datasets with known ground-truth graph structures, we show that existing GTDL methods fail to recover meaningful feature interactions. Moreover, enforcing the true interaction structure improves predictive performance. This highlights the need for GTDL methods to prioritize quantitative evaluation and accurate structural learning. We call for a shift toward structure-aware modeling as a foundation for building GTDL systems that are not only accurate but also interpretable, trustworthy, and grounded in domain understanding.

Paper Structure

This paper contains 34 sections, 4 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: The true underlying graph structure generates tabular data. After training existing methods to predict the target feature, the extracted learned graph structure is not similar to the true graph structure. The predictive performance of methods improves when the extracted graph structure is accurate.
  • Figure 2: Two synthetic data generation pipelines. Both pipelines can roughly be divided into three steps: (i) sample a graph structure; (ii) sample feature interactions; (iii) sample data given the graph and feature interactions. Nodes are colored as follows: cyan for input features $x$, orange for the target feature $y$, and green for root nodes $x_\mathrm{roots}$.
  • Figure 3: Top: methods use a fully connected graph. The learned graph structure should be evaluated by comparing it to the true graph structure. Bottom: To understand the effect when the model can only learn the true feature interactions, we prune the graph to the true edges.
  • Figure 4: Graph quality, measured as the ROC-AUC obtained by comparing the learned weighted adjacency matrix with the true binary one, for two different dataset types. Results are averaged over seeds, cross validations, and three datasets. All models have ROC-AUC $\approx 0.5$, which is random chance, indicating that they are not able to learn the feature interactions in any meaningful way. The statistical method BDgraph can learn the feature interactions.
  • Figure 5: Predictive performance while varying the number of training samples $n_\text{train}$. Results are averaged over seeds, cross validations, and three datasets. When the graph is pruned to its true edges, the predictive performance is, in most cases, better compared to the fully connected graph. This difference reduces as the number of training samples increases. Note the different scale for the y-axis.
  • ...and 6 more figures
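
The pruning step in the Figure 3 caption (restricting a fully connected model to the true edges) can be sketched as a masked softmax, assuming an attention-style model that computes feature-to-feature logits. The function `prune_attention` and its `keep_self` option are hypothetical names for illustration. Allowing each feature to attend to itself is an added assumption that keeps every row non-empty; the paper's pruning procedure may differ.

```python
import numpy as np

def prune_attention(logits, A_true, keep_self=True):
    """Softmax over feature-to-feature logits, restricted to true edges."""
    logits = np.asarray(logits, dtype=float)
    allowed = np.asarray(A_true).astype(bool)
    if keep_self:
        # Assumption: each feature may still attend to itself, so no row is empty.
        allowed = allowed | np.eye(allowed.shape[0], dtype=bool)
    if not allowed.any(axis=1).all():
        raise ValueError("every feature needs at least one allowed neighbour")
    # Disallowed pairs get -inf, so they receive exactly zero weight after exp.
    masked = np.where(allowed, logits, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy ground truth: a chain over three features (1-2, 2-3).
A_chain = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])

# With all-zero logits, each row spreads its weight evenly over allowed entries.
W = prune_attention(np.zeros((3, 3)), A_chain)
```

Each row of `W` sums to one, and the entries for the non-adjacent pair (features 1 and 3) are exactly zero, so no information flows along edges that are absent from the true graph.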