Random-Forest-Induced Graph Neural Networks for Tabular Learning

Haozhe Chen, Soheila Farokhi, Kelvyn Bladen, Hamid Karimi, Kevin R. Moon

TL;DR

RF-GNN, a framework that constructs instance-level graphs from tabular data using proximity measures induced by random forests, consistently outperforms strong classical baselines and recent graph-construction methods in terms of weighted F1-score.

Abstract

Graphs are essential for modeling complex relationships and capturing structured interactions in data. Graph Neural Networks (GNNs) are particularly effective when such relational structure is explicitly available, but many real-world datasets, most notably tabular data, lack an inherent graph representation. To address this limitation, we propose RF-GNN, a framework that constructs instance-level graphs from tabular data using proximity measures induced by random forests. These proximities capture nonlinear feature interactions and data-adaptive similarity without imposing restrictive assumptions on feature geometry. The resulting graphs enable the direct application of GNNs to tabular learning problems. Extensive experiments on 36 benchmark datasets demonstrate that RF-GNN consistently outperforms strong classical baselines and recent graph-construction methods in terms of weighted F1-score. Additional ablation studies highlight the impact of proximity design choices and graph construction settings.
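
To make the idea of a forest-induced proximity concrete, the following is a minimal sketch of the classic random-forest proximity: the fraction of trees in which two samples land in the same leaf. It is an illustration under stated assumptions, not the paper's implementation. The exact proximity variant used by RF-GNN is not specified in this summary, and the function name `rf_proximity` and the toy leaf assignments are hypothetical. In practice the leaf indices could come from a fitted forest, for example via scikit-learn's `RandomForestClassifier.apply`.

```python
from typing import List, Sequence


def rf_proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Classic random-forest proximity (an assumed variant, for illustration).

    leaves[i][t] is the index of the leaf that sample i reaches in tree t.
    The proximity of samples i and j is the fraction of trees in which
    they share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 4 samples routed through a 4-tree forest.
leaves = [
    [1, 4, 2, 7],
    [1, 4, 2, 5],
    [3, 4, 6, 5],
    [3, 8, 6, 9],
]
prox = rf_proximity(leaves)
```

Because the similarity is defined by the partitions the trees learn, it adapts to the data and the labels rather than relying on a fixed distance in feature space, which is the property the abstract emphasizes.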

Paper Structure (26 sections, 12 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: An overview of the proposed method (RF-GNN). A random forest is first trained on the tabular data. Pairwise proximities are extracted from the random forest and then converted to an adjacency matrix, which is used as input to a GNN.
  • Figure 2: The workflow of our proposed method (RF-GNN), which uses a random forest to learn a graph from tabular data. The graph structure is then used as input to a GNN for final prediction.
  • Figure 3: Effect of different proximity measures on model performance, in terms of weighted F1-score, on five datasets. The RF proximity gives the best performance.
  • Figure 4: Sensitivity analysis of RF-GNN performance across varying proximity thresholds $\alpha$ for five datasets (902, 941, 6, 182, and 23). The weighted F1-score is reported with error bars over repeated runs. Across most datasets, performance remains stable for threshold values in the range $\alpha \in [0.2, 0.4]$. Dataset 941 exhibits relatively higher variability, indicating greater sensitivity to threshold selection. Overall, the results demonstrate that RF-GNN is robust to the edge-threshold hyperparameter within a moderate range.
  • Figure 5: Distribution of optimal proximity thresholds $\alpha$ across 36 datasets, highlighting a concentration between 0.1 and 0.5. This range suggests that moderate graph density is optimal for GNN performance.
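
The captions for Figures 1, 4, and 5 describe converting pairwise proximities into an adjacency matrix and tuning a proximity threshold $\alpha$. The sketch below shows one plausible reading of that step, assuming an edge is kept whenever the proximity is at least $\alpha$ and self-loops are dropped. The paper's exact rule is not given here, and the helper names `threshold_graph` and `density`, as well as the toy proximity matrix, are hypothetical.

```python
from typing import List, Sequence


def threshold_graph(prox: Sequence[Sequence[float]], alpha: float) -> List[List[int]]:
    """Keep an undirected edge (i, j), i != j, when prox[i][j] >= alpha (assumed rule)."""
    n = len(prox)
    return [
        [1 if i != j and prox[i][j] >= alpha else 0 for j in range(n)]
        for i in range(n)
    ]


def density(adj: Sequence[Sequence[int]]) -> float:
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    n_edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return n_edges / (n * (n - 1) / 2)


# Hypothetical symmetric proximity matrix for 4 samples.
prox = [
    [1.00, 0.75, 0.25, 0.00],
    [0.75, 1.00, 0.50, 0.00],
    [0.25, 0.50, 1.00, 0.50],
    [0.00, 0.00, 0.50, 1.00],
]

# Raising alpha removes weak edges, so the graph becomes sparser.
densities = {alpha: density(threshold_graph(prox, alpha)) for alpha in (0.1, 0.3, 0.6)}
```

Under this reading, a small $\alpha$ yields a dense graph with many weak edges and a large $\alpha$ leaves only the strongest links, which is consistent with the captions' observation that moderate thresholds tend to work best.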