Grables: Tabular Learning Beyond Independent Rows

Tamara Cucumides, Floris Geerts

TL;DR

Grables formalize when tabular learning that operates on independent rows is insufficient by separating how a table is lifted into a graph from how predictions are made on that graph. The core insight is that row-local predictors cannot capture extension-sensitive targets driven by inter-row counts, overlaps, or relational patterns, whereas explicit inter-row structures with message passing can. Through controlled experiments on synthetic data, retail transactions, and RelBench, the paper demonstrates when structure helps and how hybrid approaches that combine explicit inter-row structure with strong tabular learners yield robust gains. The Grable framework thus provides a principled, interpretable lens to diagnose and leverage relational information in tabular data, highlighting the complementarity of tabular and graph-based representations.
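
To make "extension-sensitive" concrete, the following minimal sketch contrasts a row-local rule with a count-based target on a toy table. The table, the column names (`customer`, `amount`), and both functions are hypothetical illustrations, not taken from the paper.

```python
def row_wise_predict(row):
    """Hypothetical row-local rule: it reads only the row's own cells."""
    return int(row["amount"] > 100)


def count_target(table, r, col):
    """Extension-sensitive target: number of rows (including row r itself)
    that share row r's value in column `col`."""
    return sum(1 for s in table if s[col] == table[r][col])


# Toy table and an extension of it by one extra row (both hypothetical).
table = [
    {"customer": "a", "amount": 120},
    {"customer": "b", "amount": 80},
]
extended = table + [{"customer": "a", "amount": 50}]

# The row-local rule gives row 0 the same score in both tables ...
same_prediction = row_wise_predict(table[0]) == row_wise_predict(extended[0])

# ... while the count target for row 0 changes once the table is extended.
target_before = count_target(table, 0, "customer")
target_after = count_target(extended, 0, "customer")
```

Because row 0 is unchanged while its target moves, no function of the row alone can be correct on both tables; this is the kind of separation the TL;DR refers to.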

Abstract

Tabular learning is still dominated by row-wise predictors that score each row independently, which fits i.i.d. benchmarks but fails on transactional, temporal, and relational tables where labels depend on other rows. We show that row-wise prediction rules out natural targets driven by global counts, overlaps, and relational patterns. To make "using structure" precise across architectures, we introduce grables: a modular interface that separates how a table is lifted to a graph (constructor) from how predictions are computed on that graph (node predictor), pinpointing where expressive power comes from. Experiments on synthetic tasks, transaction data, and a RelBench clinical-trials dataset confirm the predicted separations: message passing captures inter-row dependencies that row-local models miss, and hybrid approaches that explicitly extract inter-row structure and feed it to strong tabular learners yield consistent gains.
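
The constructor / node predictor split can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the "trivial" lift is assumed to produce isolated row nodes, the incidence lift follows the description of row nodes linked to column-value nodes, and all function names and the toy table are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy table: each row maps column name -> value.
TABLE = [
    {"customer": "a", "item": "x"},
    {"customer": "a", "item": "y"},
    {"customer": "b", "item": "x"},
]


def trivial_constructor(table):
    """Assumed trivial lift: one isolated node per row, no edges."""
    nodes = [("row", r) for r in range(len(table))]
    return nodes, defaultdict(set)


def incidence_constructor(table):
    """Incidence-style lift: row nodes linked to (column, value) nodes."""
    nodes = [("row", r) for r in range(len(table))]
    adj = defaultdict(set)
    for r, row in enumerate(table):
        for col, val in row.items():
            u = ("cv", col, val)
            if u not in adj:  # first time this column-value node appears
                nodes.append(u)
            adj[("row", r)].add(u)
            adj[u].add(("row", r))
    return nodes, adj


def shared_count_predictor(nodes, adj, column):
    """Node predictor using two rounds of sum aggregation on `column` edges.

    Round 1: each (column, value) node sums 1 per adjacent row (its degree).
    Round 2: each row sums the messages from its `column` neighbours and
    subtracts one per neighbour so that it does not count itself.
    """
    degree = {u: len(adj[u]) for u in nodes if u[0] == "cv" and u[1] == column}
    out = {}
    for v in nodes:
        if v[0] == "row":
            msgs = [degree[u] for u in adj[v] if u in degree]
            out[v[1]] = sum(msgs) - len(msgs)
    return out


# Same predictor, two different constructors.
on_trivial = shared_count_predictor(*trivial_constructor(TABLE), "customer")
on_incidence = shared_count_predictor(*incidence_constructor(TABLE), "customer")
```

With the predictor held fixed, only the incidence lift exposes how many other rows share a customer; on the isolated-node lift every row receives 0. Keeping the two pieces separate is what lets one attribute the difference to the constructor rather than to the predictor.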

Paper Structure (104 sections, 5 theorems, 52 equations, 10 figures, 11 tables)

Key Result

Theorem 1.1

Fix $k\in\mathbb N$ and a unary FO formula $\varphi(x)$ over a graph schema $\sigma_G$ representing graphs. The following are equivalent:

Figures (10)

  • Figure 1: Incidence-grable patterns for our four tasks. Row nodes (circles) connect to column-value nodes (squares) via typed edges. (a) Unique: a column-value node adjacent to a single row node. (b) Count: the degree of a shared column-value node. (c) Double: a length-3 pattern $v_r\!-\!u_{i,v}\!-\!v_s\!-\!u_{j,w}$ with a local constraint on $v_s$. (d) Diamond: two shared column-value nodes witnessing that the same row $v_s$ overlaps twice with $v_r$.
  • Figure 2: F1 score on validation, test, and stress data for RealMLP on the Unique task and for LightGBM on the Double and Diamond tasks.
  • Figure 3: UpSet plot showing the overlap and disagreement between model predictions on the test set.
  • Figure 4: Relational schema of the relbench-trial relational dataset. The task studied is associated with the table studies, which contains descriptive information about the clinical studies. Taken from https://relbench.stanford.edu/datasets/rel-trial/
  • Figure 5: Box plots of F1 score for LightGBM, RealMLP, Sage, and TabPFN on 5 different perturbations of the original test set.
  • ...and 5 more figures
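
The four patterns named in the Figure 1 caption can be read directly off a table, without building the graph. The sketch below is one such reading under stated assumptions: the toy rows and function names are hypothetical, and the "local constraint on $v_s$" in Double is assumed here to mean that the intermediate row takes a given value `w` in column `j`; the paper's exact task definitions may differ.

```python
# Hypothetical toy table with three columns.
ROWS = [
    {"A": 1, "B": "p", "C": "u"},
    {"A": 1, "B": "p", "C": "v"},
    {"A": 2, "B": "q", "C": "u"},
    {"A": 3, "B": "p", "C": "w"},
]


def count(rows, r, col):
    """Count: degree of the (col, value) node that row r is attached to."""
    return sum(1 for s in rows if s[col] == rows[r][col])


def unique(rows, r, col):
    """Unique: row r's (col, value) node is adjacent to a single row node."""
    return count(rows, r, col) == 1


def double(rows, r, i, j, w):
    """Double (assumed reading): some other row s shares column i with r
    and additionally satisfies s[j] == w."""
    return any(
        k != r and s[i] == rows[r][i] and s[j] == w
        for k, s in enumerate(rows)
    )


def diamond(rows, r):
    """Diamond: some other row s agrees with r on at least two columns."""
    return any(
        k != r and sum(s[c] == rows[r][c] for c in rows[r]) >= 2
        for k, s in enumerate(rows)
    )
```

For example, on `ROWS` row 0 and row 1 agree on both `A` and `B`, so `diamond(ROWS, 0)` holds, whereas row 2 shares only a single value with any other row.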

Theorems & Definitions (15)

  • Theorem 1.1: Bar+2020
  • Example 2.1: Trivial grable
  • Example 2.2: Incidence grable
  • Definition 2.3: Grabular expressibility
  • Proposition 3.1: MPNNs on $\gamma_{\mathrm{triv}}$ are row-wise
  • Definition 3.2: Row-locality / extension invariance
  • Proposition 3.3: Incidence separates row-locality
  • Proposition 3.4
  • Proposition 3.5: Diamond is beyond MPNNs on incidence grable
  • Example 3.1: CARTE grable
  • ...and 5 more