TabGNN: Multiplex Graph Neural Network for Tabular Data Prediction

Xiawei Guo, Yuhan Quan, Huan Zhao, Quanming Yao, Yong Li, Weiwei Tu

TL;DR

TabGNN introduces a multiplex graph neural network that explicitly models multifaceted sample relations in tabular data prediction, a gap left by approaches that model only feature interactions. It constructs a directed multiplex graph from tabular features and uses per-layer projections plus inter-layer attention to learn enhanced sample embeddings, which are concatenated with the AutoFE representations for the final prediction. Across 11 real-world datasets spanning classification and regression, TabGNN consistently improves performance over AutoFE (and DeepFM in some cases), and ablation studies confirm the value of using multiple relation types and the multiplex architecture. The approach works as a plug-in to existing tabular pipelines, with manageable overhead and guidance for constructing relations in diverse domains.

Abstract

Tabular data prediction (TDP) is one of the most popular industrial applications, and various methods have been designed to improve prediction performance. However, existing works mainly focus on feature interactions and ignore sample relations, e.g., users with the same education level might have a similar ability to repay a debt. In this work, by explicitly and systematically modeling sample relations, we propose a novel framework, TabGNN, based on recently popular graph neural networks (GNNs). Specifically, we first construct a multiplex graph to model the multifaceted sample relations, and then design a multiplex graph neural network to learn an enhanced representation for each sample. To integrate TabGNN with the tabular solution in our company, we concatenate the learned embeddings with the original ones and feed the result to the prediction models inside the solution. Experiments on eleven TDP datasets from various domains, covering both classification and regression, show that TabGNN consistently improves performance over AutoFE, the tabular solution at 4Paradigm.
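The graph construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that two samples are related within a layer when they share the same value of the chosen feature, and that edges point from the earlier row to the later row (row order standing in for time). The function name `build_multiplex_graph`, the field names, and the sample values are all hypothetical.

```python
from collections import defaultdict


def build_multiplex_graph(rows, relation_features):
    """Build one graph layer per relation feature.

    Returns {feature: {dst_index: [src_index, ...]}}.
    Assumption (not taken from the paper): samples sharing a feature value
    are connected, with directed edges from earlier rows to later rows.
    """
    layers = {}
    for feature in relation_features:
        # Group row indices by the value they take for this feature.
        groups = defaultdict(list)
        for idx, row in enumerate(rows):
            groups[row[feature]].append(idx)

        # Each sample receives edges from the earlier members of its group.
        in_neighbors = defaultdict(list)
        for members in groups.values():
            for pos, dst in enumerate(members):
                in_neighbors[dst].extend(members[:pos])
        layers[feature] = dict(in_neighbors)
    return layers


# Hypothetical toy table; values are made up for illustration only.
rows = [
    {"uid": 0, "age": 30, "education": 2},
    {"uid": 1, "age": 30, "education": 1},
    {"uid": 2, "age": 45, "education": 2},
    {"uid": 3, "age": 30, "education": 2},
]
graph = build_multiplex_graph(rows, ["age", "education"])
print(graph["age"][3])        # in-neighbors of row 3 in the age layer
print(graph["education"][3])  # in-neighbors of row 3 in the education layer
```

Each feature yields its own layer over the same set of samples, which is what makes the graph multiplex: row 3 has different neighbors in the age layer than in the education layer.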

Paper Structure

This paper contains 21 sections, 4 equations, 5 figures, 7 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An example of tabular data from real-world financial loan activities, where each row represents a sample and the columns are the related features. The values in the column Overdue indicate whether the user will repay the debt in time. In practice, we need to train a model to predict the values (labels) of the column Overdue given the information (features) in the other columns. Note that the values for Education and City are encoded as numbers.
  • Figure 2: An illustrative example of how TabGNN learns the representation of a sample (user $35360$) from Figure 1 (best viewed in color). First, a directed multiplex graph is constructed based on a numerical feature $Age$ and a categorical feature $Education$. The features of all samples are then encoded and fed to the multiplex graph neural network to obtain a final representation for the sample (user $35360$). A Multilayer Perceptron (MLP) layer generates the final predicted label, which is used to compute the loss.
  • Figure 3: The system overview of AutoFE. We further add TabGNN to show the integration process.
  • Figure 4: Attention scores of different relations by TabGNN.
  • Figure 5: An illustrative example of two users ($uid=0$ and $uid=1$) from Data3, which is a loan scenario. Age and City are used to construct the multiplex graph. Dark circles are positive samples, i.e., defaulted users. The intra-layer attention scores of defaulted neighbors are larger.
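The per-relation attention scores reported in Figure 4 can be pictured with a small sketch of attention-weighted fusion across graph layers, followed by the concatenation with the original features. This is only an illustration under stated assumptions: the dot-product scoring against a `query` vector, the function `fuse_layers`, and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_embeddings, query):
    """Combine one sample's per-layer embeddings with attention weights.

    Assumption: each layer is scored by a dot product with `query`;
    the paper's actual scoring function is not reproduced here.
    """
    names = list(layer_embeddings)
    scores = [
        sum(q * h for q, h in zip(query, layer_embeddings[name]))
        for name in names
    ]
    weights = softmax(scores)
    fused = [
        sum(w * layer_embeddings[name][d] for w, name in zip(weights, names))
        for d in range(len(query))
    ]
    return dict(zip(names, weights)), fused


# Hypothetical per-layer embeddings for a single sample.
layer_embeddings = {"age": [1.0, 0.0], "city": [0.0, 1.0]}
query = [1.0, 0.0]  # hypothetical learned attention vector
attention, fused = fuse_layers(layer_embeddings, query)

# Concatenate the original (e.g. AutoFE) features with the fused embedding.
original = [0.5, 0.2, 0.1]  # hypothetical original representation
final_representation = original + fused
print(attention)
print(final_representation)
```

The returned `attention` dictionary plays the role of the per-relation scores: a layer whose embedding aligns with the query receives a larger weight and contributes more to the fused embedding.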

Theorems & Definitions (1)

  • Definition 1