Graph Neural Networks for Tabular Data Learning: A Survey with Taxonomy and Directions
Cheng-Te Li, Yu-Che Tsai, Chih-Yao Chen, Jay Chiehen Liao
TL;DR
This survey systematically reviews Graph Neural Networks for Tabular Data Learning (GNN4TDL), arguing that modeling latent instance and feature correlations via graphs can surpass traditional tabular methods. It introduces a four-axis taxonomy (graph formulation, graph construction, representation learning, and training plans) and maps a wide range of methods, spanning homogeneous graphs, heterogeneous graphs, and hypergraphs, to these axes. The paper covers practical graph-construction strategies, specialized GNN designs for tabular data, and training regimens, and illustrates applications in fraud detection, click-through rate (CTR) prediction, medical prediction, missing data imputation, FinTech, and relational databases. It also discusses limitations and future directions, including graph-transformer hybrids, scalability, self-supervised learning (SSL), robustness, and non-homogeneous graph learning, to spur further research in GNN-enabled tabular data analysis.
Abstract
In this survey, we dive into Tabular Data Learning (TDL) using Graph Neural Networks (GNNs), a domain where deep learning-based approaches have increasingly shown superior performance in both classification and regression tasks compared to traditional methods. The survey highlights a critical gap in deep neural TDL methods: the underrepresentation of latent correlations among data instances and feature values. GNNs, with their innate capability to model intricate relationships and interactions between diverse elements of tabular data, have garnered significant interest and application across various TDL domains. Our survey provides a systematic review of the methods involved in designing and implementing GNNs for TDL (GNN4TDL). It encompasses a detailed investigation into the foundational aspects and an overview of GNN-based TDL methods, offering insights into their evolving landscape. We present a comprehensive taxonomy focused on constructing graph structures and representation learning within GNN-based TDL methods. In addition, the survey examines various training plans, emphasizing the integration of auxiliary tasks to enhance the effectiveness of instance representations. A critical part of our discussion is dedicated to the practical application of GNNs across a spectrum of GNN4TDL scenarios, demonstrating their versatility and impact. Lastly, we discuss the limitations and propose future research directions, aiming to spur advancements in GNN4TDL. This survey serves as a resource for researchers and practitioners, offering a thorough understanding of GNNs' role in revolutionizing TDL and pointing towards future innovations in this promising area.
