Graph Neural Network Approach to Semantic Type Detection in Tables
Ehsan Hoseinzade, Ke Wang
TL;DR
The paper tackles semantic column type detection in tables under the input-length limits of language models. It introduces GAIT, a framework that stacks a graph neural network (GNN) on top of a strong single-column predictor (RECA). Each table is represented as a graph whose nodes are columns and whose edges capture dependencies between them. This lets GAIT combine intra-table dependencies, modeled by the GNN, with inter-table context, handled by the language model, and refine predictions beyond what a standalone language model achieves. On the Webtables and Semtab datasets, GAIT, particularly its GAT variant, outperforms existing baselines, with notable gains on low-frequency classes, which demonstrates the value of modeling column dependencies. Because intra-table information no longer has to fit into the language model's token budget, the approach supports robust, scalable multi-column predictions for practical semantic tagging in data cleaning, schema matching, and data discovery.
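The column-graph idea above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GAIT's implementation. It assumes a fully connected graph over the columns of one table (self-loops included), uses made-up per-column type logits as a stand-in for the output of the single-column predictor (RECA in the paper), and applies one single-head graph attention (GAT) layer in its standard formulation. The helper names `column_graph`, `gat_layer`, and `matvec`, the type labels, and all numeric values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (out_dim x in_dim) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def column_graph(num_columns: int) -> List[List[int]]:
    """Neighbour lists for one table: every column is linked to every
    column of the same table, itself included (assumed fully connected)."""
    return [list(range(num_columns)) for _ in range(num_columns)]


def gat_layer(h: List[Vector], neighbours: List[List[int]],
              w: Matrix, a: Vector) -> List[Vector]:
    """One single-head graph attention layer:
    h_i' = sum_j alpha_ij * W h_j, with
    alpha_ij = softmax over j of LeakyReLU(a . [W h_i || W h_j])."""
    z = [matvec(w, hi) for hi in h]              # project every node
    out = []
    for i, nbrs in enumerate(neighbours):
        # z[i] + z[j] concatenates the two projected vectors
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
                  for j in nbrs]
        m = max(scores)                          # for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # attention-weighted average of the neighbours' projections
        out.append([sum(alpha * z[j][k] for alpha, j in zip(alphas, nbrs))
                    for k in range(len(z[i]))])
    return out


if __name__ == "__main__":
    # Hypothetical per-column logits from a single-column predictor.
    types = ["city", "country"]
    logits = [[2.0, 0.5], [0.2, 1.5], [0.9, 1.1]]
    neighbours = column_graph(len(logits))
    w = [[1.0, 0.0], [0.0, 1.0]]    # identity projection, for readability
    a = [0.3, -0.2, 0.3, -0.2]      # arbitrary attention parameters
    refined = gat_layer(logits, neighbours, w, a)
    for col, vec in enumerate(refined):
        print(col, types[max(range(len(vec)), key=vec.__getitem__)])
```

Each refined vector is an attention-weighted average of the projected vectors of all columns in the table, so one column's prediction can shift toward types that are consistent with its neighbours. A trained model would learn `w` and `a` instead of fixing them by hand.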
Abstract
This study addresses the challenge of detecting semantic column types in relational tables, a key task in many real-world applications. While language models such as BERT have improved prediction accuracy, their limited input length restricts their ability to process intra-table and inter-table information at the same time. We propose a novel approach that uses Graph Neural Networks (GNNs) to model intra-table dependencies, allowing the language model to focus on inter-table information. Our method not only outperforms existing state-of-the-art algorithms but also offers new insights into the utility and functionality of various GNN types for semantic type detection. The code is available at https://github.com/hoseinzadeehsan/GAIT
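The token-budget argument in the abstract can be made concrete with toy arithmetic. This sketch assumes BERT's 512-token maximum input and counts only `[CLS]` and `[SEP]` as overhead, ignoring per-column separators and subword splitting. The helper names are hypothetical, and the sketch is not the paper's serialization scheme.

```python
# Toy token-budget arithmetic for a BERT-style encoder (illustrative only).
MAX_TOKENS = 512     # BERT's maximum input sequence length
SPECIAL_TOKENS = 2   # [CLS] and [SEP]; per-column separators are ignored here


def usable_tokens() -> int:
    """Tokens left for table content after the special tokens."""
    return MAX_TOKENS - SPECIAL_TOKENS


def per_column_share(num_columns: int) -> int:
    """Tokens per column when a whole table is packed into one input."""
    return usable_tokens() // num_columns


def context_room(target_tokens: int) -> int:
    """Tokens left for inter-table context (e.g. related tables) when the
    input holds only the target column."""
    return max(usable_tokens() - target_tokens, 0)


if __name__ == "__main__":
    share = per_column_share(10)
    print(share)                 # → 51
    print(context_room(share))   # → 459
```

For a 10-column table, packing every column into one input leaves about 51 tokens per column and no room for related tables. A single-column input of the same size leaves 459 tokens for inter-table context, and a GNN can then supply the intra-table dependencies that the single-column input omits.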
