Structural Deep Encoding for Table Question Answering
Raphaël Mouravieff, Benjamin Piwowarski, Sylvain Lamprier
TL;DR
This work addresses the challenge of processing tabular data with Transformers by preserving structural information through sparse attention and absolute encodings. It systematically evaluates existing table encoding methods and introduces novel sparse attention masks and structural modules to improve generalization and scalability. The key contributions include an ANOVA-based analysis of encoding factors, the M1 and M3 sparse masks, and empirical validation on synthetic data and real datasets like WikiSQL and WTQ. The findings demonstrate that sparse attention, combined with absolute positional cues, yields better generalization and substantial computational speedups for large tables, with practical implications for scalable table QA and related tasks.
Abstract
Although Transformer-based architectures excel at processing textual information, their naive adaptation to tabular data often involves flattening the table structure. This simplification can lead to the loss of essential inter-dependencies between rows, columns, and cells, while also posing scalability challenges for large tables. To address these issues, prior works have explored special tokens, structured embeddings, and sparse attention patterns. In this paper, we conduct a comprehensive analysis of tabular encoding techniques, which highlights the crucial role of attention sparsity in preserving the structural information of tables. We also introduce a set of novel sparse attention mask designs for tabular data that not only enhance computational efficiency but also preserve structural integrity, leading to better overall performance.
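To make the idea of a structure-preserving sparse attention pattern concrete, the following is a minimal sketch of a generic row/column mask over a flattened table. It is not the M1 or M3 mask from the paper, whose definitions are not given in this section. It assumes one token per cell, a row-major flattening, and no question tokens; the names `row_column_mask` and `density` are hypothetical and chosen only for illustration.

```python
from typing import List


def row_column_mask(n_rows: int, n_cols: int) -> List[List[bool]]:
    """Build a boolean attention mask for a row-major flattened table.

    Assumption: each cell is a single token. mask[i][j] is True when
    token i may attend to token j, which in this sketch means the two
    cells share a row or share a column.
    """
    n = n_rows * n_cols
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        row_i, col_i = divmod(i, n_cols)  # recover (row, column) of token i
        for j in range(n):
            row_j, col_j = divmod(j, n_cols)
            mask[i][j] = (row_i == row_j) or (col_i == col_j)
    return mask


def density(mask: List[List[bool]]) -> float:
    """Fraction of allowed (query, key) pairs relative to full attention."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)
```

Under these assumptions, each cell attends to `n_rows + n_cols - 1` tokens instead of all `n_rows * n_cols`, so the fraction of allowed attention pairs shrinks as the table grows. For example, `density(row_column_mask(3, 4))` evaluates to 0.5, while a 100 by 10 table leaves 109 allowed keys out of 1000 per query.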
