2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models

Jia-Nan Li, Jian Guan, Wei Wu, Zhengtao Yu, Rui Yan

TL;DR

2D-TPE is a simple yet effective positional encoding method that preserves table structure: it mitigates the loss of essential spatial information caused by flattening a table into a 1D token sequence, while remaining computationally efficient.

Abstract

Tables are ubiquitous across various domains for concisely representing structured information. Empowering large language models (LLMs) to reason over tabular data is an actively explored direction. However, since typical LLMs only support one-dimensional (1D) inputs, existing methods often flatten the two-dimensional (2D) table structure into a sequence of tokens, which can severely disrupt the spatial relationships and result in an inevitable loss of vital contextual information. In this paper, we first empirically demonstrate, through two elaborate proxy tasks, the detrimental impact of such flattening operations on the performance of LLMs in capturing the spatial information of tables. Subsequently, we introduce a simple yet effective positional encoding method, termed "2D-TPE" (Two-Dimensional Table Positional Encoding), to address this challenge. 2D-TPE enables each attention head to dynamically select a permutation order of tokens within the context for attending to them, where each permutation represents a distinct traversal mode for the table, such as column-wise or row-wise traversal. 2D-TPE effectively mitigates the risk of losing essential spatial information while preserving computational efficiency, thus better preserving the table structure. Extensive experiments across five benchmarks demonstrate that 2D-TPE outperforms strong baselines, underscoring the importance of preserving the table structure for accurate table comprehension. Comprehensive analysis further reveals that 2D-TPE scales substantially better to large tables than the baselines.
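
The idea of row-wise and column-wise traversal orders can be illustrated with a small sketch. This is not the authors' implementation: it assumes a rectangular table with exactly one token per cell, and the function name `traversal_positions` is hypothetical. It only shows how the same cells receive different 1D position indices under the two traversal modes, and why a single row-wise flattening pushes vertically adjacent cells apart.

```python
def traversal_positions(num_rows, num_cols):
    """Return (row_wise, col_wise) position indices for every cell.

    Cells are listed in row-major order, i.e. the order in which a
    row-wise flattened table would appear in the input sequence.
    Assumption for illustration: one token per cell.
    """
    row_wise = []
    col_wise = []
    for r in range(num_rows):
        for c in range(num_cols):
            # Position when walking the table row by row.
            row_wise.append(r * num_cols + c)
            # Position when walking the table column by column.
            col_wise.append(c * num_rows + r)
    return row_wise, col_wise


row_wise, col_wise = traversal_positions(2, 3)
print(row_wise)  # [0, 1, 2, 3, 4, 5]
print(col_wise)  # [0, 2, 4, 1, 3, 5]

# Cells (0, 1) and (1, 1) are vertical neighbours in the table.
# Row-wise they are 3 positions apart; column-wise they are adjacent.
print(abs(row_wise[1] - row_wise[4]))  # 3
print(abs(col_wise[1] - col_wise[4]))  # 1
```

Under the row-wise order, the distance between vertical neighbours grows with the number of columns, whereas the column-wise order keeps them adjacent. Per the abstract, 2D-TPE lets each attention head choose among such orders rather than committing to one.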

Paper Structure

This paper contains 49 sections, 11 equations, 6 figures, and 11 tables.

Figures (6)

  • Figure 1: Illustration for the proposed two proxy tasks.
  • Figure 2: Overview of 2D-TPE. $x_m$: the $m$-th token in the sequence; $p_{m,1}$/$p_{m,2}$: the position index for the token $x_m$ using row/column-wise traversal. The indices in the same color mean that their corresponding tokens are in the same row/column when using $p_{m,1}$/$p_{m,2}$, respectively.
  • Figure 3: Performance advantages ($\Delta$) of 2D-TPE over three representative baselines varying with the number of rows or columns. The thresholds for stratifying tables are determined to ensure a balanced distribution of data volumes.
  • Figure 4: Table expansion for size scaling.
  • Figure 5: Impact of the hyper-parameter $\lambda$. Specifically, we plot the change in ACC for datasets HiTab and EntLink, BLEU-4 for FeTaQA, and F1 for RelExtra and ColType as $\lambda$ varies.
  • ...and 1 more figure