Probing How Scalable Table Data Enhances General Long-Context Reasoning

Huaibing Xie, Guoliang Zhao, Yang Liu, Shihan Dou, Siming Huang, Yanling Xiao, Shaolei Wang, Yiting Liu, Cheng Zhang, Shaofan Liu, Pluto Zhou

Abstract

As real-world tasks grow increasingly complex, long-context reasoning has become a core capability for Large Language Models (LLMs). However, few studies explore which data types are effective for long-context reasoning and why. We find that structured table data with periodic structures shows strong potential for long-context reasoning. Motivated by this observation, we mathematically analyze tabular dependency structures using mutual information, revealing periodic non-vanishing dependencies in table data. Furthermore, we systematically analyze the capabilities of structured table data, conduct relevant scaling experiments, and validate its underlying mechanisms for enhancing long-context reasoning, yielding several meaningful insights. Leveraging these insights, we propose a simple yet scalable pipeline (TableLong) for synthesizing high-quality, diverse, and verifiable structured table data to boost long-context reasoning via RL. Extensive experimental results demonstrate that table data significantly enhances the long-context reasoning capability of LLMs across multiple long-context benchmarks (+8.24% on average), and even improves performance on out-of-domain benchmarks (+8.06% on average). We hope that our insights provide practical guidance for effective post-training data to enhance long-context reasoning in LLMs.

Paper Structure

This paper contains 44 sections, 9 theorems, 36 equations, 15 figures, 12 tables.

Key Result

Theorem 2.2

Under Assumptions (A1)–(A2), for any table with $n$ rows and $m$ columns, $\bar{I}_{table}(d)$ attains periodic peaks of constant height $I^{same}$ at every multiple of the column count $m$.
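
The periodic pattern in Theorem 2.2 can be illustrated with a small simulation. The sketch below is not the paper's construction: Assumptions (A1)–(A2) are not reproduced in this summary, so the toy generative model here is an assumption made purely for illustration. Each column draws a hidden bias shared by all of its cells, columns are independent of one another, and the table is flattened row by row. All function names and parameters (`sample_table`, `average_mi_by_distance`, the 0.9/0.1 biases) are hypothetical. Under this toy model, the average plug-in mutual information between positions at distance $d$ should be clearly positive when $d$ is a multiple of $m$ and close to zero otherwise.

```python
import math
import random


def sample_table(n_rows, n_cols, rng):
    """Return a row-major flattened binary table.

    Toy assumption: every column has a hidden bias (0.9 or 0.1) shared by
    all of its cells, so same-column cells are dependent at any row gap.
    """
    biases = [0.9 if rng.random() < 0.5 else 0.1 for _ in range(n_cols)]
    return [1 if rng.random() < biases[c] else 0
            for _ in range(n_rows) for c in range(n_cols)]


def mutual_information(xs, ys):
    """Plug-in mutual information estimate (in bits) for binary samples."""
    n = len(xs)
    joint = [[0, 0], [0, 0]]
    for x, y in zip(xs, ys):
        joint[x][y] += 1
    px = [sum(joint[0]) / n, sum(joint[1]) / n]
    py = [(joint[0][0] + joint[1][0]) / n, (joint[0][1] + joint[1][1]) / n]
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            pxy = joint[x][y] / n
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi


def average_mi_by_distance(n_rows=6, n_cols=4, n_tables=2000, seed=0):
    """Average the estimated MI over all position pairs (t, t + d)."""
    rng = random.Random(seed)
    tables = [sample_table(n_rows, n_cols, rng) for _ in range(n_tables)]
    length = n_rows * n_cols
    result = {}
    for d in range(1, 3 * n_cols + 1):
        values = []
        for t in range(length - d):
            xs = [tab[t] for tab in tables]
            ys = [tab[t + d] for tab in tables]
            values.append(mutual_information(xs, ys))
        result[d] = sum(values) / len(values)
    return result


if __name__ == "__main__":
    for d, mi in average_mi_by_distance().items():
        print(f"d={d:2d}  avg MI={mi:.3f} bits")
```

With the default $m = 4$, the printed values should peak at $d = 4, 8, 12$ and stay near zero elsewhere. Because a position's column is its index modulo $m$ in a row-major layout, two positions share a column exactly when their distance is a multiple of $m$, which is why the peaks do not decay with distance in this toy setting.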

Figures (15)

  • Figure 1: Overview of TableLong: An end-to-end table data construction pipeline for long-context reasoning.
  • Figure 2: The radar chart of long-context reasoning benchmarks for DS-R1-Distill-32B trained with varying length.
  • Figure 3: Needle in a Haystack retrieval across document depths. Our approach significantly enhances long-context robustness, boosting the 14B model's accuracy from 69.30% to 91.20% and the 32B model's accuracy from 87.95% to 99.40%, achieving near-perfect performance.
  • Figure 4: Decomposition experiments for DS-R1-Distill-32B. (a) While structure alone ("no semantics") boosts baseline performance (+1.67%), semantics remain essential for peak results. Removing delimiters or adding noise yields negligible drops, confirming the primacy of intrinsic structure. (b) Models with "no semantics" suffer premature convergence, whereas "no visible delimiters" settings recover from low initial rewards.
  • Figure 5: Decomposition experiments for DS-R1-Distill-14B. (a) While structure alone ("no semantics") boosts baseline performance (+1.66%), semantics remain essential for peak results. Removing delimiters or adding noise yields negligible drops, confirming the primacy of intrinsic structure. (b) Models with "no semantics" suffer premature convergence, whereas "no visible delimiters" settings recover from low initial rewards.
  • ...and 10 more figures

Theorems & Definitions (28)

  • Definition 2.1: Same-Column Mutual Information
  • Theorem 2.2: Periodic Non-Vanishing Dependency
  • Corollary 2.3: Asymptotic Non-Decay
  • Corollary 2.4: Asymptotic Dominance over Natural Language
  • Definition 2.5: Effective Dependency Distance [liu2008dependency]
  • Theorem 2.6: Effective Distance Comparison
  • Definition 1.1: Mutual Information [cover1999elements]
  • Definition 1.2: Kullback-Leibler Divergence
  • Definition 1.3: Mixture Distribution
  • Definition 1.4: Position-to-Column Mapping
  • ...and 18 more