TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables
Abhilash Shankarampeta, Harsh Mahajan, Tushar Kataria, Dan Roth, Vivek Gupta
TL;DR
TransientTables targets a core gap in LLM temporal reasoning by evaluating how models reason over temporally evolving, entity-centric infobox tables. The authors construct a large-scale dataset of 3,971 QA pairs drawn from 14,133 tables across 1,238 entities, and they introduce a template-based QA pipeline plus a multi-stage task decomposition to improve grounding and reasoning. Across extensive experiments with multiple models and prompting regimes, they show substantial room for improvement relative to humans, with decomposition, larger context and fine-tuning yielding notable gains. The work demonstrates the limits of current LLMs on temporal, multi-table reasoning and provides a principled framework and benchmarks to push forward temporal reasoning in NLP applications.
Abstract
Humans continuously make new discoveries, and understanding the temporal sequence of events leading to these breakthroughs is essential for advancing science and society. This ability to reason over time allows us to identify future steps and understand the effects of financial and political decisions on our lives. However, large language models (LLMs) are typically trained on static datasets, limiting their ability to perform effective temporal reasoning. To assess the temporal reasoning capabilities of LLMs, we present the TRANSIENTTABLES dataset, which comprises 3,971 questions derived from over 14,000 tables, spanning 1,238 entities across multiple time periods. We introduce a template-based question-generation pipeline that harnesses LLMs to refine both templates and questions. Additionally, we establish baseline results using state-of-the-art LLMs to create a benchmark. We also introduce novel modeling strategies centered on task decomposition, which enhance LLM performance.
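To make the idea of template-based questions over a table timeline concrete, here is a minimal Python sketch. It is an illustration only and not the authors' pipeline: the template wording, the `Timeline` type, the helper names, and the toy entity "Example FC" with its values are all hypothetical. It assumes each entity is represented as a list of (year, infobox) snapshots and that a question is answered by comparing consecutive snapshots.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: one entity's timeline as (year, infobox) snapshots.
Timeline = List[Tuple[int, Dict[str, str]]]

# Hypothetical question template; the dataset's actual templates may differ.
TEMPLATE = "In which year did the {key} of {entity} first change?"


def make_question(entity: str, key: str) -> str:
    """Fill the template with an entity name and an infobox key."""
    return TEMPLATE.format(entity=entity, key=key)


def first_change_year(timeline: Timeline, key: str) -> Optional[int]:
    """Return the year of the first snapshot whose value for `key`
    differs from the preceding snapshot, or None if it never changes."""
    ordered = sorted(timeline, key=lambda snap: snap[0])
    for (_, prev), (year, curr) in zip(ordered, ordered[1:]):
        if prev.get(key) != curr.get(key):
            return year
    return None


# Toy data, invented purely for illustration.
timeline: Timeline = [
    (2010, {"coach": "A", "stadium": "X"}),
    (2014, {"coach": "A", "stadium": "Y"}),
    (2018, {"coach": "B", "stadium": "Y"}),
]

question = make_question("Example FC", "coach")
answer = first_change_year(timeline, "coach")
print(question, "->", answer)
```

Answering even this simple question requires locating the relevant key in every snapshot and then comparing values in chronological order, which is the kind of multi-table grounding and reasoning the benchmark is designed to probe.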
