Enhancing Temporal Understanding in LLMs for Semi-structured Tables

Irwin Deng, Kushagra Dixit, Vivek Gupta, Dan Roth

TL;DR

The paper addresses the challenge of temporal reasoning over tabular data in LLMs by analyzing TempTabQA, refining evaluation with a cleaner dataset, and proposing C.L.E.A.R prompting to ground reasoning in evidence. It further demonstrates that indirect supervision via auxiliary data, particularly the TRAM dataset, yields cross-domain improvements in temporal understanding. The combined approach, C.L.E.A.R prompting plus TRAM-based fine-tuning, produces notable gains across multiple models and advances practical temporal question answering on semi-structured data. These contributions offer scalable pathways toward robust temporal reasoning in real-world applications and identify directions for future work, including synthetic data and neuro-symbolic integration.

Abstract

Temporal reasoning over tabular data presents substantial challenges for large language models (LLMs), as evidenced by recent research. In this study, we conduct a comprehensive analysis of temporal datasets to pinpoint the specific limitations of LLMs. Our investigation leads to enhancements in TempTabQA, a dataset specifically designed for tabular temporal question answering. We provide critical insights for improving LLM performance in temporal reasoning tasks with tabular data. Furthermore, we introduce a novel approach, C.L.E.A.R, to strengthen LLM capabilities in this domain. Our findings demonstrate that our method significantly improves evidence-based reasoning across various models. Additionally, our experimental results reveal that indirect supervision with auxiliary data substantially boosts model performance in these tasks. This work contributes to a deeper understanding of LLMs' temporal reasoning abilities over tabular data and promotes advancements in their application across diverse fields.
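The abstract centers on evidence-based temporal reasoning over semi-structured tables. As a rough illustration only, the Python sketch below assumes a simple numbered `key: value` linearization of an infobox and a generic "cite evidence, then reason, then answer" instruction. The names `linearize_table`, `build_prompt`, and `age_at`, as well as the table rows, are hypothetical. They do not reproduce the paper's C.L.E.A.R prompt or any TempTabQA data. `age_at` shows the kind of date arithmetic a temporal question ("How old was the player at debut?") implicitly requires.

```python
from datetime import date

# Hypothetical infobox rows for illustration only; NOT taken from TempTabQA
# or from the Al McBean table in Figure 1.
TABLE = {
    "Name": "Example Player",
    "Born": "1940-03-12",
    "MLB debut": "1961-04-15",
    "Last MLB appearance": "1972-09-30",
}


def linearize_table(table: dict[str, str]) -> str:
    """Flatten key-value infobox rows into numbered 'key: value' lines."""
    return "\n".join(
        f"[{i}] {key}: {value}" for i, (key, value) in enumerate(table.items(), start=1)
    )


def build_prompt(table: dict[str, str], question: str) -> str:
    """Assemble a prompt asking the model to cite row ids before answering.

    The instruction wording is an assumption, not the paper's C.L.E.A.R template.
    """
    return (
        "Table:\n"
        f"{linearize_table(table)}\n\n"
        f"Question: {question}\n"
        "First list the row ids that contain the relevant evidence, "
        "then reason step by step over those rows, then give the final answer."
    )


def age_at(born: str, event: str) -> int:
    """Reference computation: whole years elapsed between two ISO dates."""
    b, e = date.fromisoformat(born), date.fromisoformat(event)
    # Subtract one year if the event falls before the birthday in that year.
    return e.year - b.year - ((e.month, e.day) < (b.month, b.day))


if __name__ == "__main__":
    print(build_prompt(TABLE, "How old was the player at their MLB debut?"))
    print(age_at(TABLE["Born"], TABLE["MLB debut"]))
```

Numbering the rows gives the model a compact handle for citing evidence, and a deterministic helper such as `age_at` can serve as a sanity check on a model's arithmetic for hand-built examples.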
Paper Structure

This paper contains 20 sections, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: A semi-structured table for baseball pitcher Al McBean with follow-up questions and answers.
  • Figure 2: Prompt Example
  • Figure 3: The step-by-step process of the C.L.E.A.R instructions. The reference table is provided in Figure 1.
  • Figure 4: Example 1 from the TempTabQA head set with C.L.E.A.R prompting on GPT-3.5 Turbo: Input
  • Figure 5: Example 1 from the TempTabQA head set with C.L.E.A.R prompting on GPT-3.5 Turbo: Response
  • ...and 2 more figures