Discourse-Aware In-Context Learning for Temporal Expression Normalization

Akash Kumar Gautam, Lukas Lange, Jannik Strötgen

TL;DR

The paper tackles temporal expression (TE) normalization under data scarcity and domain shift by adopting discourse-aware in-context learning with retrieved examples and a document-level context window. It designs a prompt framework and sample-selection strategies that let LLMs perform TE normalization without fine-tuning, and evaluates them across six domains and seven languages with GPT-3.5-turbo and Zephyr. Results show performance competitive with dedicated normalization models, with notable gains when target documents are distant from the training data, especially with Target-centric + Context Window prompts. The work demonstrates the practical potential of zero-/few-shot TE normalization in multilingual, cross-domain IE pipelines, while identifying limitations in long-range discourse and language coverage.

Abstract

Temporal expression (TE) normalization is a well-studied problem. However, the predominantly used rule-based systems are highly restricted to specific settings, and emerging machine learning approaches suffer from a lack of labeled data. In this work, we explore the feasibility of proprietary and open-source large language models (LLMs) for TE normalization using in-context learning to inject task, document, and example information into the model. We explore various sample selection strategies to retrieve the most relevant set of examples. By using a window-based prompt design approach, we can perform TE normalization across sentences while leveraging the LLM's knowledge without training the model. Our experiments show results competitive with models designed for this task. In particular, our method achieves large performance improvements in non-standard settings by dynamically including relevant examples during inference.
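
The sample-selection step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how similarity is computed, so the bag-of-words cosine similarity below, the function names (`bow`, `cosine`, `select_examples`), and the toy training pool are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a real sentence encoder)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(target, pool, k=2):
    """Return the k annotated training sentences most similar to the target."""
    target_vec = bow(target)
    ranked = sorted(pool, key=lambda ex: cosine(target_vec, bow(ex["text"])), reverse=True)
    return ranked[:k]


# Toy pool of annotated sentences (hypothetical data, TIMEX3-style values).
train_pool = [
    {"text": "The treaty was signed on March 5, 2021.", "value": "2021-03-05"},
    {"text": "Profits rose in the third quarter of 1998.", "value": "1998-Q3"},
    {"text": "They met in the summer of 2010.", "value": "2010-SU"},
]

target_sentence = "Revenue fell in the second quarter of 2003."
selected = select_examples(target_sentence, train_pool, k=2)
for ex in selected:
    print(ex["value"], "|", ex["text"])
```

Here the quarter-of-year example ranks first because it shares the most tokens with the target; a real system would more plausibly use dense embeddings, which this sketch does not attempt.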

Paper Structure (13 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Overview of our proposed in-context learning approach for temporal expression normalization. Given a test input, we retrieve similar text representations from the train set. We combine both of them along with a running context window of previous predictions and feed it to a language model along with instructions.
  • Figure 2: Analysis of how the number of examples influences correctness and failures, e.g., when the examples exceed the limited context length.
  • Figure 3: Performance on multilingual Ancient-Times corpora with three different sample selection pools.
  • Figure 4: Effect of different context window lengths for our Target-centric + Context Window approach on 3 different corpora.
  • Figure 5: Prompt example passed to GPT-3.5 for the Target-centric + CW (context window) approach. In the guidelines prompt, sentence #1 is the text sequence picked from the train set. Sentence #2 includes text sequences from the test set. Text highlighted in blue is the target sentence passed to the LLM for normalization. Text marked in red is part of the running context window (previous sentences in the same document from the test set, where the VALUE attribute is replaced by the model's predictions).
  • ...and 1 more figure
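
The prompt layout described in the captions of Figures 1 and 5 (instructions, retrieved examples, a running context window of earlier predictions, then the target sentence) could be assembled roughly as follows. This is a minimal sketch: the names `build_prompt` and `normalize_document`, the instruction wording, the section labels, and the default window length are assumptions for illustration, and `llm` is any hypothetical callable from prompt string to predicted string. The paper's actual prompt text is not reproduced here.

```python
from collections import deque

# Assumed instruction wording; the paper's real guidelines prompt may differ.
INSTRUCTIONS = "Fill in the VALUE attribute of each TIMEX3 tag in the target sentence."


def build_prompt(instructions, examples, context_window, target):
    """Concatenate instructions, retrieved examples, prior predictions, and the target."""
    parts = [instructions, "", "Examples:"]
    parts += [f"- {ex}" for ex in examples]
    if context_window:
        parts += ["", "Previous sentences (with predicted values):"]
        parts += [f"- {sent}" for sent in context_window]
    parts += ["", "Target sentence:", target]
    return "\n".join(parts)


def normalize_document(sentences, examples, llm, window_size=3):
    """Normalize sentences in document order, feeding earlier predictions back as context."""
    window = deque(maxlen=window_size)  # running context window; oldest entries drop out
    outputs = []
    for sentence in sentences:
        prompt = build_prompt(INSTRUCTIONS, examples, list(window), sentence)
        predicted = llm(prompt)   # expected to return the sentence with VALUE filled in
        outputs.append(predicted)
        window.append(predicted)  # later sentences see this prediction, not the gold value
    return outputs
```

Using a bounded `deque` keeps the prompt length under control as the document grows, which matters because the context length of the model is limited.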