Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Robert Williams; Arjun Ashok; Étienne Marcotte; Valentina Zantedeschi; Jithendaraa Subramanian; Roland Riachi; James Requeima; Alexandre Lacoste; Irina Rish; Nicolas Chapados; Alexandre Drouin

TL;DR

CiK is a forecasting benchmark in which every task requires models to use natural language context alongside the numerical history. It formalizes context-aided forecasting, proposes a Region of Interest CRPS scoring rule, and evaluates a wide range of models, finding that context-rich LLM prompting often yields strong, data-efficient forecasts, albeit with notable robustness and cost tradeoffs. The work shows that contextual information is both crucial and difficult to exploit, and it offers insights and metrics to guide the development of accurate, accessible multimodal forecasters. By providing open data, tasks, and evaluation protocols, CiK aims to accelerate research in context-aware, probabilistic time-series forecasting with real-world impact.

Abstract

Forecasting is a critical task in decision-making across numerous domains. While historical numerical data provide a start, they fail to convey the complete context for reliable and accurate predictions. Human forecasters frequently rely on additional information, such as background knowledge and constraints, which can efficiently be communicated through natural language. However, in spite of recent progress with LLM-based forecasters, their ability to effectively integrate this textual information remains an open question. To address this, we introduce "Context is Key" (CiK), a time-series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context, requiring models to integrate both modalities; crucially, every task in CiK requires understanding textual context to be solved successfully. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings. This benchmark aims to advance multimodal forecasting by promoting models that are both accurate and accessible to decision-makers with varied technical expertise. The benchmark can be visualized at https://servicenow.github.io/context-is-key-forecasting/v0/.
