Overcoming the Modality Gap in Context-Aided Forecasting

Vincent Zhihao Zheng; Étienne Marcotte; Arjun Ashok; Andrew Robert Williams; Lijun Sun; Alexandre Drouin; Valentina Zantedeschi

Abstract

Context-aided forecasting (CAF) holds promise for integrating domain knowledge and forward-looking information, enabling AI systems to surpass traditional statistical methods. However, recent empirical studies reveal a puzzling gap: multimodal models often fail to outperform their unimodal counterparts. We hypothesize that this underperformance stems from poor context quality in existing datasets, as verification is challenging. To address these limitations, we introduce a semi-synthetic data augmentation method that generates contexts both descriptive of temporal dynamics and verifiably complementary to numerical histories. This approach enables massive-scale dataset creation, resulting in CAF-7M, a corpus of 7 million context-augmented time series windows, including a rigorously verified test set. We demonstrate that semi-synthetic pre-training transfers effectively to real-world evaluation, and show clear evidence of context utilization. Our results suggest that dataset quality, rather than architectural limitations, has been the primary bottleneck in context-aided forecasting.

Paper Structure (76 sections, 16 equations, 56 figures, 4 tables)

Introduction
Background and Related Work
Problem Setting
Multimodal Time Series Datasets
Models for Context-Aided Forecasting
Proposed Method
Data Augmenting Time Series Datasets
Generating Plausible Contexts
Ensuring Informative Contexts
CAF-7M: A Context-Aided Forecasting Dataset
Training a Context-Aided Forecasting Model
Aligning Context and Time Series
DoubleCast
Experiments
Context Complementarity in the CAF-7M test set
...and 61 more sections

Figures (56)

Figure 1: The data-augmentation pipeline: (1) From each source dataset, we sample forecasting windows consisting of a numerical history and prediction horizon, along with dataset metadata. (2) We generate scenario-style textual context conditioned on the window. (3) We verify context relevance by checking whether a strong CAF method achieves better predictions with context than without (e.g., lower CRPS); windows are accepted if context improves predictions and rejected otherwise.
Figure 2: Architecture of DoubleCast. Each DualT5 decoder block consists of, in sequence: masked self-attention; Chronos encoder-decoder cross-attention; DualT5 cross-attention; and an FFN layer. Each sublayer is wrapped by a residual connection and layer normalization (LN). The same $\boldsymbol{e}_{\mathrm{ts}}$ and $\boldsymbol{e}_{\mathrm{ctx}}$ are provided to every decoder block.
Figure 3: CRPS.
Figure 4: Win Rate.
Figure 6: Normalized MASE (left, $\downarrow$) and CRPS (right, $\downarrow$) on GIFT-Eval (Aksu et al.). Despite extending Chronos for context-aided forecasting, DoubleCast retains Chronos' forecasting capabilities: DoubleCast performs nearly identically to Chronos on no-context time series forecasting, as compared to both AutoArima and the incumbent state-of-the-art pre-trained model (Migas-1.0).
...and 51 more figures
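
The acceptance test in step (3) of Figure 1 can be illustrated with a minimal sketch. It assumes the forecaster returns a set of sample paths per horizon step and scores them with the standard sample-based CRPS estimator, $\mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$. The function names, the per-step averaging, and the strict "lower is better" threshold are illustrative assumptions, not the paper's implementation.

```python
from itertools import product
from statistics import fmean


def sample_crps(samples, target):
    """Sample-based CRPS estimate for one time step: E|X - y| - 0.5 * E|X - X'|."""
    accuracy = fmean(abs(x - target) for x in samples)
    spread = fmean(abs(a - b) for a, b in product(samples, repeat=2))
    return accuracy - 0.5 * spread


def window_crps(forecast_samples, targets):
    """Average CRPS over the horizon; forecast_samples[t] holds the samples for step t."""
    return fmean(sample_crps(s, y) for s, y in zip(forecast_samples, targets))


def accept_context(with_ctx, without_ctx, targets):
    """Hypothetical filter: keep the window only if context lowers the CRPS."""
    return window_crps(with_ctx, targets) < window_crps(without_ctx, targets)


# Toy window with a two-step horizon (made-up numbers, for illustration only).
targets = [10.0, 12.0]
with_ctx = [[9.0, 10.0, 11.0], [11.0, 12.0, 13.0]]   # forecast given the context
without_ctx = [[4.0, 5.0, 6.0], [5.0, 6.0, 7.0]]     # forecast from history alone
print(accept_context(with_ctx, without_ctx, targets))
```

In this toy window the context-conditioned samples are centered on the targets, so the window would be accepted; swapping the two forecasts would reject it.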
