From Text to Forecasts: Bridging Modality Gap with Temporal Evolution Semantic Space

Lehui Li, Yuyao Wang, Jisheng Yan, Wei Zhang, Jinliang Deng, Haoliang Sun, Zhongyi Han, Yongshun Gong

Abstract

Incorporating textual information into time-series forecasting holds promise for addressing event-driven non-stationarity; however, a fundamental modality gap hinders effective fusion: textual descriptions express temporal impacts implicitly and qualitatively, whereas forecasting models rely on explicit and quantitative signals. Through controlled semi-synthetic experiments, we show that existing methods over-attend to redundant tokens and struggle to reliably translate textual semantics into usable numerical cues. To bridge this gap, we propose TESS, which introduces a Temporal Evolution Semantic Space as an intermediate bottleneck between modalities. This space consists of interpretable, numerically grounded temporal primitives (mean shift, volatility, shape, and lag) extracted from text by an LLM via structured prompting and filtered through confidence-aware gating. Experiments on four real-world datasets demonstrate up to a 29 percent reduction in forecasting error compared to state-of-the-art unimodal and multimodal baselines. The code will be released after acceptance.

Paper Structure (36 sections, 4 theorems, 49 equations, 8 figures, 3 tables)


Key Result

Theorem 4.1

Assume semantic sufficiency holds in our forecasting setting: $\hat{\mathbf{Y}}_t \perp\!\!\!\perp \mathbf{X}_{\text{text}} \mid (\mathbf{P}_t,\mathbf{X}_{\text{time}})$, where $\mathbf{P}_t$ denotes the distilled primitives. Then, for all encoders $f$ of $\mathbf{X}_{\text{text}}$ […] Moreover, under the standard sub-Gaussian loss and i.i.d. sampling assumption, and as […]
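
Read information-theoretically, the sufficiency assumption says that the raw text adds nothing about the target once the primitives and the numerical history are known. The short derivation below uses only the standard chain rule for conditional mutual information; it is a consequence of the stated assumption, not a reconstruction of the theorem's conclusion.

$$I\big(\hat{\mathbf{Y}}_t;\,\mathbf{X}_{\text{text}} \,\big|\, \mathbf{P}_t,\mathbf{X}_{\text{time}}\big) = 0$$

$$I\big(\hat{\mathbf{Y}}_t;\,\mathbf{X}_{\text{text}},\mathbf{P}_t \,\big|\, \mathbf{X}_{\text{time}}\big) = I\big(\hat{\mathbf{Y}}_t;\,\mathbf{P}_t \,\big|\, \mathbf{X}_{\text{time}}\big) + I\big(\hat{\mathbf{Y}}_t;\,\mathbf{X}_{\text{text}} \,\big|\, \mathbf{P}_t,\mathbf{X}_{\text{time}}\big) = I\big(\hat{\mathbf{Y}}_t;\,\mathbf{P}_t \,\big|\, \mathbf{X}_{\text{time}}\big)$$

In words, conditioning on the primitives alone preserves all the information that the text and primitives jointly carry about $\hat{\mathbf{Y}}_t$, given $\mathbf{X}_{\text{time}}$.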

Figures (8)

  • Figure 1: Illustration of the cross-modal transformation via the Temporal Semantic Space. Verbose textual narratives (left) are distilled into structured temporal primitives (middle), which then guide numerical forecasting (right).
  • Figure 2: Analysis of attention misalignment. Left: Distribution of focus ratio $R_t$ on test samples. Right: Relationship between redundant token count and predictive performance.
  • Figure 3: Comparison of three input variants. Left: Prediction performance (MSE) across Full, Signal-Only, and Numerical inputs. Right: Training loss curves showing convergence dynamics.
  • Figure 4: Overview of TESS. Given numerical observations and associated text, a frozen LLM extracts temporal evolution primitives (e.g., mean shift, shape, volatility, lag) via structured prompting. These primitives, after confidence-aware gating, condition a Transformer-based forecaster that fuses semantic signals with encoded historical sequences to produce numerical predictions.
  • Figure 5: Performance comparison on three types of non-stationary scenarios (Shape, Volatility, Mean Shift). TESS (blue) consistently outperforms both unimodal (teal) and multimodal (red) baselines.
  • ...and 3 more figures
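
The confidence-aware gating step named in the Figure 4 caption can be illustrated with a minimal sketch. The source does not give the gating formula, so everything here is an assumption made for illustration: the `Primitive` record, the logistic gate in `gate_weight`, and the `threshold` and `sharpness` parameters are hypothetical and are not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """One temporal primitive parsed from an LLM's structured output (hypothetical schema)."""
    name: str          # e.g. "mean_shift", "volatility", "shape", "lag"
    value: float       # numeric value assigned to the primitive
    confidence: float  # confidence score in [0, 1] reported alongside the value


def gate_weight(confidence: float, threshold: float = 0.5, sharpness: float = 10.0) -> float:
    """Soft gate in (0, 1): a logistic curve centred on `threshold` (assumed form)."""
    return 1.0 / (1.0 + math.exp(-sharpness * (confidence - threshold)))


def gate_primitives(primitives, threshold: float = 0.5, sharpness: float = 10.0):
    """Map each primitive name to (value, gate weight).

    A downstream forecaster could scale each primitive's embedding by its
    weight, so low-confidence primitives contribute little to the forecast.
    """
    return {
        p.name: (p.value, gate_weight(p.confidence, threshold, sharpness))
        for p in primitives
    }


if __name__ == "__main__":
    extracted = [
        Primitive("mean_shift", 2.0, 0.9),  # confident: gate stays nearly open
        Primitive("lag", 3.0, 0.1),         # uncertain: gate is nearly closed
    ]
    for name, (value, weight) in gate_primitives(extracted).items():
        print(f"{name}: value={value}, gate={weight:.3f}")
```

A soft gate is used here rather than a hard cutoff so that the weight varies smoothly with confidence; a hard threshold would be an equally plausible reading of "confidence-aware gating".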

Theorems & Definitions (8)

  • Theorem 4.1
  • Theorem: Restatement of the main theorem
  • proof
  • Theorem 1.5
  • proof
  • Theorem 1.6
  • proof
  • Remark 1.7