Text2Freq: Learning Series Patterns from Text via Frequency Domain

Ming-Chih Lo, Ching Chang, Wen-Chih Peng

TL;DR

This work proposes Text2Freq, a cross-modality model that integrates text and time series data via the frequency domain, and aligns textual information to the low-frequency components of time series data, establishing more effective and interpretable alignments between these two modalities.

Abstract

Traditional time series forecasting models mainly rely on historical numeric values to predict future outcomes. While these models have shown promising results, they often overlook the rich information available in other modalities, such as textual descriptions of special events, which can provide crucial insights into future dynamics. However, research that jointly incorporates text into time series forecasting remains underexplored compared to other cross-modality work. Additionally, the modality gap between time series data and textual information poses a challenge for multimodal learning. To address this challenge, we propose Text2Freq, a cross-modality model that integrates text and time series data via the frequency domain. Specifically, our approach aligns textual information to the low-frequency components of time series data, establishing more effective and interpretable alignments between these two modalities. Our experiments on paired datasets of real-world stock prices and synthetic texts show that Text2Freq achieves state-of-the-art performance, with its adaptable architecture encouraging future research in this field.

Paper Structure

This paper contains 22 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Illustration of alignment issues between time series and text. Starting with only the lowest frequency $N_{LF} = 1$ captures slow-changing patterns like sinusoidal waves but loses details, causing many-to-one mapping issues from text to series. Increasing the frequency components to $N_{LF} = T$ adds detail from a series but introduces noise, leading to one-to-many mapping issues. Since text encapsulates high-level patterns, this work aims to map text to an optimal subset of frequency components $N_{opt}$ that balances noise and information loss.
  • Figure 2: Overview of Text2Freq. Left panel: Stage 1 - Pre-training. Text embeddings are mapped to the latent space of frequency components using a Transformer Encoder. Right panel: Stage 2 - Multimodal Fusion. The pre-trained Transformer is frozen, and the outputs from both modalities are fused using an attention mechanism.
  • Figure 3: The prompt structure for data generation.
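The trade-off described in the Figure 1 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: the helper name `low_frequency_approx`, the synthetic series, and the convention that the zero-frequency (mean) term is always kept in addition to the $N_{LF}$ lowest non-zero frequencies are all assumptions made for illustration. The sketch reconstructs a noisy series from a varying number of low-frequency components and compares each reconstruction with the noise-free pattern.

```python
import numpy as np


def low_frequency_approx(series, n_lf):
    """Rebuild `series` from its n_lf lowest non-zero frequencies.

    Hypothetical helper: the zero-frequency (mean) term is always kept,
    which is an assumption of this sketch, not a detail from the paper.
    """
    spectrum = np.fft.rfft(series)
    truncated = np.zeros_like(spectrum)
    truncated[: n_lf + 1] = spectrum[: n_lf + 1]
    return np.fft.irfft(truncated, n=len(series))


# Synthetic example: a slow two-tone pattern plus Gaussian noise.
rng = np.random.default_rng(0)
T = 64
t = np.arange(T)
clean = np.sin(2 * np.pi * t / T) + 0.5 * np.sin(2 * np.pi * 3 * t / T)
noisy = clean + 0.2 * rng.standard_normal(T)

# Mean squared error against the noise-free pattern for three cut-offs:
# too few components, a moderate number, and every available component.
errors_vs_clean = {
    n_lf: float(np.mean((low_frequency_approx(noisy, n_lf) - clean) ** 2))
    for n_lf in (1, 3, T // 2)
}

for n_lf, err in errors_vs_clean.items():
    print(f"N_LF = {n_lf:2d}: MSE vs. clean pattern = {err:.4f}")
```

In this toy setting, a single component misses the second tone, while keeping every component reproduces the noise as well, so an intermediate cut-off sits closest to the underlying pattern. That mirrors the motivation for choosing an intermediate $N_{opt}$, although how the paper actually selects it is not shown here.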