Multi-Modal Forecaster: Jointly Predicting Time Series and Textual Data

Kai Kim, Howard Tsai, Rajat Sen, Abhimanyu Das, Zihao Zhou, Abhishek Tanpure, Mathew Luo, Rose Yu

TL;DR

This paper proposes the TimeText Corpus (TTC), a carefully curated, time-aligned text and time series dataset for multimodal forecasting, and the Hybrid Multi-Modal Forecaster (Hybrid-MMF), a multimodal LLM that jointly forecasts both text and time series data using shared embeddings.

Abstract

Current forecasting approaches are largely unimodal and ignore the rich textual data that often accompany time series, owing to the lack of well-curated multimodal benchmark datasets. In this work, we develop the TimeText Corpus (TTC), a carefully curated, time-aligned text and time series dataset for multimodal forecasting. Our dataset is composed of sequences of numbers and text aligned to timestamps, and includes data from two different domains: climate science and healthcare. It is a significant addition to the few multimodal datasets currently available. We also propose the Hybrid Multi-Modal Forecaster (Hybrid-MMF), a multimodal LLM that jointly forecasts both text and time series data using shared embeddings. However, contrary to our expectations, our Hybrid-MMF model does not outperform existing baselines in our experiments. This negative result highlights the challenges inherent in multimodal forecasting. Our code and data are available at https://github.com/Rose-STL-Lab/Multimodal_Forecasting.

Paper Structure

This paper contains 34 sections, 1 equation, 2 figures, 22 tables.

Figures (2)

  • Figure 1: Architecture of the Hybrid Multi-Modal Forecaster (Hybrid-MMF), which integrates time series and text data in two stages. In Stage 1 (Pretraining), the model embeds the time series data $[I, C]$ and the text data $[I, \sim N]$, with the text embedded to $[I, E]$, and concatenates them. A multi-head MLP $[I, C+E] \rightarrow [I, E]$ creates a hidden state that is used to predict time series values $[I \cdot E] \rightarrow [O]$. In Stage 2 (End-to-End Training), token embeddings $[I, N, E]$ are combined with Stage 1's hidden state at the beginning of each timestep and passed through a large language model (LLM). The model predicts both text and time series outputs, with logits $[I + O, N, V]$ producing the final text sequence $[O, N]$.
  • Figure 2: Comparison of 1-1 predictions across models on the weather dataset. The blue line denotes the input time series, and the colored lines correspond to the predictions from each model.
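
The shape flow of Stage 1 in Figure 1 can be traced with a small sketch. This is not the authors' implementation; it only illustrates the tensor shapes named in the caption. Several things here are assumptions made for illustration: the text is taken to be already embedded to $[I, E]$, a single linear layer with ReLU stands in for the multi-head MLP, the weights are random, and the names `linear`, `rand_matrix`, and `stage1_forward` are hypothetical.

```python
import random


def linear(x, w, b):
    """Compute x @ w + b for x of shape [rows, in_dim] and w of shape [in_dim, out_dim]."""
    return [
        [sum(xi * wi for xi, wi in zip(row, col)) + bj for col, bj in zip(zip(*w), b)]
        for row in x
    ]


def rand_matrix(rows, cols, rng):
    """Small random weights; placeholders for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def stage1_forward(series, text_emb, out_len, seed=0):
    """Trace Stage 1 shapes: [I, C] and [I, E] -> hidden [I, E] -> forecast [O].

    series:   time series values, shape [I, C]
    text_emb: per-timestep text embeddings, shape [I, E] (assumed precomputed)
    out_len:  forecast horizon O
    """
    rng = random.Random(seed)
    I, C, E = len(series), len(series[0]), len(text_emb[0])

    # Concatenate the two modalities per timestep: [I, C] + [I, E] -> [I, C+E]
    fused = [s + t for s, t in zip(series, text_emb)]

    # Stand-in for the multi-head MLP: [I, C+E] -> [I, E]
    w1, b1 = rand_matrix(C + E, E, rng), [0.0] * E
    hidden = [[max(0.0, v) for v in row] for row in linear(fused, w1, b1)]

    # Flatten the hidden state and project to the horizon: [I*E] -> [O]
    flat = [v for row in hidden for v in row]
    w2, b2 = rand_matrix(I * E, out_len, rng), [0.0] * out_len
    forecast = linear([flat], w2, b2)[0]
    return hidden, forecast


# Toy example with I=4 timesteps, C=2 channels, E=3 embedding dims, O=5 outputs.
series = [[0.1 * i, 0.2 * i] for i in range(4)]
text_emb = [[0.01 * i, 0.02, 0.03] for i in range(4)]
hidden, forecast = stage1_forward(series, text_emb, out_len=5)
print(len(hidden), len(hidden[0]), len(forecast))  # 4 3 5
```

In Stage 2, as the caption describes, the `hidden` state would be combined with the token embeddings at the start of each timestep before being passed to the LLM; that step is omitted here because the caption does not specify how the combination is performed.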