TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents

Geon Lee, Wenchao Yu, Kijung Shin, Wei Cheng, Haifeng Chen

TL;DR

The paper addresses time-series event prediction in settings where contextual information is essential. It introduces TimeCP, a dual-LLM framework that contextualizes time-series data before prediction, and TimeCAP, which adds a trainable multi-modal encoder that jointly uses the textual summaries and the time-series inputs, retrieves in-context examples to augment the predictor's prompt, and contributes its own prediction to be fused with the LLM's. Across seven real-world datasets in weather, finance, and healthcare, TimeCAP achieves an average improvement of 28.75% in F1 score over state-of-the-art baselines and provides interpretable rationales for its predictions. Because it is compatible with language-model-as-a-service (LMaaS) deployments, and because the data and generated summaries are offered for future research, TimeCAP advances context-aware time-series event prediction with practical implications for climate, economics, and public health analytics.

Abstract

Time series data is essential in various applications, including climate modeling, healthcare monitoring, and financial analytics. Understanding the contextual information associated with real-world time series data is often essential for accurate and reliable event predictions. In this paper, we introduce TimeCAP, a time-series processing framework that creatively employs Large Language Models (LLMs) as contextualizers of time series data, extending their typical usage as predictors. TimeCAP incorporates two independent LLM agents: one generates a textual summary capturing the context of the time series, while the other uses this enriched summary to make more informed predictions. In addition, TimeCAP employs a multi-modal encoder that synergizes with the LLM agents, enhancing predictive performance through mutual augmentation of inputs with in-context examples. Experimental results on real-world datasets demonstrate that TimeCAP outperforms state-of-the-art methods for time series event prediction, including those utilizing LLMs as predictors, achieving an average improvement of 28.75% in F1 score.
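
The two-agent design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `LLM` callable interface, the prompt wording, the fallback rule, and `stub_llm` (a hard-coded stand-in so the example runs offline) are all assumptions made for illustration.

```python
from typing import Callable, Sequence

# Hypothetical interface: any function mapping a prompt to a text reply.
LLM = Callable[[str], str]


def contextualize(series: Sequence[float], domain: str, llm: LLM) -> str:
    """First agent (A_C): turn raw values into a textual context summary."""
    values = ", ".join(f"{v:.2f}" for v in series)
    prompt = (
        f"You are an expert in {domain}. Summarize the context of this "
        f"time series in plain text:\n{values}"
    )
    return llm(prompt)


def predict(summary: str, labels: Sequence[str], llm: LLM) -> str:
    """Second agent (A_P): predict an event label from the summary alone."""
    prompt = (
        f"Summary:\n{summary}\n\n"
        f"Answer with exactly one of: {', '.join(labels)}."
    )
    answer = llm(prompt).strip().lower()
    # Assumed fallback: use the first label if the reply is not a valid option.
    return answer if answer in labels else labels[0]


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption, for offline execution)."""
    if prompt.startswith("Summary:"):
        return "rain" if "humid" in prompt else "no rain"
    return "Pressure dropped steadily, leading to humid conditions."


summary = contextualize([61.0, 68.5, 77.2], "weather", stub_llm)
event = predict(summary, ["rain", "no rain"], stub_llm)
print(summary)
print(event)
```

Keeping the two agents behind the same callable interface means the predictor never sees the raw numbers, only the summary, which mirrors the separation of roles the abstract describes.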

Paper Structure

This paper contains 14 sections, 8 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Approaches for time series event prediction using LLMs: (a) Existing methods use LLMs directly as predictors for time series data. (b) Our TimeCP employs two LLM agents: the first agent, $\mathcal{A}_\text{C}$, contextualizes time series data into a text summary, and the second agent, $\mathcal{A}_\text{P}$, makes predictions based on this summary. (c) Our TimeCAP incorporates a multi-modal encoder that synergizes with LLM agents. The multi-modal encoder generates predictions using both the generated text and the time series data. Additionally, it samples relevant text from the training set to augment the prompt for $\mathcal{A}_\text{P}$ to make predictions. TimeCAP achieves a 21.98% improvement in F1 scores using contextualization alone and a 28.75% improvement with the addition of augmentation for time series event predictions on real-world datasets.
  • Figure 2: (a) The multi-modal encoder $\mathcal{E}_\phi$ generates an embedding $\bm{z}$ and a prediction $\hat{\bm{y}}_\text{MM}$ based on the multi-modal input $(\bm{x}, \bm{s}_{\bm{x}})$, i.e., the time series and its augmented text summary (Eq. \ref{eq:mm}). The generated embedding $\bm{z}$ is used to retrieve relevant summaries from the training set to serve as in-context examples to augment the prompt for $\mathcal{A}_\text{P}$ (Eq. \ref{eq:nearest}). (b) The similarity patterns within the time series and the text vary; time series similarities are generally high, while text similarities are selectively highlighted, implying complementary information in each modality.
  • Figure 3: A case study on interpretations of TimeCAP. Given a text summary: (a) implicit interpretations depend on the presence of in-context examples (blue), and (b) explicit interpretations involve post-hoc comparisons between the input and a selected in-context example with similar semantics (red, orange, yellow, and green).
  • Figure 4: TimeCAP consistently outperforms its competitors (specifically, PatchTST and GPT4TS) across different training ratios. When the training ratio is 0%, TimeCP is used.
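
The retrieval step described in the Figure 2 caption (using the embedding $\bm{z}$ to select training summaries as in-context examples) might look roughly like the sketch below. The captions do not state the similarity measure or the prompt layout, so cosine similarity, the value of `k`, the toy embeddings, and the prompt template here are all assumptions; in the paper the embeddings would come from the trained multi-modal encoder rather than being written by hand.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: Sequence[float],
    train_embs: Sequence[Sequence[float]],
    train_summaries: Sequence[str],
    train_labels: Sequence[str],
    k: int = 2,
) -> List[Tuple[str, str]]:
    """Return the k training (summary, label) pairs most similar to the query."""
    order = sorted(
        range(len(train_embs)),
        key=lambda i: cosine(query, train_embs[i]),
        reverse=True,
    )
    return [(train_summaries[i], train_labels[i]) for i in order[:k]]


def augment_prompt(summary: str, examples: Sequence[Tuple[str, str]]) -> str:
    """Prepend retrieved examples to the predictor's prompt (assumed template)."""
    shots = "\n\n".join(f"Summary: {s}\nOutcome: {y}" for s, y in examples)
    return f"{shots}\n\nSummary: {summary}\nOutcome:"


# Toy embeddings standing in for encoder outputs (hypothetical values).
train_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
train_summaries = ["Falling pressure, high humidity.",
                   "Humid with light winds.",
                   "Dry air and clear skies."]
train_labels = ["rain", "rain", "no rain"]

examples = retrieve([1.0, 0.05], train_embs, train_summaries, train_labels, k=2)
prompt = augment_prompt("Humidity climbing overnight.", examples)
print(prompt)
```

Because retrieval only needs embeddings and stored summaries, the LLM itself stays untouched, which is consistent with the LMaaS-compatible setting mentioned in the TL;DR.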