TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents
Geon Lee, Wenchao Yu, Kijung Shin, Wei Cheng, Haifeng Chen
TL;DR
The paper tackles time series event prediction in settings where contextual information is essential. It introduces TimeCP, a dual-LLM framework in which one agent contextualizes the time series as a textual summary and a second agent predicts from that summary. It then introduces TimeCAP, which adds a trainable multi-modal encoder that jointly leverages the textual summaries and the time-series inputs; TimeCAP retrieves in-context examples to augment prompts and fuses predictions. Across seven real-world datasets in weather, finance, and healthcare, TimeCAP improves F1 score by 28.75% on average over state-of-the-art baselines and provides interpretable rationales for its predictions. Because it is compatible with LMaaS (Language Model as a Service) and the authors offer the data and summaries for future research, TimeCAP advances context-aware time series event prediction, with practical implications for climate, economics, and public health analytics.
Abstract
Time series data is essential in various applications, including climate modeling, healthcare monitoring, and financial analytics. Understanding the contextual information associated with real-world time series data is often essential for accurate and reliable event predictions. In this paper, we introduce TimeCAP, a time-series processing framework that creatively employs Large Language Models (LLMs) as contextualizers of time series data, extending their typical usage as predictors. TimeCAP incorporates two independent LLM agents: one generates a textual summary capturing the context of the time series, while the other uses this enriched summary to make more informed predictions. In addition, TimeCAP employs a multi-modal encoder that synergizes with the LLM agents, enhancing predictive performance through mutual augmentation of inputs with in-context examples. Experimental results on real-world datasets demonstrate that TimeCAP outperforms state-of-the-art methods for time series event prediction, including those utilizing LLMs as predictors, achieving an average improvement of 28.75% in F1 score.
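The contextualize, augment, and predict steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, prompt wording, and the use of cosine similarity for retrieval are all assumptions made for illustration, the LLM agents are stood in for by plain callables that map a prompt string to a response string, and the embeddings are assumed to come from some separately trained multi-modal encoder that is not shown. The fusion of predictions is also omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an LLM agent: takes a prompt, returns text.
LLM = Callable[[str], str]

# Hypothetical example bank entry: (embedding, summary, label).
Example = Tuple[Sequence[float], str, str]


def contextualize(llm: LLM, series: Sequence[float], domain: str) -> str:
    """First agent: turn a raw time series into a textual summary."""
    prompt = (
        f"Summarize the context of this {domain} time series: {list(series)}"
    )
    return llm(prompt)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_examples(
    query_emb: Sequence[float], bank: List[Example], k: int
) -> List[Tuple[str, str]]:
    """Pick the k bank entries most similar to the query embedding.

    The similarity measure is an assumption; embeddings are assumed to be
    produced by a multi-modal encoder that is not part of this sketch.
    """
    ranked = sorted(bank, key=lambda e: cosine(query_emb, e[0]), reverse=True)
    return [(summary, label) for _, summary, label in ranked[:k]]


def predict(
    llm: LLM,
    summary: str,
    examples: List[Tuple[str, str]],
    labels: Sequence[str],
) -> str:
    """Second agent: predict an event from the summary plus retrieved examples."""
    shots = "\n".join(f"Summary: {s}\nOutcome: {l}" for s, l in examples)
    prompt = (
        f"{shots}\nSummary: {summary}\n"
        f"Outcome (one of {', '.join(labels)}):"
    )
    return llm(prompt).strip()
```

In use, the summary from `contextualize` and the pairs from `retrieve_examples` are passed to `predict`, so the second agent sees both the context of the current series and labeled summaries of similar past series. Keeping the two agents as separate callables mirrors the abstract's description of them as independent.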
