ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data
Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Lei Zhang, Jianxin Liao
TL;DR
ChatTime is a unified multimodal time series foundation model that expands an LLM's vocabulary so that time series can be encoded as a discrete foreign language, enabling zero-shot forecasting and bimodal input/output alongside text. The approach converts real-valued series into tokens through normalization, discretization into 10K bins, and mark characters, and then applies continuous pre-training and instruction fine-tuning to 4-bit quantized models with LoRA. It demonstrates strong performance on zero-shot forecasting, context-guided forecasting, and time series question answering, and contributes four multimodal datasets to fill data gaps. The work highlights the potential of leveraging pre-trained LLMs for efficient, adaptable time series analysis and points to future directions such as anomaly detection and summarization.
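The tokenization pipeline summarized above (normalize, discretize into 10K bins, wrap in mark characters) can be illustrated with a minimal sketch. Only the bin count comes from the text; the min-max scaling to [-1, 1], the `###` mark string, the four-digit bin index, and the function names `encode` and `decode` are illustrative assumptions, not the authors' implementation.

```python
NUM_BINS = 10_000        # bin count stated in the summary above
LOW, HIGH = -1.0, 1.0    # assumed range of the normalized values
MARK = "###"             # hypothetical mark characters around each token


def encode(series):
    """Map real values to string tokens; also return the scaling statistics."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant series
    tokens = []
    for value in series:
        # Step 1: min-max normalization into [LOW, HIGH] (an assumed scheme).
        scaled = (value - lo) / span * (HIGH - LOW) + LOW
        # Step 2: uniform discretization into NUM_BINS bins.
        index = int((scaled - LOW) / (HIGH - LOW) * NUM_BINS)
        index = min(max(index, 0), NUM_BINS - 1)
        # Step 3: wrap the bin index in mark characters to form one token.
        tokens.append(f"{MARK}{index:04d}{MARK}")
    return tokens, (lo, span)


def decode(tokens, stats):
    """Invert encode() approximately by mapping each token to its bin center."""
    lo, span = stats
    width = (HIGH - LOW) / NUM_BINS
    values = []
    for token in tokens:
        index = int(token[len(MARK):-len(MARK)])
        center = LOW + (index + 0.5) * width
        values.append((center - LOW) / (HIGH - LOW) * span + lo)
    return values
```

Calling `encode([1.0, 2.0, 3.0])` returns three tokens plus the statistics needed by `decode`, which recovers each value to within half a bin width. In a real system each token string would presumably be registered as a new vocabulary entry so that the LLM treats it as a single symbol.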
Abstract
Human experts typically integrate numerical and textual multimodal information to analyze time series. However, most traditional deep learning predictors rely solely on unimodal numerical data, use a fixed-length window for training and prediction on a single dataset, and cannot adapt to different scenarios. Powerful pre-trained large language models have introduced new opportunities for time series analysis. Yet existing methods are inefficient in training, incapable of handling textual information, or lacking in zero-shot forecasting capability. In this paper, we model time series as a foreign language and construct ChatTime, a unified framework for time series and text processing. As an out-of-the-box multimodal time series foundation model, ChatTime provides zero-shot forecasting capability and supports bimodal input/output for both time series and text. We design a series of experiments to verify the superior performance of ChatTime across multiple tasks and scenarios, and create four multimodal datasets to address data gaps. The experimental results demonstrate the potential and utility of ChatTime.
