SciTS: Scientific Time Series Understanding and Generation with LLMs
Wen Wu, Ziyang Zhang, Liwei Liu, Xuenan Xu, Junlin Liu, Ke Fan, Qitan Lv, Jimin Zhuang, Chen Zhang, Zheqi Yuan, Siyuan Hou, Tianyi Lin, Kai Chen, Bowen Zhou, Chao Zhang
TL;DR
The paper tackles the challenge of scientific time series understanding and generation by introducing SciTS, a large-scale, cross-domain benchmark spanning 12 disciplines, 43 tasks, and over 50k samples, covering both univariate and multivariate signals with lengths up to $10^7$ and frequencies up to 10 MHz. It shows that conventional approaches that convert signals to text or images limit precision and scalability, and that general-purpose LLMs often generalise better than specialised time-series models. To address these limitations, the authors propose TimeOmni, an LLM-based framework that explicitly models temporal dynamics via a Time Series Encoder with a routed patch-expert architecture and cross-attention to vocabulary embeddings, enabling both understanding and generation within a single framework compatible with standard LLM training. Large-scale evaluation across 17 models demonstrates TimeOmni’s strong performance and full task coverage, highlighting the value of treating time series as a dedicated modality within LLMs for broad scientific applicability and discovery.
Abstract
The scientific reasoning ability of large language models (LLMs) has recently attracted significant attention. Time series, a fundamental modality in scientific data, present unique challenges that are often overlooked in current multimodal LLMs, which either encode numerical sequences as text or convert them into images. Such approaches may be insufficient for comprehensive scientific time series understanding and generation. Existing unified time series models typically specialise in either forecasting or analysis, and their effectiveness on non-periodic, heterogeneous scientific signals remains unclear. To address these gaps, we introduce SciTS, a benchmark spanning 12 scientific domains and 43 tasks, with over 50k instances covering both univariate and multivariate signals that range from $10^0$ to $10^7$ in length and reach up to 10 MHz in frequency. We benchmark 17 models, including text-only LLMs, multimodal LLMs, and unified time series models, and find that general-purpose LLMs exhibit stronger generalisability than specialised time series models, while representing time series as text or images limits their performance due to excessively long sequences and loss of numerical precision, respectively. We then introduce TimeOmni, a framework that equips LLMs with the ability to understand and generate time series while remaining compatible with general-purpose LLM training. This work fills a gap in both dedicated benchmarks and modelling frameworks for scientific time series, paving the way for LLMs to understand and generate complex temporal scientific data.
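The two limitations named in the abstract (excessively long sequences for text encodings, loss of numerical precision for image encodings) can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the paper's pipeline: the synthetic sine signal, the comma-separated four-decimal text format, the 8-bit quantisation standing in for rendering a signal to pixels, and the patch size of 64 are all hypothetical choices.

```python
import math


def serialise_as_text(values, decimals=4):
    """Render a series as comma-separated fixed-precision decimals (assumed format)."""
    return ",".join(f"{v:.{decimals}f}" for v in values)


def count_patches(length, patch_size):
    """Number of non-overlapping patches needed to cover `length` samples."""
    return math.ceil(length / patch_size)


def quantise_8bit(values, lo, hi):
    """Map values onto 256 evenly spaced levels, as a crude stand-in for pixels."""
    scale = (hi - lo) / 255
    return [lo + round((v - lo) / scale) * scale for v in values]


# Synthetic sine wave standing in for a real scientific signal (assumption).
n = 10_000
signal = [math.sin(2 * math.pi * i / 100) for i in range(n)]

# Text encoding: the character count grows several times faster than the
# number of samples, before any tokenisation is applied.
text_chars = len(serialise_as_text(signal))

# Patch encoding: one unit per fixed-size window of samples.
patches = count_patches(n, patch_size=64)

# Image-like encoding: quantisation bounds the precision that can be recovered.
lo, hi = min(signal), max(signal)
reconstructed = quantise_8bit(signal, lo, hi)
max_err = max(abs(a - b) for a, b in zip(signal, reconstructed))

print(f"samples={n} text_chars={text_chars} patches={patches} max_err={max_err:.2e}")
```

Under these assumptions the text form needs at least six characters per sample, while the patch count shrinks by the patch size; the quantised form cannot recover values more finely than half a quantisation step. At the benchmark's upper length of $10^7$ samples the same ratios apply, which is where the length problem becomes acute.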
