Thoth: Mid-Training Bridges LLMs to Time Series Understanding

Jiafeng Lin, Yuxuan Wang, Jialong Wu, Huakun Luo, Zhongyi Pei, Jianmin Wang

TL;DR

This paper proposes Thoth, the first family of mid-trained LLMs with general-purpose time series understanding capabilities, and constructs Book-of-Thoth, a high-quality, time-series-centric mid-training corpus that equips LLMs with a foundational grasp of temporal patterns.

Abstract

Large Language Models (LLMs) have demonstrated remarkable success in general-purpose reasoning. However, they still struggle to understand and reason about time series data, which limits their effectiveness in decision-making scenarios that depend on temporal dynamics. In this paper, we propose Thoth, the first family of mid-trained LLMs with general-purpose time series understanding capabilities. As a pivotal intermediate stage, mid-training achieves task- and domain-agnostic alignment between time series and natural language, for which we construct Book-of-Thoth, a high-quality, time-series-centric mid-training corpus. Book-of-Thoth enables both time-series-to-text and text-to-time-series generation, equipping LLMs with a foundational grasp of temporal patterns. To better evaluate advanced reasoning capabilities, we further present KnoTS, a novel benchmark of knowledge-intensive time series understanding, designed for joint reasoning over temporal patterns and domain knowledge. Extensive experiments demonstrate that mid-training with Book-of-Thoth enables Thoth to significantly outperform its base model and advanced LLMs across a range of time series question answering benchmarks. Moreover, Thoth exhibits superior capabilities when fine-tuned under data scarcity, underscoring the effectiveness of mid-training for time series understanding. Code is available at: https://github.com/thuml/Thoth.

Paper Structure

This paper contains 45 sections, 4 equations, 16 figures, 5 tables.

Figures (16)

  • Figure 1: Pivotal role of mid-training in bridging LLMs to time series understanding. Mid-training expands the capabilities of pre-trained LLMs while providing a critical transition and warm-up for subsequent post-training on specific time series question answering tasks.
  • Figure 2: The automated pipeline starts by synthesizing time series through combinations of diverse base functions. For each generated series, both the raw values and their visualizations are provided to GPT‑5.2 to produce a diverse range of time-series-to-text descriptions. These textual descriptions are then paired with the corresponding time series to construct complementary text-to-time-series generation data. In addition, a curated corpus of textual time series knowledge is incorporated to preserve the general-purpose capabilities of the pre-trained models during mid-training.
  • Figure 3: Components of Book-of-Thoth, which totals 26.6M tokens.
  • Figure 4: A brief example of a reasoning problem from the KnoTS benchmark; the complete case can be found in the Appendix.
  • Figure 5: Performance of various models in reasoning and decision-making on the KnoTS benchmark. All evaluations are conducted using Gemini-3-Pro-Preview with scores ranging from 0.0 to 10.0.
  • ...and 11 more figures
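
The data-construction pipeline described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the base functions (`trend`, `seasonal`, `noise`), their parameter ranges, and the prompt wording are all hypothetical, and the `describe` function is a simple template standing in for the LLM-written descriptions used in the actual pipeline. The sketch only shows how one synthesized series and one description can yield a complementary pair of time-series-to-text and text-to-time-series samples.

```python
import math
import random

# Hypothetical base functions; the paper's actual function set is not listed here.
def trend(n, slope):
    return [slope * t for t in range(n)]

def seasonal(n, period, amplitude):
    return [amplitude * math.sin(2 * math.pi * t / period) for t in range(n)]

def noise(n, scale, rng):
    return [rng.gauss(0.0, scale) for _ in range(n)]

def synthesize(n, rng):
    """Sum several base functions into one series and keep the parameters used."""
    params = {
        "slope": rng.uniform(-0.5, 0.5),      # assumed range
        "period": rng.choice([8, 12, 24]),    # assumed candidate periods
        "amplitude": rng.uniform(0.5, 3.0),   # assumed range
        "noise_scale": 0.1,                   # assumed noise level
    }
    parts = [
        trend(n, params["slope"]),
        seasonal(n, params["period"], params["amplitude"]),
        noise(n, params["noise_scale"], rng),
    ]
    series = [round(sum(values), 3) for values in zip(*parts)]
    return series, params

def describe(params):
    """Template stand-in for the LLM-generated description (an assumption)."""
    direction = "upward" if params["slope"] > 0 else "downward"
    return (f"The series shows a {direction} trend with a seasonal cycle "
            f"of period {params['period']}.")

def make_pairs(series, description):
    """Build the two complementary samples from one (series, description) pair."""
    ts_text = ", ".join(str(v) for v in series)
    ts_to_text = {
        "prompt": f"Describe this time series: {ts_text}",
        "response": description,
    }
    text_to_ts = {
        "prompt": (f"Generate a time series of length {len(series)} "
                   f"matching: {description}"),
        "response": ts_text,
    }
    return ts_to_text, text_to_ts

rng = random.Random(0)  # fixed seed so the sketch is reproducible
series, params = synthesize(48, rng)
ts_to_text, text_to_ts = make_pairs(series, describe(params))
```

Each description is reused in both directions: it is the target of the time-series-to-text sample and the condition of the text-to-time-series sample, which mirrors the pairing step in the caption.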