Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs

Haoran Fan, Bin Li, Yixuan Weng, Shoujun Zhou

TL;DR

This paper tackles the scalability challenge of applying large language models to time-series forecasting by introducing SMETimes, a framework built around sub-3B small language models (SLMs). It combines three innovations: a statistically enhanced prompting scheme, an adaptive fusion embedding that aligns numerical and textual signals, and a dynamic mixture-of-experts that specializes across temporal patterns. Across seven real-world datasets, SMETimes achieves state-of-the-art performance on five while training 3.8x faster and using 5.2x less memory than 7B-parameter LLM baselines, and it forecasts long horizons more accurately. Ablation studies corroborate the contribution of each component. The work shows that compact LMs can rival resource-intensive baselines for practical forecasting tasks; code and models are publicly available.

Abstract

While LLMs have demonstrated remarkable potential in time series forecasting, their practical deployment remains constrained by excessive computational demands and memory footprints. Existing LLM-based approaches typically suffer from three critical limitations: inefficient parameter utilization in handling numerical time series patterns; modality misalignment between continuous temporal signals and discrete text embeddings; and inflexibility for real-time expert knowledge integration. We present SMETimes, the first systematic investigation of sub-3B parameter small language models (SLMs) for efficient and accurate time series forecasting. Our approach centers on three key innovations: a statistically enhanced prompting mechanism that bridges numerical time series with textual semantics through descriptive statistical features; an adaptive fusion embedding architecture that aligns temporal patterns with language model token spaces through learnable parameters; and a dynamic mixture-of-experts framework, enabled by SLMs' computational efficiency, that adaptively combines base predictions with domain-specific models. Extensive evaluations across seven benchmark datasets demonstrate that our 3B-parameter SLM achieves state-of-the-art performance on five primary datasets while maintaining 3.8x faster training and 5.2x lower memory consumption compared to 7B-parameter LLM baselines. Notably, the proposed model exhibits better learning capabilities, achieving 12.3% lower MSE than conventional LLMs. Ablation studies validate that our statistical prompting and cross-modal fusion modules contribute 15.7% and 18.2% error reduction, respectively, in long-horizon forecasting tasks. By redefining the efficiency-accuracy trade-off landscape, this work establishes SLMs as viable alternatives to resource-intensive LLMs for practical time series forecasting. Code and models are available at https://github.com/xiyan1234567/SMETimes.
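
The abstract describes the mixture-of-experts step only as "adaptively combining base predictions with domain-specific models". As a rough illustration of that idea, the sketch below blends per-expert forecasts with softmax gate weights. It is a minimal sketch under stated assumptions, not the SMETimes implementation: the function names `softmax` and `combine_experts`, the softmax gating rule, and the fixed gate scores are all hypothetical. In a trained model the scores would come from a learned gating network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(predictions, gate_scores):
    """Blend per-expert forecasts using softmax-normalized gate scores.

    predictions: one forecast per expert, each a list of floats of equal length.
    gate_scores: one unnormalized score per expert (hypothetical stand-in for
                 the output of a learned gating network).
    """
    if len(predictions) != len(gate_scores):
        raise ValueError("need exactly one gate score per expert")
    weights = softmax(gate_scores)
    horizon = len(predictions[0])
    # Weighted sum of the experts' forecasts at every step of the horizon.
    return [
        sum(w * pred[t] for w, pred in zip(weights, predictions))
        for t in range(horizon)
    ]


# Toy example: a base forecast and one domain-specific forecast.
base_forecast = [1.0, 2.0, 3.0]
domain_forecast = [3.0, 4.0, 5.0]
blended = combine_experts([base_forecast, domain_forecast], gate_scores=[0.0, 0.0])
print(blended)
```

With equal gate scores the blend is the plain average of the two forecasts; raising one score shifts weight toward that expert.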

Paper Structure

This paper contains 25 sections, 12 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Performance-efficiency trade-off comparison on the ETTh1 dataset [zhou2021informer]. Our SLM variants (blue) achieve competitive MSE with significantly lower training time and memory footprint than conventional LLM-based methods. Bubble size represents relative memory consumption.
  • Figure 2: Structure of the proposed SMETimes framework, featuring three core innovations: (1) a statistically enhanced prompt structure for numerical-textual alignment; (2) an adaptive fusion embedding structure with dynamic gating; and (3) a dynamic mixture-of-experts structure for efficient specialization.
  • Figure 3: This prompt structure combines Timestamp Descriptor and Statistical Descriptor to systematically characterize time series data.
  • Figure 4: Hyperparameter sensitivity of SMETimes. Each curve represents a specific dataset.
  • Figure 5: Long-term forecasting cases from ETTh1 [zhou2021informer] for different models under the input-672-predict-96 setting. Blue lines are the ground truth and orange lines are the model predictions.
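
The Figure 3 caption says the prompt combines a Timestamp Descriptor with a Statistical Descriptor. The following is a minimal sketch of what such a prompt builder might look like. The exact template is not given in this excerpt, so the function name `build_prompt`, the wording of the prompt, and the choice of statistics (min, max, mean, population standard deviation) are assumptions made for illustration.

```python
import statistics


def build_prompt(values, start, end, freq):
    """Build a text prompt from a time series window.

    Hypothetical template: a timestamp sentence followed by a sentence of
    descriptive statistics. Not the paper's actual prompt format.
    """
    stats = {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
    }
    timestamp_part = f"Window from {start} to {end}, sampled {freq}."
    stats_part = (
        "Statistics: "
        + ", ".join(f"{name}={value:.3f}" for name, value in stats.items())
        + "."
    )
    return f"{timestamp_part} {stats_part}"


# Toy window of four hourly readings (made-up values).
prompt = build_prompt(
    [1.0, 2.0, 3.0, 4.0],
    start="2016-07-01 00:00",
    end="2016-07-01 03:00",
    freq="hourly",
)
print(prompt)
```

The resulting string would then be tokenized and embedded by the language model alongside the numerical input; how SMETimes fuses the two is not specified in this excerpt.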