TRACE: Time SeRies PArameter EffiCient FinE-tuning

Yuze Li, Wei Zhu

TL;DR

TRACE presents a parameter-efficient fine-tuning framework for time series foundation models by coupling two innovations: reconstructed forecasting heads to dramatically reduce head parameters, and Gated Dynamic Simulation Importance Calculation (Gated DSIC) to debias LoRA module selection. The method enables dynamic, reversible pruning of LoRA components and applies a Monte Carlo masking scheme to simulate post-pruning contexts, achieving superior performance on long-horizon forecasting, short-horizon forecasting, anomaly detection, and even NLP benchmarks with minimal trainable parameters. Extensive experiments across diverse datasets demonstrate that TRACE consistently beats linear probing, full fine-tuning, and prior PEFT baselines, with statistical validation and favorable deployment costs. The results highlight TRACE’s practical impact for resource-constrained settings and its generalizability beyond time series to cross-domain sequential tasks.

Abstract

We propose an efficient fine-tuning method for time series foundation models, termed TRACE: Time Series Parameter Efficient Fine-tuning. While pretrained time series foundation models are gaining popularity, they face the following challenges: (1) Unlike natural language tasks, time series data vary in frequency, number of channels, and historical/prediction lengths. For long-term forecasting tasks in particular, tailored fine-tuning can significantly enhance performance. (2) Existing parameter-efficient tuning methods like LoRA remain applicable but require adaptation to temporal characteristics. To address these challenges, our TRACE framework introduces two key innovations: (1) Gated DSIC (Gated Dynamic Simulation Importance Calculation), an unbiased LoRA module importance selection mechanism that ensures conditional parameter consistency before and after masking. Experiments demonstrate that Gated DSIC outperforms common fine-tuning approaches. (2) Reconstructed prediction heads for long-term forecasting tasks, which achieve comparable or superior performance to linear probing heads while drastically reducing parameter counts. Extensive experiments on long-/short-term forecasting, anomaly detection, and natural language tasks across diverse datasets, coupled with ablation studies, validate the effectiveness of our method.

Paper Structure

This paper contains 32 sections, 13 equations, 11 figures, 12 tables, 1 algorithm.

Figures (11)

  • Figure 1: Architecture of our method. (a) The default model architecture, which consists of multiple stacked Transformer encoders. LoRA modules are added to all linear layers at each level, and the forecasting head is a linear predictor. Fine-tuning is performed using linear probing and full LoRA adaptation. (b) The TRACE method: LoRA integration follows the Gated DSIC approach, and for long-term forecasting tasks, the forecasting head is dimensionally reduced and reconstructed.
  • Figure 2: Illustration of forecasting head reconstruction. The original large matrix $\mathbf{W} \in \mathbb{R}^{(N \times d) \times H}$ is factorized into two smaller matrices, reducing the embedding dimension from $d$ to $d' = d/\beta$. This significantly cuts the number of trainable parameters without sacrificing predictive power.
  • Figure 3: Visualization of LoRA Module Importance Scores. Left: AdaLoRA (Gradient-based). Middle: Gated DSIC (Ours). Right: Shapley Value. The color intensity represents the magnitude of the importance score, with darker shades indicating higher importance. The x-axis represents different LoRA module types (Query, Key, Value, Output, Gate, Up, Down), and the y-axis represents the Transformer layer number (1-12).
  • Figure 4: Illustration of the Gated DSIC Importance Scoring Process
  • Figure 5: Illustration of the Iterative Pruning Schedule
  • ...and 6 more figures
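
The parameter saving described in the Figure 2 caption can be illustrated with a small count. This is a minimal sketch, not the authors' implementation. It assumes the factorization is a shared projection from $d$ to $d' = d/\beta$ applied to each of the $N$ patch embeddings, followed by a linear map from the flattened $N \times d'$ features to the horizon $H$. Bias terms are ignored, and the function name and the example sizes are hypothetical.

```python
def head_param_counts(num_patches, d_model, horizon, beta):
    """Return (full, reduced) weight counts for a forecasting head.

    Assumed shapes (illustrative, not taken from the paper's code):
      full head:    W in R^{(N*d) x H}
      reduced head: P in R^{d x d'} shared across patches, then
                    W' in R^{(N*d') x H}, with d' = d / beta
    Bias terms are ignored.
    """
    if d_model % beta != 0:
        raise ValueError("beta must divide d_model")
    d_reduced = d_model // beta
    # Full linear head: flatten N embeddings of size d, map to H outputs.
    full = num_patches * d_model * horizon
    # Reduced head: shared d -> d' projection plus the smaller output map.
    reduced = d_model * d_reduced + num_patches * d_reduced * horizon
    return full, reduced


# Hypothetical sizes chosen only for illustration.
full, reduced = head_param_counts(num_patches=64, d_model=768, horizon=720, beta=8)
print(f"full head:    {full:,} weights")
print(f"reduced head: {reduced:,} weights")
print(f"reduction:    {full / reduced:.2f}x")
```

Under these assumptions the second matrix dominates the reduced count, so the saving approaches a factor of $\beta$ when $N \times H$ is large relative to $d$.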