Lightweight Time Series Data Valuation on Time Series Foundation Models via In-Context Finetuning

Shunyu Wu, Tianyue Li, Yixuan Leng, Jingyi Suo, Jian Lou, Dan Li, See-Kiong Ng

TL;DR

This paper tackles the challenge of valuing time series data for high-capacity TSFMs, where traditional influence-function methods are computationally prohibitive. It introduces LTSV, a lightweight framework that uses one-step in-context finetuning to approximate sample-level influence, augmented by temporal block aggregation to preserve dependencies. The authors show that LTSV achieves linear-time complexity $\mathcal{O}(nP)$ as opposed to the Hessian-based $\mathcal{O}(nP^2 + P^3)$, while delivering faithful valuations that generalize to diverse downstream time series models. Empirical results across five datasets and three TSFM architectures demonstrate that selecting top-valued data based on LTSV consistently improves forecasting performance and that valuations transfer effectively to downstream models, offering a practical tool for data curation in time-series learning. The work provides a principled bridge between data attribution and model generalization in time series, with significant implications for data-efficient training of large TSFMs.
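To give a sense of scale for the two complexity expressions, here is a back-of-the-envelope comparison under hypothetical values of $n = 10^4$ samples and $P = 10^8$ parameters. Neither number is taken from the paper, and constant factors are ignored:

$$
nP = 10^{4} \cdot 10^{8} = 10^{12}, \qquad nP^2 + P^3 = 10^{4} \cdot 10^{16} + 10^{24} = 10^{20} + 10^{24} \approx 10^{24}.
$$

Under these assumed values the Hessian-based count is dominated by the $P^3$ term and is roughly $10^{12}$ times larger than the linear one.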

Abstract

Time series foundation models (TSFMs) have demonstrated increasing capabilities due to their extensive pretraining on large volumes of diverse time series data. Consequently, the quality of time series data is crucial to TSFM performance, making accurate and efficient time series data valuation for TSFMs indispensable. However, traditional data valuation methods, such as influence functions, face severe computational bottlenecks because they scale poorly with growing TSFM model sizes, and they often fail to preserve temporal dependencies. In this paper, we propose LTSV, a Lightweight Time Series Valuation method for TSFMs based on in-context finetuning. Grounded in the theoretical evidence that in-context finetuning approximates the influence function, LTSV estimates a sample's contribution by measuring the change in context loss after in-context finetuning, leveraging the strong generalization capabilities of TSFMs to produce robust and transferable data valuations. To capture temporal dependencies, we introduce temporal block aggregation, which integrates per-block influence scores across overlapping time windows. Experiments across multiple time series datasets and models demonstrate that LTSV consistently provides reliable and strong valuation performance while maintaining manageable computational requirements. Our results suggest that in-context finetuning on time series foundation models provides a practical and effective bridge between data attribution and model generalization in time series learning.
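The scoring idea in the abstract (change in context loss after one finetuning step on a candidate sample) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a linear least-squares forecaster stands in for a TSFM, the names `one_step_value`, `mse_loss`, and `mse_grad` are hypothetical, and the sign convention (loss before minus loss after, so positive means helpful) is one plausible reading of the text, not the authors' implementation.

```python
import numpy as np

def mse_loss(theta, X, y):
    """Mean squared error of a linear forecaster y ≈ X @ theta."""
    residual = X @ theta - y
    return float(np.mean(residual ** 2))

def mse_grad(theta, X, y):
    """Gradient of mse_loss with respect to theta."""
    residual = X @ theta - y
    return 2.0 * X.T @ residual / len(y)

def one_step_value(theta, X_s, y_s, X_ctx, y_ctx, lr=0.1):
    """Context loss before minus context loss after ONE gradient step
    on the candidate sample (assumed sign: positive = helpful)."""
    loss_before = mse_loss(theta, X_ctx, y_ctx)
    theta_ft = theta - lr * mse_grad(theta, X_s, y_s)  # single finetuning step
    loss_after = mse_loss(theta_ft, X_ctx, y_ctx)
    return loss_before - loss_after

# Toy setup (all values are made up for this sketch).
theta_true = np.array([1.0, -2.0, 0.5])
X_ctx = np.tile(np.eye(3), (4, 1))        # 12 context rows
y_ctx = X_ctx @ theta_true
theta0 = np.zeros(3)                      # stand-in for pretrained weights

x = np.array([[1.0, 0.0, 1.0]])
y_clean = x @ theta_true                  # consistent label
y_corrupt = -y_clean                      # sign-flipped label

v_clean = one_step_value(theta0, x, y_clean, X_ctx, y_ctx)
v_corrupt = one_step_value(theta0, x, y_corrupt, X_ctx, y_ctx)
print(v_clean, v_corrupt)
```

In this toy case the consistent sample receives a positive score and the sign-flipped one a negative score. Each score needs only one gradient step and two loss evaluations, with no Hessian formed or inverted.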

Paper Structure

This paper contains 16 sections, 2 theorems, 11 equations, 4 figures, 3 tables.

Key Result

Theorem (Classical Influence Function)

Let the target dataset be $\mathcal{D}_{\mathrm{target}} = \{(x_i, y_i)\}_{i=1}^{N}$ and the context dataset be $\mathcal{D}_{\mathrm{context}} = \{(x'_j, y'_j)\}_{j=1}^{M}$. Let $\mathcal{L}(x, y; \theta)$ denote the loss function of a model parameterized by $\theta$ for a target sample. The influence […] where $H_{\theta^*} = \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta^2 \mathcal{L}(x_i, y_i; \theta^*)$ is […]
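For orientation, the sketch below computes the textbook Hessian-based influence score, $-\nabla_\theta \mathcal{L}_{\mathrm{context}}^\top H_{\theta^*}^{-1} \nabla_\theta \mathcal{L}(x_i, y_i; \theta^*)$, on a toy linear model. The theorem statement above is truncated in this extract, so the paper's exact sign and normalization may differ. The function name `influence_scores`, the squared-error loss, and the reading that target samples are the ones being valued against the context set are all assumptions.

```python
import numpy as np

def influence_scores(X_tgt, y_tgt, X_ctx, y_ctx, theta, damping=0.0):
    """Textbook influence of each target sample on the mean context loss,
    for the per-sample loss 0.5 * (x @ theta - y) ** 2.
    Negative score: upweighting the sample lowers the context loss."""
    n, p = X_tgt.shape
    # Per-sample gradients on the target set, shape (n, p).
    grads_tgt = (X_tgt @ theta - y_tgt)[:, None] * X_tgt
    # Gradient of the mean context loss, shape (p,).
    grad_ctx = ((X_ctx @ theta - y_ctx)[:, None] * X_ctx).mean(axis=0)
    # Hessian averaged over the target set: O(n P^2) to form.
    H = X_tgt.T @ X_tgt / n + damping * np.eye(p)
    # Solve H v = grad_ctx rather than inverting H: O(P^3).
    v = np.linalg.solve(H, grad_ctx)
    return -grads_tgt @ v

# Toy data (made up): true parameter is 1, last target label is an outlier.
X_tgt = np.ones((4, 1))
y_tgt = np.array([1.0, 1.0, 1.0, 3.0])
X_ctx = np.ones((2, 1))
y_ctx = np.array([1.0, 1.0])

# Least-squares fit on the target set plays the role of theta*.
theta_star, *_ = np.linalg.lstsq(X_tgt, y_tgt, rcond=None)
scores = influence_scores(X_tgt, y_tgt, X_ctx, y_ctx, theta_star)
print(scores)
```

The outlier gets a positive score (upweighting it would raise the context loss) while the three clean samples get negative scores. Forming and solving with the $P \times P$ Hessian is the step whose cost grows as $\mathcal{O}(nP^2 + P^3)$.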

Figures (4)

  • Figure 1: Overview of the proposed framework. ① Block Segmentation: the original time series is first divided into sub-sequences via sliding block segmentation. ② Block Scoring: each block is applied to fine-tune the TSFM, and the differences between the pre-trained context losses and fine-tuned losses are calculated as the block-wise quality scores. ③ Point Scoring: point-wise scores are aggregated based on the block-wise scores along the original series. ④ Sample Scoring: sample-wise scores are generated based on the scores of each point.
  • Figure 2: Computational efficiency comparison between LTSV and influence function across models of varying parameter sizes.
  • Figure 3: Performance variation with different selection ratios on the Electricity dataset under DLinear. The curves show a consistent trend where bottom-valued samples yield higher MSE and MAE than random selection, while top-valued samples outperform both. As the selection ratio increases, the performance gap among the three gradually narrows. Notably, the curve of LTSV (top) remains close to that of the Influence Function, indicating strong consistency with classical valuation results.
  • Figure 4: Downstream performance when trained on top- and bottom-valued samples identified by LTSV. The left panel shows results based on the MOMENT foundation model, while the right panel corresponds to Time-LLM. Models trained on bottom-valued samples show consistently higher MAE, while top-valued samples yield much lower errors, confirming the robustness and transferability of LTSV across architectures.
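The four stages in the Figure 1 caption, followed by the top-valued selection used in Figures 3 and 4, can be sketched as below. This is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, averaging is assumed as the aggregation rule at both the point and sample level, and the block scores in the example are made-up numbers standing in for the finetuning-based block scores.

```python
import numpy as np

def sliding_blocks(length, block_len, stride):
    """Step 1: start indices of sliding blocks over the series."""
    return list(range(0, length - block_len + 1, stride))

def point_scores_from_blocks(length, block_len, stride, block_scores):
    """Step 3: each point gets the mean score of the blocks covering it
    (mean is an assumed rule). Uncovered points stay NaN."""
    totals = np.zeros(length)
    counts = np.zeros(length)
    for start, score in zip(sliding_blocks(length, block_len, stride), block_scores):
        totals[start:start + block_len] += score
        counts[start:start + block_len] += 1
    out = np.full(length, np.nan)
    covered = counts > 0
    out[covered] = totals[covered] / counts[covered]
    return out

def sample_scores(point_scores, sample_len, stride):
    """Step 4: score of a sample window = mean of its point scores."""
    starts = range(0, len(point_scores) - sample_len + 1, stride)
    return np.array([point_scores[s:s + sample_len].mean() for s in starts])

def select_top(scores, ratio):
    """Indices of the top-valued fraction of samples (at least one)."""
    k = max(1, int(round(ratio * len(scores))))
    return np.argsort(scores)[::-1][:k]

# Step 2 placeholder: block scores would come from finetuning; these are made up.
block_scores = [1.0, 3.0]                 # blocks start at t=0 and t=2
points = point_scores_from_blocks(6, 4, 2, block_scores)
samples = sample_scores(points, sample_len=3, stride=3)
top = select_top(samples, ratio=0.5)
print(points, samples, top)
```

Points covered by both blocks (t = 2 and t = 3) receive the average of the two block scores, which is how overlapping windows share evidence about a time step in this sketch.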

Theorems & Definitions (5)

  • Theorem: Classical Influence Function
  • Proof
  • Definition: In-Context Finetuning
  • Theorem: Influence Function Approximation via In-Context Finetuning
  • Proof