Lightweight Time Series Data Valuation on Time Series Foundation Models via In-Context Finetuning
Shunyu Wu, Tianyue Li, Yixuan Leng, Jingyi Suo, Jian Lou, Dan Li, See-Kiong Ng
TL;DR
This paper tackles the challenge of valuing time series data for high-capacity TSFMs, where traditional influence-function methods are computationally prohibitive. It introduces LTSV, a lightweight framework that uses one-step in-context finetuning to approximate sample-level influence, augmented by temporal block aggregation to preserve dependencies. The authors show that LTSV achieves linear-time complexity $\mathcal{O}(nP)$ as opposed to the Hessian-based $\mathcal{O}(nP^2 + P^3)$, while delivering faithful valuations that generalize to diverse downstream time series models. Empirical results across five datasets and three TSFM architectures demonstrate that selecting top-valued data based on LTSV consistently improves forecasting performance and that valuations transfer effectively to downstream models, offering a practical tool for data curation in time-series learning. The work provides a principled bridge between data attribution and model generalization in time series, with significant implications for data-efficient training of large TSFMs.
Abstract
Time series foundation models (TSFMs) have demonstrated increasing capabilities due to their extensive pretraining on large volumes of diverse time series data. Consequently, the quality of time series data is crucial to TSFM performance, rendering an accurate and efficient data valuation of time series for TSFMs indispensable. However, traditional data valuation methods, such as influence functions, face severe computational bottlenecks due to their poor scalability with growing TSFM model sizes and often fail to preserve temporal dependencies. In this paper, we propose LTSV, a Lightweight Time Series Valuation method for TSFMs via in-context finetuning. Grounded in the theoretical evidence that in-context finetuning approximates the influence function, LTSV estimates a sample's contribution by measuring the change in context loss after in-context finetuning, leveraging the strong generalization capabilities of TSFMs to produce robust and transferable data valuations. To capture temporal dependencies, we introduce temporal block aggregation, which integrates per-block influence scores across overlapping time windows. Experiments across multiple time series datasets and models demonstrate that LTSV consistently provides reliable and strong valuation performance, while maintaining manageable computational requirements. Our results suggest that in-context finetuning on time series foundation models provides a practical and effective bridge between data attribution and model generalization in time series learning.
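The two ideas named in the abstract, scoring a sample by the change in context loss after one finetuning step and aggregating per-block scores over overlapping windows, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a tiny linear autoregressive forecaster in place of a TSFM, a mean-squared-error loss, and plain averaging of the scores of all blocks that cover a time step. All function names, the learning rate, and the block sizes are hypothetical choices made for illustration.

```python
import numpy as np

def make_windows(series, lookback):
    """Slice a 1-D series into (lookback inputs, next-step target) pairs."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

def mse_loss(w, X, y):
    """Mean squared error of the linear forecaster X @ w."""
    r = X @ w - y
    return float(np.mean(r ** 2))

def mse_grad(w, X, y):
    """Gradient of mse_loss with respect to w."""
    r = X @ w - y
    return 2.0 * X.T @ r / len(y)

def one_step_value(w, X_blk, y_blk, X_ctx, y_ctx, lr=0.01):
    """Value of a block: context loss before minus context loss after
    a single gradient step on that block (positive means helpful)."""
    w_new = w - lr * mse_grad(w, X_blk, y_blk)
    return mse_loss(w, X_ctx, y_ctx) - mse_loss(w_new, X_ctx, y_ctx)

def block_values(w, series, X_ctx, y_ctx, lookback, block_len, stride, lr=0.01):
    """Score overlapping blocks, then average the scores of all blocks
    covering each time step (an assumed aggregation rule)."""
    n = len(series)
    totals = np.zeros(n)
    counts = np.zeros(n)
    for start in range(0, n - block_len + 1, stride):
        blk = series[start:start + block_len]
        X_blk, y_blk = make_windows(blk, lookback)
        v = one_step_value(w, X_blk, y_blk, X_ctx, y_ctx, lr)
        totals[start:start + block_len] += v
        counts[start:start + block_len] += 1
    # Time steps covered by no block keep a score of 0.
    return totals / np.maximum(counts, 1)

# Toy data (synthetic, for illustration only): a sine wave with one noisy segment.
rng = np.random.default_rng(0)
train = np.sin(0.2 * np.arange(400))
train[150:200] += rng.normal(0.0, 2.0, 50)
ctx = np.sin(0.2 * np.arange(400, 500))   # clean context series

lookback = 8
X_ctx, y_ctx = make_windows(ctx, lookback)
w0 = np.zeros(lookback)                   # stand-in for pretrained weights
scores = block_values(w0, train, X_ctx, y_ctx, lookback, block_len=32, stride=8)
```

For a small step size, each block score is close to the step size times the inner product of the context-loss gradient and the block-loss gradient. Each score needs only gradient evaluations and no Hessian, which is the intuition behind the linear-in-parameters cost cited in the TL;DR.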
