Deciphering Invariant Feature Decoupling in Source-free Time Series Forecasting with Proxy Denoising
Kangjia Yan, Chenxi Liu, Hao Miao, Xinle Wu, Yan Zhao, Chenjuan Guo, Bin Yang
TL;DR
This work addresses source-free domain adaptation for time series forecasting, in which a model pretrained on a source domain must be adapted to a sparse target domain without access to the source data. It introduces TimePD, a framework that combines three components: invariant disentangled feature learning, which separates seasonal and trend information; proxy denoising, which calibrates LLM forecasts; and knowledge distillation, which transfers the denoised knowledge to a lightweight target model. Experiments on six real-world datasets show that TimePD consistently outperforms state-of-the-art baselines across multiple forecast horizons, indicating strong generalization under data scarcity and domain shift. The approach offers a practical path toward privacy-preserving, cross-domain time series analytics: it leverages LLMs while mitigating their hallucinations through data-driven denoising and distillation.
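To make the season-trend split concrete, here is a minimal NumPy sketch of one common way to separate a series into trend and seasonal parts: a centered moving average gives the trend, and the residual is treated as the seasonal component. The moving-average scheme (as popularized by Autoformer and DLinear), the function name `season_trend_decompose`, and the default window of 25 are assumptions for illustration; the text does not state which decomposition TimePD uses.

```python
import numpy as np


def season_trend_decompose(x: np.ndarray, kernel: int = 25) -> tuple[np.ndarray, np.ndarray]:
    """Split a 1-D series into (seasonal, trend) parts.

    Assumption: moving-average decomposition with edge replication, a common
    choice in forecasting models; TimePD's exact decomposition may differ.
    """
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    half = kernel // 2
    # Replicate the first and last values so the trend keeps the length of x.
    padded = np.concatenate([np.repeat(x[:1], half), x, np.repeat(x[-1:], half)])
    weights = np.ones(kernel) / kernel
    # "valid" convolution over the padded series yields exactly len(x) averages.
    trend = np.convolve(padded, weights, mode="valid")
    # Whatever the moving average does not explain is treated as seasonality.
    seasonal = x - trend
    return seasonal, trend
```

Applied to a series such as `0.05 * t + np.sin(2 * np.pi * t / 24)`, the trend output follows the linear ramp (approximately, since the window does not exactly cancel the cycle) and the seasonal output retains the 24-step oscillation; in a dual-branch design, each part would presumably feed its own branch.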
Abstract
The proliferation of mobile devices generates massive volumes of time series across various domains, and effective time series forecasting enables a variety of real-world applications. This study addresses a new problem, source-free domain adaptation for time series forecasting, which aims to adapt a model pretrained on sufficient source time series to a sparse target time series domain without access to the source data, in line with data protection regulations. To achieve this, we propose TimePD, the first source-free time series forecasting framework with proxy denoising, which employs large language models (LLMs) to benefit from their generalization capabilities. Specifically, TimePD consists of three key components: (1) dual-branch invariant disentangled feature learning that enforces representation- and gradient-wise invariance by means of season-trend decomposition; (2) lightweight, parameter-free proxy denoising that dynamically calibrates the systematic biases of LLMs; and (3) knowledge distillation that bidirectionally aligns the denoised prediction with the original target prediction. Extensive experiments on real-world datasets demonstrate the effectiveness of TimePD, which outperforms state-of-the-art (SOTA) baselines by 9.3% on average.
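The abstract does not give formulas for proxy denoising or for the bidirectional alignment. Purely as a hypothetical illustration, the NumPy sketch below shows (a) a parameter-free bias calibration that subtracts the LLM's mean error on an already-observed window from its new forecast, and (b) a symmetric squared-error alignment between a denoised forecast and a target-model forecast that returns gradients for both sides. The function names `calibrate_llm_bias` and `bidirectional_alignment`, the `(time, channels)` array shapes, and the choice of mean-offset calibration and MSE are all assumptions, not TimePD's published method.

```python
import numpy as np


def calibrate_llm_bias(llm_forecast: np.ndarray,
                       llm_on_history: np.ndarray,
                       observed_history: np.ndarray) -> np.ndarray:
    """Hypothetical parameter-free bias calibration (not TimePD's formula).

    All arrays are shaped (time, channels). The systematic bias is estimated
    as the LLM's per-channel mean error on a window whose ground truth is
    already known, then subtracted from the new forecast. Nothing is learned,
    so this step adds no trainable parameters.
    """
    bias = (llm_on_history - observed_history).mean(axis=0, keepdims=True)
    return llm_forecast - bias


def bidirectional_alignment(denoised: np.ndarray, target_pred: np.ndarray):
    """Symmetric MSE between denoised and target-model predictions.

    Returns (loss, grad_wrt_denoised, grad_wrt_target). Unless the two
    predictions already agree, both gradients are non-zero, so minimizing the
    loss moves each prediction toward the other; this is one simple reading
    of "bidirectional" alignment, assumed here for illustration.
    """
    diff = denoised - target_pred
    loss = float(np.mean(diff ** 2))
    grad_denoised = 2.0 * diff / diff.size
    grad_target = -2.0 * diff / diff.size
    return loss, grad_denoised, grad_target


# Usage: an LLM that overshoots every channel by 0.5 on the observed window.
rng = np.random.default_rng(0)
observed = rng.normal(size=(48, 3))
llm_hist = observed + 0.5
llm_future = np.full((24, 3), 0.5)
# The estimated 0.5 offset is removed, leaving values close to zero.
denoised = calibrate_llm_bias(llm_future, llm_hist, observed)
target_pred = rng.normal(scale=0.1, size=(24, 3))
loss, g_denoised, g_target = bidirectional_alignment(denoised, target_pred)
```

Which modules actually receive these two gradients in TimePD (for instance, whether the LLM stays frozen) is not stated in the abstract; in an autograd framework, one would simply let the alignment loss backpropagate into whichever branches are trainable.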
