Enhancing Masked Time-Series Modeling via Dropping Patches

Tianyu Qiu; Yi Xie; Yun Xiong; Hao Niu; Xiaofeng Gao

TL;DR

DropPatch randomly drops sub-sequence-level patches before masking to improve masked time-series pre-training. Patches are dropped with ratio $r$ prior to masking, and only the masked patches among those that remain are reconstructed. The method strengthens attention focus, reduces redundancy, and acts as a form of data augmentation that mitigates over-fitting. Empirical results across 12 real datasets and synthesized corpora show consistent gains in in-domain and cross-domain forecasting, along with improved training efficiency; theoretical analysis shows slower convergence to a rank-1 representation in Transformer layers. Attention- and representation-level analyses (KL divergence of attention distributions, head diversity, and CKA similarity) support the mechanism behind the performance gains, suggesting DropPatch as a practical augmentation for time-series foundation models, with broad applicability to domain adaptation and low-data regimes.
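The drop-then-mask order described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `drop_then_mask`, the zero-valued mask placeholder, the rounding of the ratios, and the example values (patch length 12, drop ratio 0.5, mask ratio 0.5) are all assumptions made for illustration.

```python
import numpy as np


def drop_then_mask(patches, drop_ratio, mask_ratio, rng):
    """Hypothetical helper: drop patches first, then mask among the survivors.

    patches: array of shape (num_patches, patch_len).
    Returns (model_input, target, keep_idx, mask), where `mask` marks the
    kept patches that the model would have to reconstruct.
    """
    num_patches = patches.shape[0]
    num_keep = num_patches - int(round(num_patches * drop_ratio))

    # Step 1: randomly choose surviving patches; sort to preserve temporal order.
    keep_idx = np.sort(rng.choice(num_patches, size=num_keep, replace=False))
    target = patches[keep_idx]

    # Step 2: mask a fraction of the surviving patches only.
    num_mask = int(round(num_keep * mask_ratio))
    mask = np.zeros(num_keep, dtype=bool)
    mask[rng.choice(num_keep, size=num_mask, replace=False)] = True

    # Assumption: zeros stand in for a learnable mask token.
    model_input = target.copy()
    model_input[mask] = 0.0
    return model_input, target, keep_idx, mask


# Usage: a length-96 series split into 8 non-overlapping patches of length 12.
rng = np.random.default_rng(0)
series = np.arange(96, dtype=float)
patches = series.reshape(8, 12)
model_input, target, keep_idx, mask = drop_then_mask(patches, 0.5, 0.5, rng)
```

In this sketch the dropped patches never reach the encoder, so the reconstruction loss would be computed only on `target[mask]`, the masked subset of the kept patches.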

Abstract

This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence-level patches of time series. On this basis, a simple yet effective method named DropPatch is proposed, which has two notable advantages: 1) it improves pre-training efficiency by a quadratic factor; 2) it provides additional benefits for modeling in scenarios such as in-domain, cross-domain, few-shot, and cold-start learning. Comprehensive experiments verify the effectiveness of the method and analyze its internal mechanism. Empirically, DropPatch strengthens the attention mechanism, reduces information redundancy, and serves as an efficient means of data augmentation. Theoretically, it is proved that randomly dropping patches slows the rate at which Transformer representations collapse into a rank-1 linear subspace, thereby improving the quality of the learned representations.
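The efficiency advantage claimed above can be read as follows, under an assumption the abstract does not spell out: that the dominant pre-training cost is self-attention, which scales quadratically with the number of input patches. Here $N$ (number of patches before dropping) and $C$ (attention cost) are symbols introduced for illustration only.

$$
C(N) \propto N^2
\quad\Longrightarrow\quad
\frac{C\big((1-r)N\big)}{C(N)} = (1-r)^2 .
$$

For example, with $r = 0.5$ the encoder sees half of the patches and the attention cost under this assumption drops to $(1-0.5)^2 = 0.25$ of the original.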
