Resilient Contrastive Pre-training under Non-Stationary Drift
Xiaoyu Yang, Jie Lu, En Yu, Wei Duan
TL;DR
This work tackles the vulnerability of contrastive pre-training to non-stationary concept drift in data streams. It introduces a causal-interventional framework, grounded in a structural causal model, that mitigates drift-induced bias by targeting the interventional distribution P_t(Y|do(X)) through an intervention module within a MoCo v3–style architecture, forming Resilient Contrastive Pre-training (RCP). Across long-tailed, domain-shift, and out-of-distribution (OOD) settings, RCP yields improved downstream performance, stronger in-distribution (ID) intra-class compactness, and clearer ID–OOD separation, while scaling effectively with larger ViT backbones. The proposed method offers a practical, scalable path to robust pre-training on evolving data, with broad implications for real-world vision systems operating under continual distribution changes.
Abstract
The remarkable success of large-scale contrastive pre-training has been driven largely by vast yet static datasets. However, as the scaling paradigm evolves, it encounters a fundamental challenge when applied to dynamic data streams characterized by concept drift, that is, unpredictable changes in the underlying data distribution. This paper aims to advance robust pre-training in such non-stationary environments. We begin by revealing that conventional contrastive pre-training methods are highly susceptible to concept drift, resulting in substantial bias and instability in the learned feature representations. To systematically analyze these effects, we develop a structural causal model that elucidates how drift acts as a confounder, distorting the learned representations. Based on these causal insights, we propose Resilient Contrastive Pre-training (RCP), a novel method that incorporates causal intervention. RCP formulates a causally informed objective to mitigate drift-induced biases through targeted interventions. The method is designed for simple and scalable implementation and exhibits notable adaptability, promoting robust and autonomous pre-training on non-stationary data. Comprehensive experiments across various downstream tasks consistently demonstrate that RCP effectively alleviates the detrimental impact of concept drift, yielding more resilient and generalizable representations.
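To illustrate what it means for drift to act as a confounder, the textbook backdoor adjustment can be written next to ordinary conditioning. This is a generic sketch and not necessarily the exact objective used by RCP: it assumes a hypothetical discrete drift variable D that influences both the input X and the target Y and satisfies the backdoor criterion. The symbol D and the time index t on each term are illustrative assumptions.

```latex
% Intervention: the drift variable D is weighted by its marginal distribution.
P_t(Y \mid do(X)) = \sum_{d} P_t(Y \mid X, D = d)\, P_t(D = d)

% Plain conditioning: D is weighted by its distribution given X.
P_t(Y \mid X) = \sum_{d} P_t(Y \mid X, D = d)\, P_t(D = d \mid X)
```

The two expressions differ only in how D is weighted. Conditioning uses P_t(D = d | X), which lets the drift-dependent association between X and D leak into the prediction, whereas the intervention uses the marginal P_t(D = d) and so removes that path.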
