Resilient Contrastive Pre-training under Non-Stationary Drift

Xiaoyu Yang, Jie Lu, En Yu, Wei Duan

TL;DR

This work tackles the vulnerability of contrastive pre-training to non-stationary concept drift in data streams. It introduces a causal-interventional framework, grounded in a structural causal model, to mitigate drift-induced bias via P_t(Y|do(X)) and an intervention module within a MoCo v3–style architecture, forming Resilient Contrastive Pre-training (RCP). Across long-tailed, domain-shift, and OOD settings, RCP yields improved downstream performance, stronger ID intra-class compactness, and clearer ID–OOD separation, while scaling effectively with larger ViT backbones. The proposed method offers a practical, scalable path to robust pre-training on evolving data, with broad implications for real-world vision systems operating under continual distribution changes.

Abstract

The remarkable success of large-scale contrastive pre-training has been largely driven by vast yet static datasets. However, as the scaling paradigm evolves, it encounters a fundamental challenge when applied to dynamic data streams characterized by concept drift, that is, unpredictable changes in the underlying data distribution. This paper aims to advance robust pre-training under such non-stationary environments. We begin by revealing that conventional contrastive pre-training methods are highly susceptible to concept drift, resulting in substantial bias and instability within the learned feature representations. To systematically analyze these effects, we develop a structural causal model that elucidates how drift acts as a confounder, distorting the learned representations. Based on these causal insights, we propose Resilient Contrastive Pre-training (RCP), a novel method that incorporates causal intervention. RCP formulates a causally-informed objective to mitigate drift-induced biases through targeted interventions. The method is designed for simple and scalable implementation and exhibits notable adaptability, promoting robust and autonomous pre-training on non-stationary data. Comprehensive experiments across various downstream tasks consistently demonstrate that RCP effectively alleviates the detrimental impact of concept drift, yielding more resilient and generalizable representations.
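
To make the role of the confounder concrete, the following is the textbook backdoor adjustment written in the notation used above. It is only a sketch under stated assumptions: D is treated as a discrete, observed variable that is the sole confounder of X and Y, which is a simplification. The paper's own derivation and its handling of the momentum-update bias may differ.

```latex
% Observational quantity: drift D is weighted by how often it co-occurs with X.
P_t(Y \mid X) = \sum_{d} P_t(Y \mid X, D = d)\, P_t(D = d \mid X)

% Interventional quantity (backdoor adjustment, assuming D blocks every
% backdoor path from X to Y): D is weighted by its marginal instead.
P_t\big(Y \mid do(X)\big) = \sum_{d} P_t(Y \mid X, D = d)\, P_t(D = d)
```

The two expressions differ only in the weighting term, P_t(D = d | X) versus P_t(D = d); under these assumptions, that difference is the drift-induced bias an intervention aims to remove.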

Paper Structure

This paper contains 21 sections, 9 equations, 3 figures, 11 tables.

Figures (3)

  • Figure 1: t-SNE visualization of the feature space under different pre-training conditions on ImageNet and ImageNet-LT. Dark colors mark the regions of tail categories, which have limited pre-training samples, whereas light colors mark head categories, which have abundant samples.
  • Figure 2: The proposed causal graph of contrastive pre-training. X: Sample Features, Y: Prediction, D: Latent Concept Drift within Data Streams, and B: Sample Bias in the Momentum Update.
  • Figure 3: Comparison between different contrastive pre-training paradigms. (a) The key representations are sampled from a memory bank. (b) A momentum-updated encoder maintains the queue of keys. (c) The workflow of our resilient contrastive pre-training on data streams with concept drift. Within the data stream, a large batch size is chosen so that the sliding drift-adaptation window is wider and can adapt to changes in the data distribution. After undergoing various random augmentations, the transformed instances of the same sample are passed through the encoder and the momentum encoder to obtain the key and value, respectively. A head is used to debias the drift and obtain the query from the encoder features. Subsequently, causal intervention is applied to alleviate concept drift in the data stream within the adaptation window, yielding two objects for contrastive learning.
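
The two generic ingredients named in the Figure 3 caption, a momentum-updated encoder and a contrastive objective over a batch, can be sketched in a few lines of plain Python. This is a minimal illustration of the standard momentum (EMA) update and InfoNCE loss, not the authors' implementation: the function names, the flat-list parameter representation, and the values `m=0.99` and `temperature=0.2` are all assumptions, and the debiasing head and causal intervention step are deliberately left out because their details are not given here.

```python
import math


def momentum_update(online, target, m=0.99):
    """EMA update of momentum-encoder weights (hypothetical flat lists).

    Computes target <- m * target + (1 - m) * online, element-wise.
    """
    return [m * t + (1.0 - m) * o for o, t in zip(online, target)]


def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.2):
    """Mean InfoNCE loss over a batch.

    Row i of `queries` is the positive pair of row i of `keys`;
    all other rows of `keys` in the batch act as negatives.
    """
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    losses = []
    for i, qi in enumerate(q):
        # Cosine similarity to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    shifted = eye[1:] + eye[:1]  # positives no longer line up with queries
    print("aligned loss:   ", info_nce(eye, eye))
    print("misaligned loss:", info_nce(eye, shifted))
```

In this sketch the whole batch plays the role of the adaptation window, so a larger batch supplies more negatives per query; how RCP reweights or intervenes on those terms is outside what this example shows.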