Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

Jiaming Liu, Ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang

TL;DR

Adaptive Distribution Masked Autoencoders (ADMA) is a continual self-supervised method that enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts, attaining state-of-the-art performance in both classification and segmentation CTTA tasks.

Abstract

Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions, addressing real-world dynamism. Existing CTTA methods mainly rely on entropy minimization or teacher-student pseudo-labeling schemes for knowledge extraction in unlabeled target domains. However, dynamic data distributions cause miscalibrated predictions and noisy pseudo-labels in existing self-supervised learning methods, hindering the effective mitigation of error accumulation and catastrophic forgetting problems during the continual adaptation process. To tackle these issues, we propose a continual self-supervised method, Adaptive Distribution Masked Autoencoders (ADMA), which enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts. Specifically, we propose a Distribution-aware Masking (DaM) mechanism to adaptively sample masked positions, followed by establishing consistency constraints between the masked target samples and the original target samples. Additionally, for masked tokens, we utilize an efficient decoder to reconstruct a hand-crafted feature descriptor (e.g., Histograms of Oriented Gradients), leveraging its invariant properties to boost task-relevant representations. Through conducting extensive experiments on four widely recognized benchmarks, our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks. Our project page: https://sites.google.com/view/continual-mae/home.
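
The abstract motivates Histograms of Oriented Gradients (HOG) as a reconstruction target by their invariant properties. The following is a minimal sketch of that idea, not the paper's HOG configuration: it computes a simplified, single-cell orientation histogram (no block normalization or bin interpolation, which full HOG implementations use) and checks that the descriptor is unchanged when the cell's brightness and contrast are changed. The function name `cell_orientation_histogram`, the 9-bin setting, and the toy 4x4 cell are assumptions made for illustration.

```python
import math

def cell_orientation_histogram(cell, num_bins=9):
    """L2-normalised histogram of unsigned gradient orientations for one cell.

    cell: 2-D list of grey-level intensities (rows of equal length).
    This is a simplified, illustrative descriptor, not a full HOG pipeline.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    # Central differences on interior pixels only, so no padding is needed.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against rounding up to exactly 180.
            hist[int(angle // bin_width) % num_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

# Toy cell with a diagonal intensity ramp (hypothetical data).
cell = [[x + 2 * y for x in range(4)] for y in range(4)]
# Same cell after a contrast scaling and a brightness offset.
shifted = [[3 * v + 50 for v in row] for row in cell]

original_hist = cell_orientation_histogram(cell)
shifted_hist = cell_orientation_histogram(shifted)
max_diff = max(abs(a - b) for a, b in zip(original_hist, shifted_hist))
print("largest per-bin difference:", max_diff)
```

The brightness offset cancels in the pixel differences and the contrast factor cancels in the L2 normalization, so the two histograms agree up to floating-point rounding. This only illustrates invariance to affine intensity changes; it makes no claim about other corruption types such as noise or blur.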

Paper Structure (24 sections, 7 equations, 9 figures, 8 tables)

This paper contains 24 sections, 7 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: In continually changing environments, existing methods (wang2020tent; song2023ecotta) primarily focus on applying entropy minimization to update the normalization layer. However, these approaches are susceptible to miscalibrated predictions, resulting in uncontrollable error accumulation. Alternative mainstream approaches (wang2022continual; gan2022decorate) involve the teacher-student scheme for generating pseudo labels, but noisy pseudo labels limit the model's ability for continuous generalization. In this paper, we propose a novel approach to continual self-supervised learning known as Adaptive Distribution Masked Autoencoders (ADMA). ADMA introduces the mask reconstruction mechanism to enhance the extraction of target domain knowledge while mitigating the domain shift accumulation.
  • Figure 2: The framework of Adaptive Distribution Masked Autoencoders (ADMA). (a) We initiate the process by feeding the original target image into the model to generate features of the complete image. Simultaneously, this step facilitates the estimation of token-wise uncertainty, reflecting the token-wise distribution shift of each target sample, a process detailed in Sec. \ref{sec: DaM}. Guided by the uncertainty values, we adaptively mask P% of the image tokens characterized by significant domain shifts, subsequently reintroducing the masked image into the model. In the classification task, the encoder's output embeddings are then fed into the classification heads, constructing a consistency loss (Eq. \ref{eq:con}) between the two predictions. (b) For the masked tokens, we feed the masked token features into the linear decoder to compute the reconstruction loss (Eq. \ref{eq:rec}). We choose Histograms of Oriented Gradients (HOG) as the reconstruction target due to their invariant properties. Both losses are jointly optimized to address the CTTA problem.
  • Figure 3: The visualization of HOG features in various target domain distributions (ImageNet-C; hendrycks2019benchmarking).
  • Figure 4: The inter-domain divergence. $T_{1}$ to $T_{15}$ represent the 15 target domains in CIFAR-10C, listed in sequential order.
  • Figure 5: The CAM visualizations.
  • ...and 4 more figures
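
The Figure 2 caption describes masking P% of the image tokens, guided by token-wise uncertainty. A minimal sketch of that selection step might look as follows, under stated assumptions: the uncertainty scores are taken as given (their estimation is detailed in the paper, not here), the selection is a deterministic top-P% ranking rather than whatever sampling rule the paper uses, and the names `select_masked_tokens`, `apply_mask`, and `mask_token` are hypothetical.

```python
import math

def select_masked_tokens(uncertainty, mask_ratio):
    """Return the sorted indices of the most uncertain tokens.

    uncertainty: one score per image token (hypothetical input; higher
                 means a larger estimated distribution shift).
    mask_ratio:  fraction of tokens to mask, i.e. P / 100.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    num_masked = math.floor(len(uncertainty) * mask_ratio)
    # Rank indices from most to least uncertain; the sort is stable,
    # so tied scores keep their original index order.
    ranked = sorted(range(len(uncertainty)), key=lambda i: -uncertainty[i])
    return sorted(ranked[:num_masked])

def apply_mask(tokens, masked_indices, mask_token=None):
    """Replace the selected tokens with a placeholder mask token."""
    masked = set(masked_indices)
    return [mask_token if i in masked else tok for i, tok in enumerate(tokens)]

# Toy example: four tokens, mask the most uncertain half (P = 50).
scores = [0.1, 0.9, 0.3, 0.7]
indices = select_masked_tokens(scores, mask_ratio=0.5)
masked_tokens = apply_mask(["t0", "t1", "t2", "t3"], indices)
print(indices, masked_tokens)
```

In the framework described by the caption, the masked input would then be passed through the model a second time, and the selected positions would also determine which tokens contribute to the reconstruction loss.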