Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises

Zirun Guo, Tao Jin

TL;DR

This work tackles Test-Time Adaptation under complex multimodal noises by defining multimodal wild TTA, where target data exhibit mixed weak and strong OOD shifts and potential missing modalities. It introduces SuMi, a framework that blends Interquartile Range Smoothing (IQR), unimodal-assisted sample identification, and Mutual Information Sharing (MIS) to stabilize adaptation and better leverage multimodal information. The method is formalized with an entropy objective $Ent_\theta$, a dynamic sampling process, and cross-modality alignment via KL divergences, achieving state-of-the-art results on Kinetics50-C and VGGSound-C under various OOD scenarios. Ablation studies confirm the contribution of each component, highlighting IQR smoothing as the key driver of stability, with unimodal assistance and MIS further enhancing performance across weak, strong, and mixed disturbances.
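As a rough illustration of the entropy and interquartile-range ideas mentioned above, the following sketch computes per-sample prediction entropy and keeps only samples below an IQR-based upper fence. It is a generic Tukey-fence filter written for illustration, not the selection rule from the paper; the function names and the `k = 1.5` multiplier are assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def prediction_entropy(logits):
    # Shannon entropy of the predicted class distribution, one value per sample.
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def iqr_keep_mask(entropy, k=1.5):
    # Hypothetical rule: keep samples whose entropy is at most Q3 + k * IQR.
    # The multiplier k is an illustrative assumption, not a value from the paper.
    q1, q3 = np.percentile(entropy, [25, 75])
    return entropy <= q3 + k * (q3 - q1)
```

In a test-time adaptation loop, a mask like this could be applied to each incoming batch so that only the retained samples contribute to the entropy-minimization update, which is one way to keep a few extreme samples from dominating a step.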

Abstract

Test-Time Adaptation (TTA) aims to tackle distribution shifts using unlabeled test data without access to the source data. Multimodal data exhibit more complex noise patterns than unimodal data, such as simultaneous corruptions of multiple modalities and missing modalities. Moreover, in real-world applications, corruptions from different distribution shifts are always mixed. Existing TTA methods always fail in such multimodal scenarios because the abrupt distribution shifts destroy the prior knowledge of the source model, leading to performance degradation. To this end, we reveal a new challenge named multimodal wild TTA. To address this challenging problem, we propose two novel strategies: sample identification with interquartile range Smoothing and unimodal assistance, and Mutual information sharing (SuMi). SuMi smooths the adaptation process using the interquartile range, which avoids abrupt distribution shifts. Then, SuMi fully utilizes the unimodal features to select low-entropy samples with rich multimodal information for optimization. Furthermore, mutual information sharing is introduced to align the information, reduce the discrepancies, and enhance information utilization across different modalities. Extensive experiments on two public datasets show the effectiveness and superiority of SuMi over existing methods under the complex noise patterns in multimodal data. Code is available at https://github.com/zrguo/SuMi.
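To make the idea of aligning information across modalities more concrete, here is a minimal sketch of a KL-based alignment term between a fused prediction and per-modality predictions. The direction of the divergence, the averaging, and the names `kl_divergence` and `alignment_loss` are all assumptions made for illustration; the formulation in the paper and its released code may differ.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # Row-wise KL(p || q) for batches of probability vectors.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return (p * (np.log(p) - np.log(q))).sum(axis=1)

def alignment_loss(p_fused, p_unimodal):
    # Hypothetical alignment term: mean KL(p_fused || p_m) over the batch,
    # averaged over each unimodal prediction p_m (e.g. audio and video heads).
    return float(np.mean([kl_divergence(p_fused, p_m).mean() for p_m in p_unimodal]))
```

The term is zero when every unimodal prediction matches the fused prediction and grows as they disagree, so minimizing it pulls the modalities toward a shared prediction.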

Paper Structure

This paper contains 20 sections, 8 equations, 13 figures, 8 tables, 1 algorithm.

Figures (13)

  • Figure 1: Illustration of our task, where the target domain contains various domain shifts, including weak OOD and strong OOD samples. The performance of existing methods degrades significantly on this challenging task, falling even below that of the source model. Results are obtained on Kinetics50-C.
  • Figure 2: The overview of SuMi.
  • Figure 3: (a) Performance of different adaptation settings on strong OOD samples. (b) t-SNE visualizations (van der Maaten and Hinton, 2008) of features during adaptation. (c) Performance using different quantiles of multimodal and unimodal entropy. Results are obtained on Kinetics50-C.
  • Figure 4: Comparison with SOTA methods on corrupted data of different severity levels. weak: average accuracy of 21 different types of weak OOD distribution shifts. strong: average accuracy of 4 different types of strong OOD distribution shifts.
  • Figure 5: Comparison with SOTA methods on mixed corrupted data with ten different ratios of strong OOD samples. (a) and (b): severity level 5. (c) and (d): mixed severity.
  • ...and 8 more figures

Theorems & Definitions (1)

  • Definition 1