Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios

Xihong Yang, Siwei Wang, Fangdi Wang, Jiaqi Jin, Suyuan Liu, Yue Liu, En Zhu, Xinwang Liu, Yueming Jin

TL;DR

The paper addresses the robustness of deep multi-view clustering under noisy inputs by introducing AIRMVC, a framework that (1) identifies noisy data by reformulating noise identification as an anomaly detection problem solved with a Gaussian Mixture Model (GMM), (2) applies a hybrid rectification strategy to mitigate the influence of noise across views, and (3) employs a noise-robust contrastive mechanism to learn reliable representations. It provides a theoretical result showing that the learned representations retain clean information while discarding noisy information, which improves downstream clustering performance. Empirically, AIRMVC outperforms 11 baselines on six benchmark datasets across varying noise levels; ablation analyses, visualizations, and hyperparameter studies support the effectiveness of each component. The authors provide code for replication.

Abstract

Leveraging powerful representation learning capabilities, deep multi-view clustering methods have in recent years demonstrated reliable performance by effectively integrating multi-source information from diverse views. Most existing methods rely on the assumption of clean views. However, noise is pervasive in real-world scenarios and leads to significant performance degradation. To tackle this problem, we propose a novel multi-view clustering framework for the automatic identification and rectification of noisy data, termed AIRMVC. Specifically, we reformulate noise identification as an anomaly identification problem using a Gaussian Mixture Model (GMM). We then design a hybrid rectification strategy that mitigates the adverse effects of noisy data based on the identification results. Furthermore, we introduce a noise-robust contrastive mechanism to generate reliable representations. Additionally, we provide a theoretical proof demonstrating that these representations can discard noisy information, thereby improving the performance of downstream tasks. Extensive experiments on six benchmark datasets demonstrate that AIRMVC outperforms state-of-the-art algorithms in terms of robustness in noisy scenarios. The code for AIRMVC is available on GitHub at https://github.com/xihongyang1999/AIRMVC.
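
The GMM-based identification step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which quantity the GMM is fit to, so the sketch assumes a hypothetical per-sample score (for example a cross-view discrepancy) in which noisy samples score higher. The function names, the two-component choice, and the 0.5 threshold are all illustrative assumptions.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(scores, n_iter=100):
    """Fit a 1-D, two-component Gaussian mixture with plain EM."""
    n = len(scores)
    ordered = sorted(scores)
    half = n // 2
    # Initialise the means from the lower and upper halves of the sorted scores.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    overall_mean = sum(scores) / n
    overall_var = sum((s - overall_mean) ** 2 for s in scores) / n + 1e-6
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each score.
        resp = []
        for s in scores:
            dens = [w * gaussian_pdf(s, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) + 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            variances[k] = sum(r[k] * (s - means[k]) ** 2
                               for r, s in zip(resp, scores)) / nk + 1e-6
    return weights, means, variances


def identify_noisy(scores, threshold=0.5):
    """Return indices whose posterior under the high-mean component exceeds threshold.

    Assumption (not from the source): the component with the larger mean
    models the anomalous, i.e. noisy, samples.
    """
    weights, means, variances = fit_two_component_gmm(scores)
    noisy = 0 if means[0] > means[1] else 1
    flagged = []
    for i, s in enumerate(scores):
        dens = [w * gaussian_pdf(s, m, v)
                for w, m, v in zip(weights, means, variances)]
        if dens[noisy] / (sum(dens) + 1e-300) > threshold:
            flagged.append(i)
    return flagged


# Synthetic scores: 180 "clean" samples followed by 20 "noisy" ones.
random.seed(0)
scores = [random.gauss(0.2, 0.05) for _ in range(180)]
scores += [random.gauss(1.0, 0.1) for _ in range(20)]
flagged = identify_noisy(scores)
print(len(flagged), "samples flagged as noisy")
```

Treating identification as a two-component mixture avoids a hand-tuned score cutoff: the boundary between clean and noisy samples follows from the fitted components. How AIRMVC then rectifies the flagged samples is not detailed in this summary and is not sketched here.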

Paper Structure

This paper contains 27 sections, 2 theorems, 23 equations, 11 figures, 8 tables.

Key Result

Theorem 4.1

The representations ${\textbf{E}^*}$ retain clean information and discard noisy information, which can be presented as:

Figures (11)

  • Figure 1: An illustrative diagram of noise in a multi-view scenario. In the diagram, the areas marked with red exclamation points indicate instances where sensor failures or malfunctions at specific moments lead to data corruption. Compared to other views, these instances are considered noisy data.
  • Figure 2: Illustration of the overall framework of the proposed AIRMVC. Specifically, we first encode the input multi-view data to generate representations. Next, an automatic noise identification and rectification strategy is introduced to mitigate the adverse impact of noisy data. Simultaneously, we propose a noise-robust contrastive mechanism to generate more reliable and discriminative representations for the downstream clustering task.
  • Figure 3: Ablation studies of the proposed noise identification and rectification strategy on the BBCSport dataset.
  • Figure 4: Ablation studies on the BBCSport, Caltech101, STL10, and Reuters datasets with a 10% noise ratio.
  • Figure 5: Visualization of the representations during the training process on the UCI-digit dataset.
  • ...and 6 more figures

Theorems & Definitions (6)

  • Theorem 4.1
  • Theorem 1.1
  • proof
  • Definition 1.2: Mutual Information
  • Definition 1.3: Relationship for Mutual Information and Representation
  • Definition 1.4: Mutual Information Constraint
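
The listing above names a mutual information definition without reproducing it. For reference, the standard information-theoretic definition for two discrete random variables is given below; the symbols $X$ and $Y$ are placeholders chosen here, and the paper's Definition 1.2 may use different notation or a continuous form.

$$
I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}
$$

Under this definition $I(X; Y) \ge 0$, with equality exactly when $X$ and $Y$ are independent, which is why mutual information is a natural tool for stating how much clean or noisy information a representation retains.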