THCRL: Trusted Hierarchical Contrastive Representation Learning for Multi-View Clustering

Jian Zhu

TL;DR

This work tackles untrustworthy fusion in Multi-View Clustering by introducing THCRL, a framework with Deep Symmetry Hierarchical Fusion (DSHF) and Average $K$-Nearest Neighbors Contrastive Learning (AKCL). DSHF uses a UNet-based, attention-enhanced architecture to denoise and fuse multi-view features, while AKCL aligns fused representations with cluster-consistent samples rather than only with cross-view instances of the same sample. The approach yields state-of-the-art results across six public MVC datasets, with comprehensive ablation and visualization confirming the contributions of both modules. The method also generalizes well and is robust to hyperparameter choices, suggesting practical applicability for reliable multi-view analysis in diverse domains.
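To make the data flow of DSHF concrete, the NumPy sketch below walks an input through the four stages named in the Figure 1 caption: view attention, initial projection, hierarchical (UNet-style) fusion with attention, and final projection. It is a hypothetical, untrained toy rather than the authors' network. The class name `ToyDSHF`, the per-sample softmax view attention, treating the views as channels, a single pooling level along the feature axis, the sigmoid channel gate on the skip connection, and averaging the final $V$ channels into one fused representation are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ToyDSHF:
    """Hypothetical, untrained NumPy sketch of the four DSHF stages in Figure 1.
    Feature maps have shape (channels, N, d); the V views act as input channels."""

    def __init__(self, n_views, d, channels=8, seed=0):
        assert d % 2 == 0, "one pooling level needs an even feature dimension"
        g = np.random.default_rng(seed)
        c = channels
        self.w_att = g.normal(size=d) / np.sqrt(d)                      # view-attention scorer
        self.p_in = g.normal(size=(c, n_views)) / np.sqrt(n_views)      # initial projection V -> C
        self.w_enc = g.normal(size=(d // 2, d // 2)) / np.sqrt(d // 2)  # encoder transform
        self.w_gate = g.normal(size=(c, c)) / np.sqrt(c)                # skip-connection attention
        self.p_dec = g.normal(size=(c, 2 * c)) / np.sqrt(2 * c)         # decoder merge 2C -> C
        self.p_out = g.normal(size=(n_views, c)) / np.sqrt(c)           # final projection C -> V

    def forward(self, x):
        """x: (V, N, d) stacked view features -> fused (N, d) and view weights (V, N, 1)."""
        # 1) View attention: per-sample softmax over views, then reweight each view.
        a = softmax(x @ self.w_att, axis=0)[..., None]
        x = a * x
        # 2) Initial projection: mix the V view channels into C channels.
        y = np.einsum("cv,vnd->cnd", self.p_in, x)
        # 3) One-level UNet along the feature axis.
        c, n, d = y.shape
        down = y.reshape(c, n, d // 2, 2).mean(axis=-1)    # encoder: average-pool d -> d/2
        down = relu(down @ self.w_enc)
        up = np.repeat(down, 2, axis=-1)                   # decoder: upsample d/2 -> d
        gate = sigmoid(self.w_gate @ y.mean(axis=(1, 2)))  # channel attention on the skip path
        merged = np.concatenate([up, gate[:, None, None] * y], axis=0)  # (2C, N, d)
        z = relu(np.einsum("ck,knd->cnd", self.p_dec, merged))
        # 4) Final projection back to V channels, then average them (an assumption).
        out = np.einsum("vc,cnd->vnd", self.p_out, z)
        return out.mean(axis=0), a


views = np.random.default_rng(1).normal(size=(3, 5, 16))  # 3 views, 5 samples, 16 features
fused, weights = ToyDSHF(n_views=3, d=16).forward(views)
print(fused.shape, weights.shape)  # (5, 16) (3, 5, 1)
```

A real UNet stacks several such down/up levels; one level is kept here only to show how the skip connection reinjects the finer-scale features after the coarser encoder path.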

Abstract

Multi-View Clustering (MVC) has garnered increasing attention in recent years. It partitions data samples into distinct groups by learning a consensus representation. However, a significant challenge remains: untrustworthy fusion. This problem arises primarily from two factors: 1) existing methods often ignore the inherent noise within individual views; 2) in traditional MVC methods based on Contrastive Learning (CL), similarity computations typically rely on different views of the same instance while neglecting the structural information from nearest neighbors within the same cluster. Consequently, multi-view fusion is steered in the wrong direction. To address this problem, we present a novel method, Trusted Hierarchical Contrastive Representation Learning (THCRL), which consists of two key modules. Specifically, we propose the Deep Symmetry Hierarchical Fusion (DSHF) module, which integrates the UNet architecture with multiple denoising mechanisms to achieve trustworthy fusion of multi-view data. Furthermore, we present the Average $K$-Nearest Neighbors Contrastive Learning (AKCL) module to align the fused representation with the view-specific representations. Unlike conventional strategies, AKCL enhances representation similarity among samples belonging to the same cluster, rather than focusing only on the same sample across views, thereby reinforcing the confidence of the fused representation. Extensive experiments demonstrate that THCRL achieves state-of-the-art performance in deep MVC tasks.
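The AKCL idea of pulling a fused representation toward cluster-consistent neighbors, rather than only toward its own counterpart in another view, can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names `knn_average` and `akcl_loss`, finding neighbors in the view-specific space, averaging the $K$ nearest embeddings (normally including the sample itself) into one positive target, and scoring that target with an InfoNCE-style loss are all assumptions. The `tau` argument plays the role of the temperature coefficient $\tau$ analyzed in Figure 4.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def knn_average(z, k):
    """Hypothetical helper: average each sample's k most similar view-specific
    embeddings (cosine similarity; normally the sample itself is among them)."""
    z = l2_normalize(z)
    sim = z @ z.T                                # (N, N) cosine similarities
    idx = np.argsort(-sim, axis=1)[:, :k]        # indices of the k nearest rows
    return l2_normalize(z[idx].mean(axis=1))     # (N, d) averaged, re-normalized


def akcl_loss(h, z, k=3, tau=0.5):
    """Hypothetical AKCL-style InfoNCE loss: the positive for fused sample i is
    the K-NN average around view-specific sample i; the K-NN averages of the
    other samples act as negatives."""
    h = l2_normalize(h)
    targets = knn_average(z, k)
    logits = h @ targets.T / tau                 # (N, N) temperature-scaled similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy check: two well-separated clusters in a 2-D view-specific space.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0], [0.0, 5.0]])
labels = np.repeat([0, 1], 10)
z = centers[labels] + rng.normal(scale=0.3, size=(20, 2))
h_good = centers[labels] + rng.normal(scale=0.3, size=(20, 2))         # fused rep in the right cluster
h_swapped = centers[1 - labels] + rng.normal(scale=0.3, size=(20, 2))  # fused rep in the wrong cluster
print(akcl_loss(h_good, z), akcl_loss(h_swapped, z))  # the first value should be clearly smaller
```

Because the other members of the same cluster also appear in the denominator here, a practical version would likely mask them out as false negatives; that refinement is omitted to keep the sketch short.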

Paper Structure

This paper contains 21 sections, 20 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Deep Symmetry Hierarchical Fusion (DSHF). DSHF has multiple denoising mechanisms. First, the View Attention Network dynamically weights the input features. Second, the Initial Project maps the multi-view features into a unified feature space. Third, hierarchical fusion denoises the features through the UNet architecture with attention networks. Finally, the Final Project reduces the number of feature channels back to the original number of views.
  • Figure 2: The visualization results of the fused representations $\{\hat{h}_{i}\}^N_{i=1}$ on the Hdigit, MNIST, and Synthetic3d datasets after convergence.
  • Figure 3: The convergence analysis on the Hdigit dataset. In the figure, the test ACC, NMI, and PUR are shown at the top, and the training loss is depicted at the bottom.
  • Figure 4: The hyperparameter analysis on the Hdigit dataset. The figure shows the changes in three evaluation metrics: ACC, NMI, and PUR. The metrics are influenced by two hyperparameters $\lambda$ and $\tau$. $\lambda$ is the combination coefficient of two loss functions. $\tau$ denotes the temperature coefficient.
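Figures 3 and 4 report clustering quality with ACC, NMI, and PUR. The sketch below uses the common definitions of these metrics: ACC takes the best one-to-one mapping between clusters and classes (found with the Hungarian algorithm via `scipy.optimize.linear_sum_assignment`), PUR credits each cluster with its majority class, and NMI divides the mutual information by the arithmetic mean of the two entropies. The NMI normalization used in the paper is not stated here, so that choice is an assumption, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def contingency(y_true, y_pred):
    """Counts table[k, j] = number of samples in predicted cluster k and true class j."""
    _, t = np.unique(y_true, return_inverse=True)
    _, p = np.unique(y_pred, return_inverse=True)
    table = np.zeros((p.max() + 1, t.max() + 1), dtype=np.int64)
    np.add.at(table, (p, t), 1)
    return table


def clustering_acc(y_true, y_pred):
    """ACC: accuracy under the best one-to-one cluster-to-class mapping."""
    table = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(-table)  # negate costs to maximize matched counts
    return table[rows, cols].sum() / table.sum()


def purity(y_true, y_pred):
    """PUR: each cluster is credited with its majority class."""
    table = contingency(y_true, y_pred)
    return table.max(axis=1).sum() / table.sum()


def nmi(y_true, y_pred):
    """NMI with arithmetic-mean normalization: I(Y; C) / ((H(Y) + H(C)) / 2)."""
    pxy = contingency(y_true, y_pred).astype(float)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    denom = (hx + hy) / 2
    return 1.0 if denom == 0 else mi / denom


y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([1, 1, 1, 0, 0, 2])  # labels permuted, one sample split into its own cluster
print(clustering_acc(y_true, y_pred), purity(y_true, y_pred))  # ACC = 5/6, PUR = 1.0
```

The example shows why the metrics are reported together: splitting off a singleton cluster leaves purity at 1.0 but lowers ACC, since only one cluster can be matched to each class.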