THCRL: Trusted Hierarchical Contrastive Representation Learning for Multi-View Clustering
Jian Zhu
TL;DR
This work tackles untrustworthy fusion in Multi-View Clustering (MVC) by introducing THCRL, a framework with two modules: Deep Symmetry Hierarchical Fusion (DSHF) and Average $K$-Nearest Neighbors Contrastive Learning (AKCL). DSHF uses a UNet-based architecture with attention to denoise and fuse multi-view features, while AKCL aligns fused representations with cluster-consistent samples rather than only cross-view instances. The approach yields state-of-the-art results across six public MVC datasets, and ablation studies and visualizations confirm the contribution of both modules. The method also generalizes well and is robust to hyperparameter choices, suggesting practical applicability for reliable multi-view analysis in diverse domains.
Abstract
Multi-View Clustering (MVC) has garnered increasing attention in recent years. It partitions data samples into distinct groups by learning a consensus representation. However, a significant challenge remains: the problem of untrustworthy fusion. This problem arises primarily from two factors: 1) existing methods often ignore the inherent noise within individual views; 2) in traditional MVC methods that use Contrastive Learning (CL), similarity computations typically rely on different views of the same instance and neglect the structural information carried by nearest neighbors within the same cluster. Consequently, multi-view fusion is steered in the wrong direction. To address this problem, we present a novel Trusted Hierarchical Contrastive Representation Learning (THCRL) framework, which consists of two key modules. First, we propose the Deep Symmetry Hierarchical Fusion (DSHF) module, which leverages the UNet architecture integrated with multiple denoising mechanisms to achieve trustworthy fusion of multi-view data. Second, we present the Average K-Nearest Neighbors Contrastive Learning (AKCL) module to align the fused representation with the view-specific representations. Unlike conventional strategies, AKCL enhances representation similarity among samples belonging to the same cluster, rather than focusing only on the same sample across views, thereby reinforcing the confidence of the fused representation. Extensive experiments demonstrate that THCRL achieves state-of-the-art performance in deep MVC tasks.
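To make the AKCL idea more concrete, the following is a minimal NumPy sketch of a neighbor-averaged, InfoNCE-style contrastive loss. It is not the formulation used in THCRL, which the abstract does not specify. The function names (`l2_normalize`, `knn_indices`, `avg_knn_contrastive_loss`), the use of cosine similarity, the temperature `tau`, and the choice to take neighbors from the fused representation are all assumptions made for illustration. The sketch only shows the general shift described above: positives for sample i are its K nearest neighbors rather than only the same sample in another view.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def knn_indices(h, k):
    """For each row of h (assumed L2-normalized), return the indices of the
    k most cosine-similar *other* rows."""
    sim = h @ h.T
    np.fill_diagonal(sim, -np.inf)  # a sample is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def avg_knn_contrastive_loss(fused, view, k=3, tau=0.5):
    """Hypothetical neighbor-averaged contrastive loss (illustrative only).

    fused: (n, d) fused representations
    view:  (n, d) representations from one view
    Positives for sample i are the view-specific representations of the
    k nearest neighbors of i, with neighbors found in the fused space.
    """
    h = l2_normalize(fused)
    z = l2_normalize(view)
    logits = (h @ z.T) / tau  # (n, n) fused-to-view similarities

    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average the log-probabilities of the k neighbors, then over samples.
    nbrs = knn_indices(h, k)
    pos = np.take_along_axis(log_prob, nbrs, axis=1).mean(axis=1)
    return float(-pos.mean())


if __name__ == "__main__":
    # Toy data: two well-separated clusters of two points each.
    fused = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    aligned = fused.copy()           # view agrees with the cluster structure
    swapped = fused[[2, 3, 0, 1]]    # view with the two clusters exchanged
    print(avg_knn_contrastive_loss(fused, aligned, k=1))
    print(avg_knn_contrastive_loss(fused, swapped, k=1))
```

On the toy data, the loss is lower when the view agrees with the cluster structure of the fused representation than when the clusters are exchanged, which is the behavior a cluster-aware alignment objective should have. In a full model, one such term would presumably be computed per view and summed, but that aggregation is also an assumption here.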
