Deep Multiview Clustering by Contrasting Cluster Assignments

Jie Chen, Hua Mao, Wai Lok Woo, Xi Peng

TL;DR

This paper addresses multiview clustering by learning view-invariant representations through cross-view contrastive learning (CVCL). It introduces a two-module network, with a view-specific autoencoder and a cluster-level CVCL module per view, whose objective aligns soft cluster assignments across views. The network is trained with a pretraining/fine-tuning scheme, and the paper adds a theoretical analysis of alignment and complexity. Empirically, CVCL achieves state-of-the-art clustering performance across seven datasets, and ablations show that both pretraining and the cross-view consistency term $L_a$ matter for robust cross-view alignment. The approach is relevant to unsupervised clustering of multi-source data, where it improves discriminability and scales without requiring label information.

Abstract

Multiview clustering (MVC) aims to reveal the underlying structure of multiview data by categorizing data samples into clusters. Deep learning-based methods exhibit strong feature learning capabilities on large-scale datasets. For most existing deep MVC methods, exploring the invariant representations of multiple views is still an intractable problem. In this paper, we propose a cross-view contrastive learning (CVCL) method that learns view-invariant representations and produces clustering results by contrasting the cluster assignments among multiple views. Specifically, we first employ deep autoencoders to extract view-dependent features in the pretraining stage. Then, a cluster-level CVCL strategy is presented to explore consistent semantic label information among the multiple views in the fine-tuning stage. Thus, the proposed CVCL method is able to produce more discriminative cluster assignments by virtue of this learning strategy. Moreover, we provide a theoretical analysis of soft cluster assignment alignment. Extensive experimental results obtained on several datasets demonstrate that the proposed CVCL method outperforms several state-of-the-art approaches.
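To make "contrasting the cluster assignments among multiple views" concrete, the following is a minimal sketch of a cluster-level contrastive loss between two views. It is an illustration under stated assumptions, not the paper's exact objective: the InfoNCE-style form, the cosine similarity, the temperature `tau`, and the helper names `cosine`, `columns`, and `cluster_contrastive_loss` are all hypothetical choices made here. Each view supplies a soft assignment matrix with one row per sample and one column per cluster. Column $k$ of one view is pulled toward column $k$ of the other view and pushed away from every other cluster column.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def columns(P):
    """Return the columns of an n x K matrix given as a list of rows."""
    return [list(col) for col in zip(*P)]


def cluster_contrastive_loss(P_a, P_b, tau=0.5):
    """InfoNCE-style loss over cluster columns (an assumed form, see lead-in).

    P_a, P_b: soft cluster assignments of the same n samples in two views,
    each a list of n rows with K probabilities per row.
    """
    A, B = columns(P_a), columns(P_b)
    K = len(A)
    total = 0.0
    for k in range(K):
        # Positive pair: the same cluster index seen from the two views.
        pos = math.exp(cosine(A[k], B[k]) / tau)
        # Negatives: all other clusters, both across and within the view.
        neg = sum(math.exp(cosine(A[k], B[j]) / tau) for j in range(K) if j != k)
        neg += sum(math.exp(cosine(A[k], A[j]) / tau) for j in range(K) if j != k)
        total += -math.log(pos / (pos + neg))
    return total / K


# Toy data: 4 samples, 2 clusters. The second view either agrees with the
# first on cluster identity or has its two cluster columns swapped.
P1 = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
P2_aligned = [[0.85, 0.15], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
P2_swapped = [[row[1], row[0]] for row in P2_aligned]

aligned = cluster_contrastive_loss(P1, P2_aligned)
swapped = cluster_contrastive_loss(P1, P2_swapped)
```

In this toy setting the loss is lower when the two views agree on which samples belong to which cluster, which is the behavior a cluster-level alignment objective is meant to reward. A full training objective would also include the reconstruction loss of the autoencoders and any regularization terms the paper defines.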

Paper Structure (26 sections, 18 equations, 4 figures, 2 tables, 1 algorithm)

Figures (4)

  • Figure 1: The framework of CVCL. Each view has two modules: a view-specific autoencoder module and a CVCL module. The multilayer perceptron (MLP) consists of multiple linear layers. The view-specific autoencoder module comprises an encoder and a decoder, i.e., $\{ f_e^{(v)}(\mathbf{x}_i^{(v)}, \mathbf{W}_e^{(v)}) \}_{v=1}^{n_v}$ and $\{ f_d^{(v)}(\mathbf{z}_i^{(v)}, \mathbf{W}_d^{(v)}) \}_{v=1}^{n_v}$, respectively. The CVCL module explores consistent semantic label information by contrasting the cluster assignments among multiple views.
  • Figure 2: The ACC values yielded by the CVCL method with different $\alpha$ and $\beta$ combinations on the four representative datasets.
  • Figure 3: The NMI values yielded by the CVCL method with different $\alpha$ and $\beta$ combinations on the two representative datasets.
  • Figure 4: Convergence results obtained by the CVCL method on all the datasets.