Self Supervised Correlation-based Permutations for Multi-View Clustering

Ran Eisenberg, Jonathan Svirsky, Ofir Lindenbaum

TL;DR

The paper tackles end-to-end multi-view clustering (MVC) for general data types by learning fused representations through a permutation-based canonical correlation objective, which removes the need for a separate representation-learning stage. The proposed model, COPER (COrrelation-based PERmutations), jointly optimizes deep canonically correlated encoders with a self-supervised multi-view pseudo-labeling and within-cluster permutation scheme, yielding representations that approximate the projection of supervised Linear Discriminant Analysis (LDA) under mild assumptions. Theoretical results establish the LDA approximation and bound the eigenvalue error due to pseudo-label noise; empirically, COPER outperforms state-of-the-art deep MVC models on ten datasets and scales to large data. The approach applies to both image and tabular data, and its permutation-based augmentation provides a general, potentially more effective alternative to standard CCA-based MVC.

Abstract

Combining data from different sources can improve data analysis tasks such as clustering. However, most of the current multi-view clustering methods are limited to specific domains or rely on a suboptimal and computationally intensive two-stage process of representation learning and clustering. We propose an end-to-end deep learning-based multi-view clustering framework for general data types (such as images and tables). Our approach involves generating meaningful fused representations using a novel permutation-based canonical correlation objective. We provide a theoretical analysis showing how the learned embeddings approximate those obtained by supervised linear discriminant analysis (LDA). Cluster assignments are learned by identifying consistent pseudo-labels across multiple views. Additionally, we establish a theoretical bound on the error caused by incorrect pseudo-labels in the unsupervised representations compared to LDA. Extensive experiments on ten multi-view clustering benchmark datasets provide empirical evidence for the effectiveness of the proposed model.

Paper Structure

This paper contains 51 sections, 2 theorems, 26 equations, 7 figures, 12 tables, 1 algorithm.

Key Result

Proposition 4.3

The embedding learned through the CCA objective using within-cluster permutation for $v$ and $w$ converges to the same representation extracted when applying the LDA objective (Eq. \ref{eq:lda}) to $\boldsymbol{\theta}$.
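
For orientation, the textbook form of Fisher's LDA objective is sketched below. The paper's own LDA equation is not reproduced on this page and may use different notation, so every symbol here ($S_w$, $S_b$, $W$, $\mu_k$, $\mu$, $n_k$, $\mathcal{C}_k$) is an assumption introduced for illustration.

```latex
% Within-class and between-class scatter for K classes (assumed notation)
S_w = \sum_{k=1}^{K} \sum_{i \in \mathcal{C}_k} (x_i - \mu_k)(x_i - \mu_k)^{\top},
\qquad
S_b = \sum_{k=1}^{K} n_k \, (\mu_k - \mu)(\mu_k - \mu)^{\top}

% LDA projection and the equivalent generalized eigenproblem
W^{\star} = \arg\max_{W} \operatorname{tr}\!\left( (W^{\top} S_w W)^{-1} \, W^{\top} S_b W \right),
\qquad
S_b \, w_i = \lambda_i \, S_w \, w_i
```

Intuitively, pairing each sample with a randomly chosen sample from the same cluster leaves only class-level structure correlated across the pair, which is one way to see why a permuted CCA objective could approach the between-class criterion above. The formal statement and its assumptions are those of Proposition 4.3.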

Figures (7)

  • Figure 1: Our proposed deep learning model, COrrelation-based PERmutations (COPER). Two modalities, $v$ and $w$, are processed through view-specific encoders, creating latent embeddings ($\boldsymbol{H}^{(v)}$ and $\boldsymbol{H}^{(w)}$). The embeddings are learned using a correlation maximization loss and then fused to serve as input for the clustering head. The clustering head estimates a probability matrix, denoted as $P$, which is used to derive multi-view pseudo-labels. We then generate within-cluster permutations based on these pseudo-labels. The permuted samples are used to update the CCA representation. By changing the pairing of observations fed into the CCA objective, we enhance cluster separation and extract embeddings that theoretically approximate the solution of supervised Linear Discriminant Analysis (LDA), as demonstrated in Section \ref{sec:method_relation_to_lda}.
  • Figure 2: Illustration of how within-cluster permutations can enhance the embeddings learned by CCA. We use a binary subset of FashionMNIST and split the images to create the two views. Next, we embed the data by applying CCA to both views, $\boldsymbol{X}^{(v)}$ and $\boldsymbol{X}^{(w)}$. As described in Subsection \ref{sec:pseudolabeling}, the embeddings are used to extract multi-view pseudo-labels; within-cluster permutations $\Pi^1$ are then used to create new corresponding pairs of samples, $\tilde{\mathcal{X}}^{(v)}$ and $\tilde{\mathcal{X}}^{(w)}$. These pairs are used as augmentations to perform a second CCA (middle pair of images). This process is repeated with $\Pi^2$ (right-most pair of images). As this example shows, using within-cluster permutations enhances the representations learned by CCA, improving clustering performance from an adjusted Rand index (ARI) of $0.598$ to $0.872$.
  • Figure 3: (i) Case study of permutation CCA using FashionMNIST. (a) Permuting more samples within a cluster improves cluster separation as measured by the adjusted Rand index (ARI). We compare label-based (supervised) permutations to pseudo-label-based (unsupervised) and random permutations. (b) Permuting more samples also pushes the representation obtained by CCA towards LDA, as indicated by the gap between eigenvalues. (ii) An experiment on LDA approximation with induced label noise. We perform LDA on different subsets of FashionMNIST. We gradually increase the number of samples, starting from 20%, and analyze the effect on the resulting eigenvalues $\hat{\lambda}_i$ compared to the eigenvalues $\lambda_i$ obtained from LDA on the entire dataset. As expected, adding more correctly annotated samples reduces the eigenvalue gap, while noisy annotations increase it.
  • Figure 4: A high-level illustration of our pseudo-labeling scheme for a single view $w$. Appendix \ref{sec:ap_pseudolabeling_desc} provides a complementary figure (Figure \ref{fig:ap_pseudolabeling}) for the entire process, with corresponding samples in view $v$ and additional details.
  • Figure 5: Illustration of our pseudo-labeling scheme.
  • ...and 2 more figures
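
The within-cluster permutation idea described in the captions for Figures 1 and 2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain regularized linear CCA in place of the deep encoders, synthetic two-view data in place of FashionMNIST, and ground-truth cluster ids as a stand-in for the multi-view pseudo-labels. The names `within_cluster_permutation`, `linear_cca`, and the `reg` parameter are hypothetical.

```python
import numpy as np


def within_cluster_permutation(labels, rng):
    """Return an index array that shuffles samples only among members of the same cluster."""
    perm = np.arange(len(labels))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        perm[idx] = rng.permutation(idx)  # re-pair inside cluster c only
    return perm


def _inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T


def linear_cca(X, Y, dim, reg=1e-3):
    """Regularized linear CCA via whitening and SVD (assumed stand-in for deep CCA).

    Returns projection matrices for each view and the top canonical correlations.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U[:, :dim], Ky @ Vt.T[:, :dim], s[:dim]


# Synthetic two-view data with two clusters (illustrative only).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)  # stand-in for pseudo-labels
centers_v = np.array([[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]])
centers_w = np.array([[0.0, 2.0], [0.0, -2.0]])
Xv = centers_v[labels] + rng.normal(size=(200, 3))
Xw = centers_w[labels] + rng.normal(size=(200, 2))

# Re-pair sample i in view v with sample perm[i] in view w (same cluster),
# then fit CCA on the permuted pairs.
perm = within_cluster_permutation(labels, rng)
A, B, corr = linear_cca(Xv, Xw[perm], dim=1)
embedding_v = (Xv - Xv.mean(axis=0)) @ A  # 1-D embedding of view v
```

Because the re-paired samples share only their cluster membership, the cross-view covariance in this sketch is driven by cluster-level structure rather than by sample-specific detail. In the method described above, the permuted pairs serve as augmentations during training and the pseudo-labels come from the clustering head, neither of which this sketch reproduces.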

Theorems & Definitions (3)

  • Definition 4.1
  • Proposition 4.3
  • Lemma G.1