Hierarchical Consensus Network for Multiview Feature Learning

Chengwei Xia, Chaoxi Niu, Kun Zhan

TL;DR

HCN addresses the challenge of learning cross-view-consistent representations in multiview data by introducing hierarchical consensus across views with three indices: classifying, coding, and global consensus. Each index corresponds to a distinct alignment level: column-wise (CCA-like) alignment of class distributions, row-wise (contrastive-like) alignment of instance codings, and matrix-level global alignment. The three are integrated via view-specific autoencoders and data augmentation. The method achieves state-of-the-art clustering performance on four datasets and demonstrates robustness to hyperparameters and augmentation. This work provides a principled bridge between CCA and contrastive learning for multiview representation learning and offers a scalable, augmentation-friendly framework.

Abstract

Multiview feature learning aims to learn discriminative features by integrating the distinct information in each view. However, most existing methods still face significant challenges in learning view-consistency features, which are crucial for effective multiview learning. Motivated by the theories of CCA and contrastive learning in multiview feature learning, we propose the hierarchical consensus network (HCN) in this paper. The HCN derives three consensus indices for capturing the hierarchical consensus across views, which are classifying consensus, coding consensus, and global consensus, respectively. Specifically, classifying consensus reinforces class-level correspondence between views from a CCA perspective, while coding consensus closely resembles contrastive learning and reflects contrastive comparison of individual instances. Global consensus aims to extract consensus information from two perspectives simultaneously. By enforcing the hierarchical consensus, the information within each view is better integrated to obtain more comprehensive and discriminative features. The extensive experimental results obtained on four multiview datasets demonstrate that the proposed method significantly outperforms several state-of-the-art methods.
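The three consensus levels named in the abstract can be made concrete with a small NumPy sketch. It is an illustration only and not the authors' implementation: the exact loss definitions are not given in this summary, so the cosine-based column and row scores, the mean-squared global gap, the linear stand-in encoders, and every name below (`consensus_terms`, `W1`, `W2`, and so on) are assumptions. The decoder, reconstruction loss, and augmentation are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def normalize(a, axis):
    """Scale vectors along `axis` to unit L2 norm."""
    return a / (np.linalg.norm(a, axis=axis, keepdims=True) + 1e-12)

def consensus_terms(Z1, Z2):
    """Return (classifying, coding, global_gap) for two views.

    Z1, Z2: (n_samples, n_classes) representations of the same samples.
    These three formulas are assumed stand-ins, not the paper's losses.
    """
    Y1, Y2 = softmax(Z1), softmax(Z2)
    # Column-wise (CCA-like): cosine between matching class columns.
    classifying = np.sum(normalize(Y1, 0) * normalize(Y2, 0), axis=0).mean()
    # Row-wise (contrastive-like): cosine between the two codings of a sample.
    coding = np.sum(normalize(Y1, 1) * normalize(Y2, 1), axis=1).mean()
    # Matrix-level: mean squared difference between the representations.
    global_gap = np.mean((Z1 - Z2) ** 2)
    return classifying, coding, global_gap

# Hypothetical two-view data with linear maps standing in for the encoders.
n, d1, d2, k = 8, 5, 7, 3
X1, X2 = rng.normal(size=(n, d1)), rng.normal(size=(n, d2))
W1, W2 = rng.normal(size=(d1, k)), rng.normal(size=(d2, k))
Z1, Z2 = X1 @ W1, X2 @ W2

cls, cod, gap = consensus_terms(Z1, Z2)
same = consensus_terms(Z1, Z1)  # identical views: full consensus
```

In this sketch, two identical views give column and row scores of 1 and a global gap of 0, so a training loss would push the first two up and the third down.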

Paper Structure

This paper contains 27 sections, 2 theorems, 23 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Coding-consensus learning is equivalent to contrastive learning with positive pairs.
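
One way to read Theorem 1, offered as a sketch under assumptions that this summary does not confirm: suppose the coding of sample $i$ in view $v$ is a row vector $\bm{s}_i^{(v)}$ with unit $\ell_2$ norm, and that coding consensus maximizes the row-wise inner products between the two views. Then

$$\left\|\bm{s}_i^{(1)} - \bm{s}_i^{(2)}\right\|_2^2 = 2 - 2\,\left\langle \bm{s}_i^{(1)}, \bm{s}_i^{(2)} \right\rangle ,$$

so maximizing $\sum_i \langle \bm{s}_i^{(1)}, \bm{s}_i^{(2)} \rangle$ is the same as minimizing the summed squared distance between positive pairs, that is, the two views of the same sample. This is the positive-pair (alignment) part of a contrastive objective; no negative pairs appear. The paper's own proof may proceed differently.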

Figures (8)

  • Figure 1: The hierarchical consensus: Consider a scenario with five students, where we collect their height and weight. We employ two simple classifiers: one based on height (assigning 'boy' if height $>$ 1.7 m and 'girl' otherwise) and another based on weight (assigning 'boy' if weight $>$ 60 kg and 'girl' otherwise). Each view produces binary predictions. Our hierarchical consensus objective involves deriving consensus indices through inner product computations. (1) Classifying Consensus: In the first column of the matrix, $\bm{b}^{(h)}$ and $\bm{b}^{(w)}$ denote the boy predictions from the two views. Their inner product, 2, counts the number of boys. The objective of classifying consensus is to align the two views' predictions of how many students fall in each gender. (2) Coding Consensus: Moving to the first row of the second matrix, $\bm{s}^{(h)}$ and $\bm{s}^{(w)}$ represent the gender coding of the same student. Their inner product, 1, indicates that the two views agree on this student. The objective of coding consensus is to align the gender coding of each student. (3) Global Consensus: From the perspective of the whole matrix, a global consensus is established.
  • Figure 2: The HCN framework. Each view contains a view-specific autoencoder, i.e., an encoder $f(\cdot|\theta^{(v)})$ and a decoder $g(\cdot|\phi^{(v)})$. The representations $Z^{(v)}$ and $Z^{(v)}_{\text{aug}}$ are learned by minimizing the reconstruction error $\mathcal{L}_{\text{Rec}}$. In addition, $Z^{(v)}$ and $Z^{(v)}_{\text{aug}}$ are fed into a softmax layer to obtain the class posterior probabilities, i.e., $Y^{(v)}$ and $Y^{(v)}_{\text{aug}}$. Given the representations and class probabilities of the two views, HCN aims to capture the hierarchical consensus between them. Specifically, classifying consensus ensures the consistency of class distributions across views, coding consensus encourages the same prediction for the same sample across views, and global consensus minimizes the difference between the representations learned from different views.
  • Figure 3: Convergence and clustering performance of HCN over training epochs on LandUse-21 and Noisy MNIST.
  • Figure 4: Parameter sensitivity of drop rate $\rho$.
  • Figure 5: Visualizations on Noisy MNIST with baselines.
  • ...and 3 more figures
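
The inner-product reading in the Figure 1 caption can be reproduced with a few lines of NumPy. The five heights and weights below are hypothetical values chosen so that the results match the caption (2 for the column product, 1 for the first student's row product); the figure's actual data is not given here. The squared-difference measure used for the whole-matrix view is likewise an assumption, not the paper's definition of global consensus.

```python
import numpy as np

# Hypothetical measurements for five students (not taken from the paper).
heights_m = np.array([1.80, 1.75, 1.60, 1.65, 1.72])
weights_kg = np.array([70.0, 65.0, 50.0, 55.0, 58.0])

def one_hot_predictions(is_boy):
    """Return an (n, 2) prediction matrix with columns [boy, girl]."""
    is_boy = is_boy.astype(int)
    return np.stack([is_boy, 1 - is_boy], axis=1)

Y_h = one_hot_predictions(heights_m > 1.7)    # height view
Y_w = one_hot_predictions(weights_kg > 60.0)  # weight view

# (1) Classifying consensus: inner product of the two "boy" columns.
classifying = int(Y_h[:, 0] @ Y_w[:, 0])

# (2) Coding consensus: inner product of the two codings of student 0.
coding = int(Y_h[0] @ Y_w[0])

# Row-wise inner products for all students: 1 = views agree, 0 = disagree.
per_student = np.sum(Y_h * Y_w, axis=1)

# (3) Whole-matrix view (assumed measure): squared Frobenius difference.
global_gap = int(np.sum((Y_h - Y_w) ** 2))

print(classifying, coding, per_student.tolist(), global_gap)
```

With these values the fifth student is tall but light, so the two views disagree on that row only. This shows how the row-wise index localizes a disagreement to one sample while the column-wise index summarizes agreement per class.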

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • proof