Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs

Xin Gao, Jian Pu

TL;DR

The paper tackles incomplete multi-view representation learning by introducing MVP, which learns inter-view correspondences in a latent space through latent-variable permutations and two partitions (single-view and complete-view). It derives a valid ELBO by permuting and partitioning latent variables and strengthens cross-view consistency with an informational prior based on cyclic permutations, formalized as Permutation Divergence. The approach yields superior clustering and generation performance across seven datasets with varying missing rates, demonstrating robustness and improved inter-view sufficiency and consistency. This framework offers a scalable, principled way to infer missing views and fuse information across views, with broad applicability to multi-modal and multi-view data scenarios.

Abstract

Multi-View Representation Learning (MVRL) aims to derive a unified representation from multi-view data by leveraging shared and complementary information across views. However, when views are irregularly missing, the incomplete data can lead to representations that lack sufficiency and consistency. To address this, we propose Multi-View Permutation of Variational Auto-Encoders (MVP), which excavates invariant relationships between views in incomplete data. MVP establishes inter-view correspondences in the latent space of Variational Auto-Encoders, enabling the inference of missing views and the aggregation of more sufficient information. To derive a valid Evidence Lower Bound (ELBO) for learning, we apply permutations to randomly reorder variables for cross-view generation and then partition them by views to maintain invariant meanings under permutations. Additionally, we enhance consistency by introducing an informational prior with cyclic permutations of posteriors, which turns the regularization term into a similarity measure across distributions. We demonstrate the effectiveness of our approach on seven diverse datasets with varying missing ratios, achieving superior performance in multi-view clustering and generation tasks.

Paper Structure

This paper contains 34 sections, 23 equations, 15 figures, 9 tables, and 2 algorithms.

Figures (15)

  • Figure 1: Overview of our method: (a) Incomplete multi-view data $\mathbb{X}_1$ is fed into encoders to generate the diagonal elements of matrix $Z_0$, while off-diagonal elements are derived through inter-view correspondences. (b) Latent variables are partitioned by columns for the single-view partition $\{\boldsymbol{\mathcal{S}_i}\}$ and by rows for the complete-view partition $\{\boldsymbol{\mathcal{C}_i}\}$, with each row aggregated into a consensus variable ${\omega}$, capturing shared information across views. A cyclic permutation within each column transforms $Z_0$ into $Z_1$, generating new partitions (see Figure \ref{fig:a1} for transformation details). Regularization is applied by comparing distributions at the same positions before ($Z_0$) and after ($Z_1$) permutation. (c) Each view $x^{(v)}$ is reconstructed from its latent representation ${z}^{(v)}$ and a consensus variable ${\omega}$.
  • Figure 2: Quantitative results on the PolyMNIST dataset compared to six MVAEs. Evaluations were conducted on all incomplete subsets of the testing set, averaged across same-sized subsets.
  • Figure 3: Multi-view sample generation conditioned on view 2. The leftmost column shows input images of view 2, randomly selected from digit classes 0 to 9. The following columns display multi-view samples (five views per sample) generated by various models. Ideally, the conditionally generated digits should match the input digit; yellow boxes highlight inconsistencies. Accuracy scores, shown in parentheses, are computed by pre-trained classifiers on the generated images.
  • Figure 4: Multi-view samples generated by our method on the MVShapeNet dataset. Categories include table, chair, car, airplane, and rifle, with each sample consisting of five views from different angles. The model was trained with missing rate $\eta = 0.5$ and tested with only view 5 available.
  • Figure 5: An illustration of column-wise permutations to generate different complete-view partitions. In the first column (red box), the permutation $\sigma_1 = (1532)(4)$ is applied, and its inverse $(\sigma_1)^{-1} = (2351)(4)$ reverses the cycle order. This results in $\sigma_1([ z_1^{(1)}, z_2^{(1)}, z_3^{(1)}, z_4^{(1)}, z_5^{(1)} ]) = [ z_5^{(1)}, z_1^{(1)}, z_2^{(1)}, z_4^{(1)}, z_3^{(1)} ]$. The same procedure is applied to the other columns. Partitioning each row (purple box) yields the complete-view partition.
  • ...and 10 more figures
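
The column-wise permutation described in the Figure 5 caption can be checked with a small sketch. It is only an illustration of the cycle notation, not the authors' implementation: the helper names (`cycles_to_mapping`, `invert`, `permute`) are hypothetical, latent variables are stood in for by strings, and the convention that slot $i$ of the output holds the entry indexed by $\sigma_1(i)$ is an assumption inferred from the worked example in the caption.

```python
def cycles_to_mapping(cycles, n):
    """Turn cycle notation, e.g. [(1, 5, 3, 2), (4,)], into a dict i -> sigma(i)."""
    sigma = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, item in enumerate(cycle):
            # Each element maps to its successor in the cycle (wrapping around).
            sigma[item] = cycle[(pos + 1) % len(cycle)]
    return sigma


def invert(sigma):
    """Inverse permutation: swap every source index with its image."""
    return {image: source for source, image in sigma.items()}


def permute(column, sigma):
    """Assumed convention: slot i of the result holds the entry indexed by sigma(i)."""
    return [column[sigma[i] - 1] for i in range(1, len(column) + 1)]


# sigma_1 = (1532)(4) from the Figure 5 caption, using 1-based indices.
sigma_1 = cycles_to_mapping([(1, 5, 3, 2), (4,)], n=5)
sigma_1_inv = invert(sigma_1)

# Placeholder names for the first column z_1^{(1)}, ..., z_5^{(1)}.
column = ["z1", "z2", "z3", "z4", "z5"]
permuted = permute(column, sigma_1)
restored = permute(permuted, sigma_1_inv)

print(permuted)   # reordered column
print(restored)   # applying the inverse recovers the original order
```

Under this convention the reordered column matches the caption's result $[z_5^{(1)}, z_1^{(1)}, z_2^{(1)}, z_4^{(1)}, z_3^{(1)}]$, and the computed inverse agrees with the cycle $(2351)(4)$ stated there.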

Theorems & Definitions (7)

  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof