Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs
Xin Gao, Jian Pu
TL;DR
The paper tackles incomplete multi-view representation learning with MVP, which learns inter-view correspondences in the latent space of VAEs by permuting latent variables and partitioning them into two groups (single-view and complete-view). Permuting and partitioning the latent variables yields a valid ELBO, and an informational prior built from cyclic permutations of the posteriors, formalized as Permutation Divergence, strengthens cross-view consistency. On seven datasets with varying missing rates, the approach achieves superior clustering and generation performance, indicating robustness to missing views and improved inter-view sufficiency and consistency. The framework gives a principled way to infer missing views and fuse information across views in multi-view and multi-modal data.
Abstract
Multi-View Representation Learning (MVRL) aims to derive a unified representation from multi-view data by leveraging shared and complementary information across views. However, when views are irregularly missing, the incomplete data can lead to representations that lack sufficiency and consistency. To address this, we propose Multi-View Permutation of Variational Auto-Encoders (MVP), which uncovers invariant relationships between views in incomplete data. MVP establishes inter-view correspondences in the latent space of Variational Auto-Encoders, enabling the inference of missing views and the aggregation of more sufficient information. To derive a valid Evidence Lower Bound (ELBO) for learning, we apply permutations to randomly reorder variables for cross-view generation and then partition them by views so that their meanings remain invariant under permutations. Additionally, we enhance consistency by introducing an informational prior with cyclic permutations of posteriors, which turns the regularization term into a similarity measure across distributions. We demonstrate the effectiveness of our approach on seven diverse datasets with varying missing ratios, achieving superior performance in multi-view clustering and generation tasks.
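
As a rough illustration of how a prior built from cyclically permuted posteriors can turn a KL regularizer into a cross-view similarity measure, consider the minimal sketch below. It is not the paper's definition of Permutation Divergence. It assumes diagonal Gaussian posteriors, one per view, and the function names `gaussian_kl` and `cyclic_kl_regularizer` are hypothetical. Each view's posterior is compared with the posterior of every other view, reached by cyclically shifting the view order, and the resulting KL terms are averaged.

```python
import numpy as np


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
        axis=-1,
    )


def cyclic_kl_regularizer(mu, logvar):
    """Average KL between each view's posterior and its cyclic shifts.

    mu, logvar: arrays of shape (num_views, latent_dim), one row per view.
    Illustrative assumption: the "prior" for view i under a given shift is
    simply the posterior of another view, obtained with np.roll.
    """
    num_views = mu.shape[0]
    if num_views < 2:
        return 0.0  # nothing to compare with a single view
    total = 0.0
    for shift in range(1, num_views):
        # Row i of the rolled arrays holds the posterior of view (i - shift) mod V.
        mu_p = np.roll(mu, shift, axis=0)
        logvar_p = np.roll(logvar, shift, axis=0)
        total += gaussian_kl(mu, logvar, mu_p, logvar_p).mean()
    return float(total / (num_views - 1))


# Three views, four latent dimensions, unit variance everywhere.
logvar = np.zeros((3, 4))
mu_same = np.zeros((3, 4))                                # identical posteriors
mu_diff = np.array([[0.0] * 4, [1.0] * 4, [2.0] * 4])     # posteriors that disagree

reg_same = cyclic_kl_regularizer(mu_same, logvar)
reg_diff = cyclic_kl_regularizer(mu_diff, logvar)
```

In this sketch the regularizer is zero when all views share the same posterior and grows as the posteriors drift apart, which is the sense in which the regularization term acts as a similarity measure across distributions.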
