URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering

Ge Teng, Ting Mao, Chen Shen, Xiang Tian, Xuesong Liu, Yaowu Chen, Jieping Ye

TL;DR

URRL-IMVC tackles incomplete multi-view clustering by learning a unified embedding through an attention-based auto-encoder that fuses multi-view and neighborhood information. It avoids explicit missing-view recovery by employing KNN imputation and data augmentation to bolster robustness, complemented by a DEC-based clustering module and a Transformer-based encoder. The framework introduces CDPE, TAM, and a shallow decoder to regularize representations, and uses a two-stage training regime with three losses to ensure clustering-friendly embeddings. Across six benchmark datasets, URRL-IMVC achieves state-of-the-art performance and demonstrates stability across varying numbers of views and missing rates, offering a scalable solution for IMVC with practical impact in real-world, incomplete-data scenarios.
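The TL;DR mentions a DEC-based clustering module and a two-stage regime with three losses, but gives no formulas. The sketch below uses the standard Deep Embedded Clustering (DEC) formulation as a rough illustration: a Student's t soft assignment, a sharpened target distribution, and a KL-divergence loss. The `total_loss` helper is hypothetical. Its equal loss weighting, which losses are active in each stage, and the `stage2_start` default of 2200 (taken from the Caltech101-7 schedule described in Figure 5) are all assumptions, not the authors' implementation.

```python
import math


def soft_assign(z, centroids, alpha=1.0):
    """Soft assignment q_ij via a Student's t kernel (standard DEC form)."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            dist2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # normalise over clusters
    return q


def target_distribution(q):
    """Sharpened target p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]  # soft cluster frequencies
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def clustering_loss(q, p):
    """KL(P || Q), averaged over the samples in the batch."""
    kl = 0.0
    for p_row, q_row in zip(p, q):
        kl += sum(pj * math.log(pj / qj) for pj, qj in zip(p_row, q_row) if pj > 0)
    return kl / len(q)


def total_loss(step, l_rec, l_rob, l_clu, stage2_start=2200, w_clu=1.0):
    """HYPOTHETICAL two-stage schedule: stage 1 uses reconstruction and
    robustness losses; the clustering loss is added from stage2_start onward."""
    loss = l_rec + l_rob
    if step >= stage2_start:
        loss += w_clu * l_clu
    return loss


# Toy usage: three 2-D embeddings and two cluster centroids.
z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(z, centroids)
p = target_distribution(q)
loss = clustering_loss(q, p)
```

In the original DEC, the target P is recomputed periodically and treated as a constant between updates. This makes the KL term pull each soft assignment toward a more confident version of itself.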

Abstract

Incomplete multi-view clustering (IMVC) aims to cluster multi-view data that are only partially available. This poses two main challenges: effectively leveraging multi-view information and mitigating the impact of missing views. Prevailing solutions employ cross-view contrastive learning and missing-view recovery techniques. However, they either neglect valuable complementary information by focusing only on consensus between views or produce unreliable recovered views due to the absence of supervision. To address these limitations, we propose a novel Unified and Robust Representation Learning framework for Incomplete Multi-View Clustering (URRL-IMVC). URRL-IMVC directly learns a unified embedding that is robust to view-missing conditions by integrating information from multiple views and neighboring samples. First, to overcome the limitations of cross-view contrastive learning, URRL-IMVC incorporates an attention-based auto-encoder framework to fuse multi-view information and generate unified embeddings. Second, URRL-IMVC directly enhances the robustness of the unified embedding against view-missing conditions through KNN imputation and data augmentation techniques, eliminating the need for explicit missing-view recovery. Finally, incremental improvements, such as the Clustering Module and the customization of the Encoder, are introduced to further enhance overall performance. We extensively evaluate the proposed URRL-IMVC framework on various benchmark datasets, demonstrating its state-of-the-art performance. Furthermore, comprehensive ablation studies validate the effectiveness of our design.
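The abstract describes KNN imputation and view-missing data augmentation only at a high level. The following minimal sketch assumes each sample is a list of per-view feature vectors, with `None` marking a missing view. `drop_views`, `shared_view_distance`, and `knn_grid` are hypothetical helpers. Neighbors are ranked by the mean squared distance over the views both samples share. The sample itself is placed first in the neighbor dimension, so a missing view can draw hints from the same view of its neighbors. The distance measure and the grid layout are assumptions, not details taken from the paper.

```python
import random


def drop_views(sample, drop_prob, rng):
    """HYPOTHETICAL view-missing augmentation: randomly mask available views,
    always keeping at least one view of the sample."""
    available = [v for v, x in enumerate(sample) if x is not None]
    keep = [v for v in available if rng.random() >= drop_prob]
    if not keep:  # never drop every view of a sample
        keep = [rng.choice(available)]
    return [x if v in keep else None for v, x in enumerate(sample)]


def shared_view_distance(a, b):
    """Mean squared distance over views present in both samples (inf if none shared)."""
    dists = [sum((p - q) ** 2 for p, q in zip(x, y))
             for x, y in zip(a, b) if x is not None and y is not None]
    return sum(dists) / len(dists) if dists else float("inf")


def knn_grid(data, i, k):
    """HYPOTHETICAL (k+1) x V input grid: sample i first, then its k nearest
    neighbours; their views act as hints for the views sample i is missing."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: shared_view_distance(data[i], data[j]))
    return [data[i]] + [data[j] for j in others[:k]]


# Toy data: 4 samples, 2 views each, 2-D features per view.
data = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.1, 0.0], None],        # view 1 missing
    [[5.0, 5.0], [6.0, 6.0]],
    [None, [1.1, 1.0]],        # view 0 missing; shares no view with sample 1
]
rng = random.Random(0)
augmented = drop_views(data[0], drop_prob=0.5, rng=rng)
grid = knn_grid(data, 1, k=2)  # sample 1, then samples 0 and 2
```

Here `grid[1][1]` (view 1 of the nearest neighbor) is the hint available for the view that sample 1 is missing.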

Paper Structure (37 sections, 23 equations, 5 figures, 7 tables, 2 algorithms)

Figures (5)

  • Figure 1: A comparison between our learning framework and the commonly used cross-view contrastive learning and missing-view recovery frameworks. The key difference lies in how the unified embedding for clustering is obtained. Our design directly fuses multi-view information and uses KNN imputation and data augmentation to obtain a unified and robust embedding under view-missing conditions, avoiding the drawbacks of the cross-view contrastive learning and missing-view recovery pipelines.
  • Figure 2: The overall architecture of URRL-IMVC. During training, the input data is augmented to simulate view-missing conditions, and KNN Imputation provides hints for missing views, forming an input batch with both neighbor and view dimensions. This batch is fed into the auto-encoder network, which consists of the Encoder (comprising the Neighbor Dimensional Encoder and the View Dimensional Encoder), the Decoder, and the Clustering Module. The Encoders fuse information from the neighbor and view dimensions to generate a unified embedding. The Decoder reconstructs the augmented input, and the Clustering Module produces clustering results. Additionally, an un-augmented embedding is obtained by passing the original input data through the shared Encoders. Three loss functions (Reconstruction loss, Robustness loss, and Clustering loss) enhance robustness against view-missing conditions and encourage the learning of clustering-friendly embeddings.
  • Figure 3: An intuitive visualization of the output choice of the Neighbor Dimensional Encoder (NDE) and the View Dimensional Encoder (VDE). In NDE, the first vector of the output sequence is chosen to bias the representation toward the most reliable input. In VDE, the outputs are averaged to provide an unbiased representation of all views.
  • Figure 4: Comparison with state-of-the-art approaches under different view-missing conditions on the Caltech101-7 dataset. The performance of each approach is shown as a line plot.
  • Figure 5: t-SNE visualization of the embeddings during the training process on the Caltech101-7 dataset. The iteration number and corresponding accuracy are recorded below each sub-figure. The training process consists of 4400 iterations, with the Clustering Module initialized at iteration 2200.
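Figure 3 states that NDE keeps the first vector of its output sequence, while VDE averages its outputs. The sketch below illustrates only these two readouts and does not reproduce the attention encoders. It assumes that NDE runs before VDE and that NDE outputs are indexed as `nde_out[view][neighbor]`, with position 0 being the sample itself. It also uses an identity placeholder for VDE. All function names and this ordering are hypothetical.

```python
def nde_select(neighbor_outputs):
    """NDE readout: keep the first output vector (position 0 = the sample itself)."""
    return neighbor_outputs[0]


def vde_pool(view_outputs):
    """VDE readout: element-wise mean over all view outputs."""
    n = len(view_outputs)
    return [sum(col) / n for col in zip(*view_outputs)]


def unified_embedding(nde_out, vde_encoder=lambda views: views):
    """nde_out[v][n] is the NDE output for view v at neighbour position n.
    vde_encoder is a HYPOTHETICAL stand-in for the View Dimensional Encoder;
    it defaults to the identity so only the readout logic is shown."""
    per_view = [nde_select(seq) for seq in nde_out]  # V vectors of size D
    return vde_pool(vde_encoder(per_view))           # one D-dim embedding


nde_out = [
    [[1.0, 0.0], [9.0, 9.0]],  # view 0: sample itself, then one neighbour
    [[3.0, 2.0], [7.0, 7.0]],  # view 1: sample itself, then one neighbour
]
embedding = unified_embedding(nde_out)
print(embedding)  # [2.0, 1.0]
```

Taking the first position favors the sample's own features over its neighbors. The mean over views, by contrast, treats every view symmetrically, which matches the "unbiased" intent described in the caption.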