Incomplete Multi-view Multi-label Classification via a Dual-level Contrastive Learning Framework

Bingyan Nie, Wulin Xie, Jiang Long, Xiaohuan Lu

TL;DR

The paper tackles incomplete multi-view multi-label classification by proposing a dual-level contrastive learning (DCL) framework that decouples each view into a shared representation $S^{(m)}$ and a view-private representation $P^{(m)}$. Two contrastive losses are applied: instance-level contrastive learning on high-level features to maximize cross-view consensus, and label-level contrastive learning to exploit label correlations across views, with missing data handled via indicator matrices and masked inputs. The final fused feature $Z = \theta(\bar{P}) \cdot \bar{S}$ feeds a classifier, and the overall objective combines $L_c$, $L_s$, $L_l$, and $L_r$ to balance prediction, consistency, and reconstruction. Empirically, DCL achieves stable, superior performance across five benchmark datasets under challenging missing-data settings, with ablations showing the critical contributions of both decoupling and the two-level contrastive losses.
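The fusion step $Z = \theta(\bar{P}) \cdot \bar{S}$ can be illustrated with a minimal NumPy sketch. Several details here are assumptions, not taken from the paper: $\bar{S}$ and $\bar{P}$ are taken to be averages over the views that are present for each sample (using a 0/1 view-indicator matrix), $\theta$ is taken to be a sigmoid gate, and the product is taken to be element-wise. The function names `masked_mean` and `fuse` are hypothetical.

```python
import numpy as np

def masked_mean(feats, view_mask):
    """Average per-view features over the views that are available.

    feats:     (n_views, n_samples, dim) per-view features
    view_mask: (n_samples, n_views) 0/1 indicator, 1 = view is present
    """
    w = view_mask.T[:, :, None].astype(float)   # (n_views, n_samples, 1)
    total = (feats * w).sum(axis=0)             # missing views contribute zero
    count = np.clip(w.sum(axis=0), 1.0, None)   # guard against samples with no view
    return total / count

def fuse(shared, private, view_mask):
    """Sketch of Z = theta(P_bar) * S_bar with an assumed sigmoid gate."""
    s_bar = masked_mean(shared, view_mask)
    p_bar = masked_mean(private, view_mask)
    gate = 1.0 / (1.0 + np.exp(-p_bar))         # assumption: theta is a sigmoid
    return gate * s_bar                         # assumption: element-wise product

rng = np.random.default_rng(0)
shared = rng.normal(size=(3, 4, 5))             # 3 views, 4 samples, 5 dims
private = rng.normal(size=(3, 4, 5))
mask = np.ones((4, 3))
mask[0, 2] = 0                                  # sample 0 is missing view 2
z = fuse(shared, private, mask)
```

Under these assumptions, whatever values sit in the slot of a missing view have no effect on $Z$, which is the role the indicator matrix plays in the description above.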

Abstract

Multi-view and multi-label classification have recently become significant domains for comprehensive data analysis and exploration. However, in real-world scenarios both views and labels are often incomplete. In this paper, we focus on the double-missing multi-view multi-label classification task and propose a dual-level contrastive learning framework to address it. Unlike existing works, which couple consistent information and view-specific information in the same feature space, we decouple the two heterogeneous properties into different spaces and employ contrastive learning to fully disentangle them. Specifically, our method first introduces a two-channel decoupling module that contains a shared representation and a view-proprietary representation to extract consistency and complementarity information across all views. Second, to filter out high-quality consistent information from the multi-view representations, two contrastive consistency objectives are applied to the high-level features and the semantic labels, respectively. Extensive experiments on several widely used benchmark datasets demonstrate that the proposed method achieves more stable and superior classification performance.
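To make the idea of a contrastive consistency objective on incomplete views concrete, here is a minimal NumPy sketch. It uses an InfoNCE-style loss, which is one common choice and is an assumption here; the paper's exact loss may differ. The function name `masked_info_nce`, the temperature value, and the rule of restricting the loss to samples present in both views are all illustrative.

```python
import numpy as np

def masked_info_nce(s_a, s_b, avail_a, avail_b, temperature=0.5):
    """InfoNCE-style loss between the shared features of two views.

    s_a, s_b:         (n_samples, dim) shared features of views a and b
    avail_a, avail_b: (n_samples,) 0/1 indicators, 1 = view is present
    Only samples available in BOTH views take part in the loss.
    """
    both = (avail_a * avail_b).astype(bool)
    if both.sum() < 2:                          # no negatives available
        return 0.0
    a = s_a[both] / np.linalg.norm(s_a[both], axis=1, keepdims=True)
    b = s_b[both] / np.linalg.norm(s_b[both], axis=1, keepdims=True)
    logits = a @ b.T / temperature              # cosine similarities, scaled
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives are the same sample seen in the other view (the diagonal).
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
ones = np.ones(8)
aligned = masked_info_nce(x, x + 0.01 * rng.normal(size=x.shape), ones, ones)
shuffled = masked_info_nce(x, x[::-1].copy(), ones, ones)
```

In this sketch, the loss is lower when the two views agree on each sample (`aligned`) than when the pairing is broken (`shuffled`), which is the behavior a cross-view consistency objective is meant to reward.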

Paper Structure

This paper contains 15 sections, 13 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: The main framework of our DCL. The consistent and private features are extracted by the S-P encoders, respectively. Through an interaction approach, $Z$ denotes the final fused feature.
  • Figure 2: The performance of different methods on various datasets with full views, full labels and $70\%$ training samples.
  • Figure 3: Experimental results on the Corel5k dataset: (a) different missing-view ratios and a 50% missing-label ratio and (b) different missing-label ratios and a 50% missing-view ratio.
  • Figure 4: The AP values for hyper-parameters $\alpha$ and $\beta$ on the Corel5k (Fig. 4a) and Pascal07 (Fig. 4b) datasets; AP values for $\gamma$ on both Corel5k and Pascal07 datasets (Fig. 4c). Both datasets contain 50% available views and labels, with a 70% training sample rate.
  • Figure 5: Channel similarity heatmaps of the average feature over all samples in different channels on the Corel5k dataset, where half of the views and labels are absent. The two groups of features, labeled $S_1$ to $S_6$ and $P_1$ to $P_6$, represent the shared features and view-private features on the six views, respectively. As the number of training epochs increases, the similarity of shared features between views gradually increases, while the similarity between shared features and private features gradually decreases.