Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

Chengliang Liu, Jie Wen, Yabo Liu, Chao Huang, Zhihao Wu, Xiaoling Luo, Yong Xu

TL;DR

The paper addresses incomplete multi-view weak multi-label learning by introducing the Masked Two-channel Decoupling (MTD) framework, which splits each view into a shared representation and a view-proprietary representation, and optimizes them with a cross-channel contrastive loss, random fragment masking, and a label-guided graph regularizer. It formulates a joint objective $\mathcal{L}_{all}$ comprising $\mathcal{L}_{mc}$, $\mathcal{L}_{gc}$, $\mathcal{L}_{ccc}$, and $\mathcal{L}_{re}$ to produce a robust embedding $\mathbf{Z}$ used for multi-label prediction. Key contributions include (1) two-channel decoupling to balance consistency and complementarity across views, (2) a cross-channel contrastive loss that aligns shared features while preserving view-proprietary information, (3) random fragment masking for vector data, and (4) a supervised graph regularization that preserves sample geometry in the embedding space. Empirical results on five datasets with 50% missing views and 50% missing labels demonstrate state-of-the-art performance, with ablations confirming the importance of each component. The approach offers a flexible, scalable solution for real-world incomplete multi-view multi-label tasks and lays groundwork for future work on multi-label correlations and data-imputation strategies.

Abstract

Multi-view learning has become a popular research topic in recent years, but research at the intersection of classic multi-label classification and multi-view learning is still in its early stages. In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning and propose a masked two-channel decoupling framework based on deep neural networks to solve it. The core innovation of our method lies in decoupling the single-channel view-level representation, which is common in deep multi-view learning methods, into a shared representation and a view-proprietary representation. We also design a cross-channel contrastive loss to enhance the semantic properties of the two channels. Additionally, we exploit supervised information to design a label-guided graph regularization loss that helps the extracted embedding features preserve the geometric structure among samples. Inspired by the success of masking mechanisms in image and text analysis, we develop a random fragment masking strategy for vector features to improve the learning ability of the encoders. Finally, our model is fully adaptable to arbitrary view and label absences while also performing well on the ideal full data. Extensive experiments confirm the effectiveness and advanced performance of our model.
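
To make the idea of random fragment masking on vector features more concrete, here is a minimal sketch in plain Python. The abstract does not specify how fragments are chosen, so everything below is an assumption for illustration: the function name `random_fragment_mask`, the parameters `mask_ratio` and `num_fragments`, the use of contiguous equal-length fragments, and filling masked positions with zeros are all hypothetical and may differ from the authors' method.

```python
import random


def random_fragment_mask(x, mask_ratio=0.25, num_fragments=2, rng=None):
    """Return a copy of feature vector `x` with random contiguous fragments zeroed.

    Assumptions (not taken from the paper): fragments are contiguous, have
    equal length, may overlap, and masked entries are set to 0.0.
    """
    rng = rng or random.Random()
    d = len(x)
    masked = list(x)  # copy so the caller's vector is left untouched

    # Total number of positions we aim to mask across all fragments.
    total = int(d * mask_ratio)
    if total == 0 or num_fragments <= 0:
        return masked

    frag_len = max(1, total // num_fragments)
    for _ in range(num_fragments):
        # Pick a start index so the fragment fits inside the vector.
        start = rng.randrange(0, d - frag_len + 1)
        for i in range(start, start + frag_len):
            masked[i] = 0.0
    return masked


# Example: mask roughly a quarter of a 16-dimensional feature vector.
features = [1.0] * 16
masked_features = random_fragment_mask(features, rng=random.Random(0))
```

In a setup like the one the abstract describes, the masked vector would be fed to the encoder in place of the original one. How a reconstruction target is derived from the unmasked data is not stated here, so the sketch leaves it out.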

Paper Structure

This paper contains 19 sections, 11 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Main architecture of MTD. The masked data $\{\mathbf{X}'^{(v)}\}_{v=1}^{m}$ is the actual input to MTD, and the cross-channel contrastive loss aims to enhance the semantics of the 'shared-proprietary' channels.
  • Figure 2: Experimental results of nine methods on the five full databases without any missing views or labels. The center of each radar chart corresponds to the worst results and the vertices correspond to the best results on the six metrics.
  • Figure 3: Channel similarity heat maps across all channels for a random sample from the Corel5k dataset with half of the views and labels missing. $S_1$-$S_6$ and $O_1$-$O_6$ denote the shared features and view-proprietary features on the six views, respectively. As the number of training epochs increases, the feature similarities on the shared and proprietary channels show the expected trend: the shared features across views gradually converge, while the similarities of "shared-proprietary" and "proprietary-proprietary" feature pairs gradually decrease.
  • Figure 4: AP value vs. hyperparameters $\alpha$ and $\beta$ on the (a) Corel5k and (b) Pascal07 datasets; AP value vs. hyperparameter $\gamma$ on the (c) Corel5k and Pascal07 datasets; AP value vs. hyperparameter $\sigma$ on the (d) Corel5k and (e) Pascal07 datasets. Both datasets use a 50% missing-label rate, a 50% missing-view rate, and 70% of the samples for training.
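
The channel similarity heat map described for Figure 3 can be approximated with a small sketch like the one below. The captions do not state which similarity measure the authors use, so cosine similarity is an assumption here, and the helper names `cosine` and `channel_similarity` are hypothetical. The function stacks one sample's shared features ($S_1 \dots S_m$) and view-proprietary features ($O_1 \dots O_m$) and returns the pairwise similarity matrix that such a heat map would display.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def channel_similarity(shared, proprietary):
    """Pairwise similarity over the channels [S_1..S_m, O_1..O_m] of one sample.

    `shared` and `proprietary` are lists of per-view feature vectors.
    Cosine similarity is an assumed choice, not confirmed by the source.
    """
    channels = list(shared) + list(proprietary)
    return [[cosine(a, b) for b in channels] for a in channels]


# Toy example with two views and 2-dimensional features.
shared = [[1.0, 0.0], [1.0, 0.0]]        # shared features agree across views
proprietary = [[0.0, 1.0], [0.0, -1.0]]  # proprietary features differ
sim = channel_similarity(shared, proprietary)
```

In this toy case the shared-shared entry `sim[0][1]` is high and the shared-proprietary entry `sim[0][2]` is zero, which mirrors the trend the caption reports after training.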