Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning
Chengliang Liu, Jie Wen, Yabo Liu, Chao Huang, Zhihao Wu, Xiaoling Luo, Yong Xu
TL;DR
The paper addresses incomplete multi-view weak multi-label learning by introducing the Masked Two-channel Decoupling (MTD) framework, which splits each view's representation into a shared representation and a view-proprietary representation, and trains them with a cross-channel contrastive loss, random fragment masking, and a label-guided graph regularizer. It formulates a joint objective $\mathcal{L}_{all}$ comprising $\mathcal{L}_{mc}$, $\mathcal{L}_{gc}$, $\mathcal{L}_{ccc}$, and $\mathcal{L}_{re}$ to produce a robust embedding $\mathbf{Z}$ used for multi-label prediction. Key contributions include (1) two-channel decoupling to balance consistency and complementarity across views, (2) a cross-channel contrastive loss that aligns shared features while preserving view-proprietary information, (3) random fragment masking for vector data, and (4) a supervised graph regularization that preserves sample geometry in the embedding space. Empirical results on five datasets with 50% missing views and 50% missing labels demonstrate state-of-the-art performance, with ablations confirming the importance of each component. The approach offers a flexible solution for real-world incomplete multi-view multi-label tasks and lays groundwork for future work on multi-label correlations and data-imputation strategies.
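As a rough illustration, the joint objective could be written as a weighted sum of its four terms. This is a sketch under assumptions: the summary names only the terms, so the trade-off weights $\alpha$, $\beta$, $\gamma$ and their placement are hypothetical, as is reading the subscripts as multi-label classification ($mc$), graph constraint ($gc$), cross-channel contrastive ($ccc$), and reconstruction ($re$).

$$
\mathcal{L}_{all} = \mathcal{L}_{mc} + \alpha\,\mathcal{L}_{gc} + \beta\,\mathcal{L}_{ccc} + \gamma\,\mathcal{L}_{re}
$$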
Abstract
Multi-view learning has become a popular research topic in recent years, but research at the intersection of classic multi-label classification and multi-view learning is still in its early stages. In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning and propose a masked two-channel decoupling framework based on deep neural networks to solve this problem. The core innovation of our method lies in decoupling the single-channel view-level representation, which is common in deep multi-view learning methods, into a shared representation and a view-proprietary representation. We also design a cross-channel contrastive loss to enhance the semantic properties of the two channels. Additionally, we exploit supervised information to design a label-guided graph regularization loss, which helps the extracted embedding features preserve the geometric structure among samples. Inspired by the success of masking mechanisms in image and text analysis, we develop a random fragment masking strategy for vector features to improve the learning ability of the encoders. Finally, we emphasize that our model is fully adaptable to arbitrary view and label absences while also performing well on ideal, complete data. Extensive experiments confirm the effectiveness and advanced performance of our model.
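To make the random fragment masking idea concrete, here is a minimal NumPy sketch. The abstract does not specify how fragments are chosen, so everything here is an assumption: the function name `random_fragment_mask`, the parameters `mask_ratio` and `num_fragments`, the use of contiguous fragments, and zero-filling of the masked entries are illustrative choices, not the authors' implementation.

```python
import numpy as np


def random_fragment_mask(x, mask_ratio=0.25, num_fragments=1, rng=None):
    """Zero out random contiguous fragments in each row of x.

    x             : array of shape (n_samples, dim), one feature vector per row
    mask_ratio    : approximate fraction of each vector to mask (assumed knob)
    num_fragments : number of fragments drawn per vector (assumed knob)
    Returns the masked copy and a boolean keep-mask of the same shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n, d = x.shape

    # Length of each fragment, clipped to the range [1, d].
    frag_len = min(d, max(1, int(round(mask_ratio * d / num_fragments))))

    keep = np.ones((n, d), dtype=bool)
    rows = np.arange(n)[:, None]
    for _ in range(num_fragments):
        # Independent random start position for every sample.
        starts = rng.integers(0, d - frag_len + 1, size=n)
        cols = starts[:, None] + np.arange(frag_len)[None, :]
        keep[rows, cols] = False  # fragments may overlap when num_fragments > 1

    return x * keep, keep
```

In a training loop of this kind, the masked vectors would be fed to the encoder in place of the raw features; whether the loss is then computed on the masked or the full vector is a design choice the abstract does not settle.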
