Multi-View Factorizing and Disentangling: A Novel Framework for Incomplete Multi-View Multi-Label Classification

Wulin Xie, Lian Zhao, Jiang Long, Xiaohuan Lu, Bingyan Nie

TL;DR

This work tackles incomplete multi-view multi-label classification by presenting MVFD, a two-stage framework that factorizes multi-view representations into view-consistent and view-specific factors. The consistent representation is learned with a masked cross-view prediction strategy and an information-theoretic objective to preserve task-relevant information, while a graph disentangling loss reduces redundancy between shared and specific factors. Stage 1 optimizes $\ell_{ce1}$, $\ell_{cp}$, and $\ell_{sc}$ to obtain $\hat{\mathbf{C}}$, and Stage 2 optimizes $\ell_{ce2}$, $\ell_{rec}$, and $\ell_{gd}$ to produce a discriminative feature $\mathbf{Z}$ via $\mathbf{Z}=\mathrm{Sigmoid}(\bar{\mathbf{S}})\odot\bar{\mathbf{C}}$. Experiments on five datasets with missing views/labels show MVFD consistently outperforms state-of-the-art methods, demonstrating improved robustness and generalization in incomplete multi-view multi-label settings, with ablations confirming the contribution of each component and the two-stage design.
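The fusion step $\mathbf{Z}=\mathrm{Sigmoid}(\bar{\mathbf{S}})\odot\bar{\mathbf{C}}$ can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the summary does not say how $\bar{\mathbf{S}}$ and $\bar{\mathbf{C}}$ are obtained, so the sketch assumes they are averages of per-view representations over the available views. The helper names `masked_mean` and `fuse`, and the array layouts, are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def masked_mean(reps, view_mask):
    """Average per-view representations over observed views only.

    ASSUMPTION: the bar in S-bar / C-bar denotes such an average;
    the source text does not define it.
    reps:      (m, n, d) array, one (n, d) representation per view
    view_mask: (n, m) array, 1 where the view of a sample is observed
    """
    w = view_mask.T[:, :, None].astype(float)   # (m, n, 1)
    total = (reps * w).sum(axis=0)              # (n, d)
    count = np.clip(w.sum(axis=0), 1.0, None)   # avoid division by zero
    return total / count

def fuse(S_bar, C_bar):
    """Z = Sigmoid(S_bar) * C_bar (element-wise), as in the TL;DR."""
    return sigmoid(S_bar) * C_bar
```

Under this reading, the view-specific factor acts as a soft gate in $(0, 1)$ on each dimension of the consistent factor, so $\mathbf{Z}$ never exceeds $\bar{\mathbf{C}}$ in magnitude.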

Abstract

Multi-view multi-label classification (MvMLC) has recently garnered significant research attention due to its wide range of real-world applications. However, incompleteness in views and labels is a common challenge, often resulting from data collection oversights and uncertainties in manual annotation. Furthermore, learning robust multi-view representations that are both view-consistent and view-specific from diverse views remains a challenging problem in MvMLC. To address these issues, we propose a novel framework for incomplete multi-view multi-label classification (iMvMLC). Our method factorizes multi-view representations into two independent sets of factors, view-consistent and view-specific, and we correspondingly design a graph disentangling loss to reduce redundancy between these representations. Additionally, our framework decomposes consistent representation learning into three key sub-objectives: (i) how to extract view-shared information across different views, (ii) how to eliminate intra-view redundancy in consistent representations, and (iii) how to preserve task-relevant information. To this end, we design a robust task-relevant consistency learning module that collaboratively learns high-quality consistent representations, leveraging a masked cross-view prediction (MCP) strategy and information theory. Notably, all modules in our framework are developed to function effectively under conditions of incomplete views and labels, making our method adaptable to various multi-view and multi-label datasets. Extensive experiments on five datasets demonstrate that our method outperforms other leading approaches.
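To make the masking idea behind masked cross-view prediction concrete, here is a rough sketch under stated assumptions. The abstract names the MCP strategy but gives no details, so everything specific here is hypothetical: the single contiguous fragment per sample, the `mask_ratio` value, the squared-error objective, and the function names `mask_fragments` and `masked_prediction_loss`. The one property taken from the text is that missing views are excluded from the computation.

```python
import numpy as np

def mask_fragments(x, view_available, mask_ratio=0.25, rng=None):
    """Zero out one random contiguous feature fragment per sample.

    x:              (n, d) features of a single view
    view_available: (n,) booleans, False where this view is missing
    Returns the masked features and the boolean mask that was applied.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = x.shape
    frag = max(1, int(round(mask_ratio * d)))        # fragment length
    starts = rng.integers(0, d - frag + 1, size=n)   # fragment start per sample
    cols = np.arange(d)[None, :]
    masked = (cols >= starts[:, None]) & (cols < starts[:, None] + frag)
    masked &= np.asarray(view_available, dtype=bool)[:, None]  # skip missing views
    return np.where(masked, 0.0, x), masked

def masked_prediction_loss(pred, target, view_available):
    """Mean squared error over samples whose target view is observed.

    ASSUMPTION: squared error stands in for whatever prediction
    objective the paper actually uses.
    """
    w = np.asarray(view_available, dtype=float)[:, None]
    denom = max(w.sum() * target.shape[1], 1.0)
    return float((w * (pred - target) ** 2).sum() / denom)
```

In such a setup, a predictor would receive the masked features of one view and be scored on how well it predicts another view, with the availability indicator keeping missing views out of both the masking and the loss.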

Paper Structure

This paper contains 16 sections, 1 theorem, 14 equations, 6 figures, 2 tables.

Key Result

Proposition 1

A multi-view consistent representation $c$ contains the shared information across different views if each observation, i.e., $x^{(v)}$ from $\{x^{(1)}, \ldots, x^{(m)}\}$, can be reconstructed from a mapping $f^{(v)}(\cdot)$, i.e., $x^{(v)} = f^{(v)}(c)$.
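A toy numerical check can make the reconstruction condition tangible. The sketch below assumes linear maps $f^{(v)}(c) = cW^{(v)}$ purely for illustration (the proposition as stated does not require linearity), and every name and dimension in it is hypothetical. It generates two views from a shared $c$, then verifies that $c$ reconstructs each view while an unrelated random representation does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_c = 8, 3                 # samples, size of the shared representation
view_dims = [5, 4]            # feature size of each toy view

# Shared representation c and hypothetical linear maps f^(v)(c) = c @ W^(v).
c = rng.normal(size=(n, d_c))
maps = [rng.normal(size=(d_c, d)) for d in view_dims]
views = [c @ W for W in maps]

def reconstruction_error(rep, x):
    """Residual norm of the best linear map from rep to view x."""
    W_hat, *_ = np.linalg.lstsq(rep, x, rcond=None)
    return float(np.linalg.norm(rep @ W_hat - x))

# c reconstructs every view (error at numerical precision) ...
errors = [reconstruction_error(c, x) for x in views]

# ... while an unrelated representation of the same size cannot.
noise = rng.normal(size=(n, d_c))
noise_errors = [reconstruction_error(noise, x) for x in views]
```

Here `errors` should be near zero for both views and `noise_errors` clearly larger, which matches the sense in which $c$ carries the information shared across views.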

Figures (6)

  • Figure 1: An overview of our MVFD framework. In the first stage, we randomly mask fragments of the input features and then factorize consistent representation learning into three sub-goals for collaborative optimization. In the second stage, we freeze the consistent encoders trained in the first stage and leverage the learned consistent representation and our graph disentangling loss to guide the disentanglement of view-specific information.
  • Figure 2: We define complete multi-view data as containing view-specific information, view-shared information, and redundant information between them. The goal of our graph disentangling loss is to fully eliminate this redundancy and obtain disentangled view-specific information.
  • Figure 3: Experimental results of eleven methods on three datasets without any missing views or labels. For each of the six metrics, the worst result is placed at the center of the radar chart and the best result at the vertex.
  • Figure 4: Performance comparison with SOTA methods under different missing rates of views or labels.
  • Figure 5: AP values for hyper-parameters $\alpha$ and $\beta$ on the Corel5k (Fig. 5a) and Pascal07 (Fig. 5b) datasets, and AP values for hyper-parameters $\alpha$ and $\beta$ on the Corel5k (Fig. 5c) and Pascal07 (Fig. 5d) datasets. Both datasets contain 50% available views and labels, with a 70% training sample rate.
  • ...and 1 more figures

Theorems & Definitions (1)

  • Proposition 1