Cross-view Joint Learning for Mixed-Missing Multi-view Unsupervised Feature Selection

Zongxin Shen, Yanyong Huang, Dongjie Wang, Jinyuan Chang, Fengmao Lv, Tianrui Li, Xiaoyi Jiang

TL;DR

This work tackles multi-view unsupervised feature selection (MUFS) with mixed missing data by proposing CLIM-FS, a joint learning framework that integrates missing-view and missing-variable imputation with feature selection through a nonnegative orthogonal matrix factorization model. It leverages cross-view consensus through a shared cluster indicator $\bm{F}^{*}$ and cross-view local geometry through the graphs $\bm{S}^{v}$ and $\bm{H}$, which jointly guide imputation and discriminative feature selection. The authors provide theoretical results showing that the imputation preserves intra- and inter-cluster structure as well as cross-view geometry, give convergence guarantees for the optimization algorithm, and demonstrate superior performance against strong baselines on eight real-world datasets. Overall, CLIM-FS extends MUFS to realistic mixed-missing settings and offers a principled, scalable approach to robust feature selection in heterogeneous multi-view data.
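To make the mixed-missing setting concrete, here is a minimal sketch that builds boolean observation masks for multi-view data laid out as in Figure 1 (each $\bm{X}^{v}$ has features as rows and samples as columns). The helper name `mixed_missing_masks`, its parameters, and the rule that every sample keeps at least one observed view are illustrative assumptions, not the authors' experimental protocol.

```python
import numpy as np

def mixed_missing_masks(dims, n, view_ratio=0.2, var_ratio=0.1, seed=0):
    """Hypothetical helper: one boolean observation mask per view.

    Each mask has shape (d_v, n), rows = features and columns = samples,
    matching the layout of X^v in Figure 1. True marks an observed entry.
    """
    rng = np.random.default_rng(seed)
    n_views = len(dims)
    # View-missing: before the fix-up below, each sample loses each view
    # with probability view_ratio.
    view_present = rng.random((n_views, n)) >= view_ratio
    # Assumption: every sample keeps at least one observed view.
    for i in np.flatnonzero(~view_present.any(axis=0)):
        view_present[rng.integers(n_views), i] = True
    masks = []
    for v, d in enumerate(dims):
        # Variable-missing: individual entries drop out with probability var_ratio.
        entry_present = rng.random((d, n)) >= var_ratio
        # Mixed-missing: an entry is observed only if its view is present
        # and the entry itself was not dropped.
        masks.append(entry_present & view_present[v][None, :])
    return masks, view_present

# Three views with 5, 8 and 3 features, 100 samples.
masks, view_present = mixed_missing_masks(
    [5, 8, 3], n=100, view_ratio=0.3, var_ratio=0.2, seed=1
)
```

Setting `var_ratio=0` yields the view-missing case of Figure 1(a), `view_ratio=0` yields the variable-missing case (b), and making both positive yields the mixed-missing case (c).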

Abstract

Incomplete multi-view unsupervised feature selection (IMUFS), which aims to identify representative features from unlabeled multi-view data containing missing values, has received growing attention in recent years. Despite the promising performance of existing methods, they face three key challenges: 1) by focusing solely on the view-missing problem, they are not well-suited to the more prevalent mixed-missing scenario in practice, where some samples lack entire views or only partial features within views; 2) insufficient utilization of consistency and diversity across views limits the effectiveness of feature selection; and 3) the lack of theoretical analysis makes it unclear how feature selection and data imputation interact during the joint learning process. To address these challenges, we propose CLIM-FS, a novel IMUFS method designed for the mixed-missing problem. Specifically, we integrate the imputation of both missing views and missing variables into a feature selection model based on nonnegative orthogonal matrix factorization, enabling the joint learning of feature selection and adaptive data imputation. Furthermore, we fully leverage the consensus cluster structure and the cross-view local geometrical structure to enhance this synergistic learning process. We also provide a theoretical analysis that clarifies the underlying collaborative mechanism of CLIM-FS. Experimental results on eight real-world multi-view datasets demonstrate that CLIM-FS outperforms state-of-the-art methods.
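The idea of imputing missing entries inside a factorization model, rather than filling them beforehand, can be illustrated with a much simpler generic stand-in: single-view masked nonnegative matrix factorization with weighted multiplicative updates. This is not CLIM-FS. It omits the orthogonality constraint, the shared cluster indicator $\bm{F}^{*}$, the graphs, and the paper's feature selection regularizer. The function name `masked_nmf_impute` and the heuristic of scoring features by the row norms of the basis matrix are hypothetical choices for illustration.

```python
import numpy as np

def masked_nmf_impute(X, M, k, n_iter=300, seed=0, eps=1e-10):
    """Generic masked NMF (not CLIM-FS): fit X ~= W @ H on observed entries only.

    X: (d, n) nonnegative data, features x samples; unobserved entries may be NaN.
    M: (d, n) boolean mask, True = observed.
    Returns the imputed matrix, W (d, k), H (k, n) and heuristic feature scores.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    Mf = M.astype(float)
    Xo = np.where(M, X, 0.0)  # zero out unobserved entries (removes NaNs)
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Weighted multiplicative updates: only observed entries drive the fit,
        # and nonnegativity of W and H is preserved automatically.
        W *= (Xo @ H.T) / ((Mf * (W @ H)) @ H.T + eps)
        H *= (W.T @ Xo) / (W.T @ (Mf * (W @ H)) + eps)
    X_hat = np.where(M, X, W @ H)          # keep observed values, impute the rest
    scores = np.linalg.norm(W, axis=1)     # hypothetical score: row norms of W
    return X_hat, W, H, scores

# Usage on synthetic low-rank nonnegative data with about 20% of entries missing.
rng = np.random.default_rng(1)
X = rng.random((6, 3)) @ rng.random((3, 40))
M = rng.random(X.shape) >= 0.2
X_missing = np.where(M, X, np.nan)
X_hat, W, H, scores = masked_nmf_impute(X_missing, M, k=3)
top_features = np.argsort(scores)[::-1][:2]  # keep the two highest-scoring features
```

Because the imputed values come from the same factors that produce the feature scores, imputation and scoring are coupled, which is the kind of interaction the joint formulation above exploits far more richly.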

Paper Structure

This paper contains 31 sections, 3 theorems, 44 equations, 9 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 1

For any two samples $\bm{X}_{\cdot i}^{v}$ and $\bm{X}_{\cdot j}^{v}$ with missing values, their imputed data $\hat{\bm{X}}_{\cdot i}^{v}$ and $\hat{\bm{X}}_{\cdot j}^{v}$, as obtained from Eq. (C2), satisfy the following:

Figures (9)

  • Figure 1: Illustration of three types of incomplete multi-view data scenarios: (a) view-missing, (b) variable-missing, and (c) mixed-missing. $\bm{X}^{v}$ denotes the data matrix of the $v$-th view, where each column represents a sample and each row corresponds to a feature.
  • Figure 2: The framework of the proposed Cross-view joint Learning for mIxed-missing Multi-view unsupervised Feature Selection (CLIM-FS) method.
  • Figure 3: ACC of different methods on eight multi-view datasets with different feature selection ratios in the mixed-missing scenario.
  • Figure 4: ACC of different methods on eight multi-view datasets with different missing ratios in the mixed-missing scenario.
  • Figure 5: t-SNE visualizations of features selected by four "one-stage" methods on MSRC dataset.
  • ...and 4 more figures
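Figures 3 and 4 report ACC. In unsupervised feature selection work this usually denotes clustering accuracy: cluster the data restricted to the selected features, then count correctly assigned samples under the best one-to-one mapping between cluster IDs and true labels. The summary does not spell out the definition, so the sketch below assumes that standard version; `clustering_accuracy` is a hypothetical name, and the mapping is found with SciPy's Hungarian-algorithm solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Clustering accuracy under the best one-to-one cluster-to-label mapping."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_ids, t = np.unique(y_true, return_inverse=True)
    pred_ids, p = np.unique(y_pred, return_inverse=True)
    size = max(len(true_ids), len(pred_ids))
    # C[i, j] = number of samples in predicted cluster i with true label j
    # (padded to a square matrix if the counts differ).
    C = np.zeros((size, size), dtype=int)
    np.add.at(C, (p, t), 1)
    # Hungarian algorithm: mapping that maximizes the matched sample count.
    rows, cols = linear_sum_assignment(C, maximize=True)
    return C[rows, cols].sum() / y_true.size

# Cluster IDs are arbitrary, so a relabelled perfect clustering scores 1.0.
acc_perfect = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
# One sample is misassigned out of six.
acc_partial = clustering_accuracy([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

Matching cluster IDs to labels before counting is what makes ACC meaningful for unsupervised outputs, whose cluster numbering carries no inherent meaning.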

Theorems & Definitions (6)

  • Theorem 1
  • Proof
  • Theorem 2
  • Proof
  • Theorem 3
  • Proof