Semi-Supervised Multi-Label Feature Selection with Consistent Sparse Graph Learning

Yan Zhong, Xingyu Wu, Xinping Zhao, Li Zhang, Xinyuan Song, Lei Shi, Bingbing Jiang

TL;DR

This work tackles semi-supervised multi-label feature selection in high-dimensional data by introducing SGMFS, which jointly learns a shared label subspace to capture label correlations and an adaptive sparse graph that keeps the label space and the learned subspace structurally consistent. By combining soft-label propagation with space consistency and a sparsity-constrained reconstruction graph, the method produces reliable predictions for unlabeled samples and a discriminative feature weight matrix $W$ for selection. The authors provide an optimization procedure with convergence guarantees, along with extensive experiments on seven datasets showing better performance and stability than state-of-the-art baselines. The approach targets robust feature selection for multi-label tasks under incomplete labeling, with potential use in practical domains that need efficient, interpretable feature weighting.

Abstract

In practical domains, high-dimensional data are usually associated with diverse semantic labels, whereas traditional feature selection methods are designed for single-label data. Moreover, existing multi-label methods encounter two main challenges in semi-supervised scenarios: (1) Without enough labeled samples, most semi-supervised methods fail to evaluate the label correlations, which are critical information for multi-label feature selection, so label-specific features are discarded. (2) In existing graph-based methods, the similarity graph structure derived directly from the original feature space is suboptimal for multi-label problems, leading to unreliable soft labels and degraded feature selection performance. To overcome these challenges, we propose a consistent sparse graph learning method for multi-label semi-supervised feature selection (SGMFS), which enhances feature selection performance by maintaining space consistency and learning label correlations in semi-supervised scenarios. Specifically, for Challenge (1), SGMFS learns a low-dimensional and independent label subspace from the projected features, which is shared across multiple labels and effectively captures the label correlations. For Challenge (2), instead of constructing a fixed similarity graph for semi-supervised learning, SGMFS explores the intrinsic structure of the data by performing sparse reconstruction of samples in both the label space and the learned subspace simultaneously. In this way, the similarity graph is adaptively learned to maintain consistency between the label space and the learned subspace, which promotes the propagation of proper soft labels to unlabeled samples and thereby facilitates the final feature selection. An effective solution with fast convergence is designed to optimize the objective function. Extensive experiments validate the superiority of SGMFS.
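As a rough illustration of how a learned feature weight matrix can drive selection, the sketch below ranks features by the $\ell_2$ norm of the corresponding rows of $\textbf{W} \in \mathbb{R}^{d \times c}$ and keeps the top $k$. Row-norm ranking is a common convention in sparse feature selection and is an assumption here, since the text above does not state the exact scoring rule used by SGMFS. The function name `rank_features_by_row_norm` and the toy matrix are hypothetical.

```python
import math


def rank_features_by_row_norm(W, k):
    """Return the indices of the k rows of W with the largest l2 norm.

    W is a list of d rows (one per feature), each holding c label weights.
    Assumption: a feature's importance is the l2 norm of its row in W.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in W]
    # Sort feature indices from largest to smallest row norm.
    order = sorted(range(len(W)), key=lambda i: norms[i], reverse=True)
    return order[:k]


# Hypothetical weight matrix: 4 features, 2 labels (made-up values).
W = [
    [0.9, -0.4],
    [0.0, 0.05],
    [0.3, 0.3],
    [-1.2, 0.8],
]

selected = rank_features_by_row_norm(W, k=2)
print(selected)
```

Features whose rows are close to zero across all labels contribute little to any prediction and fall to the bottom of the ranking, which is why a row-sparse weight matrix lends itself to selection.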

Paper Structure

This paper contains 28 sections, 6 theorems, 50 equations, 9 figures, 5 tables, 1 algorithm.

Key Result

Proposition 4.1

Denote $\textbf{W} \in \mathbb{R}^{d \times c}, \textbf{b} \in \mathbb{R}^{c \times 1},\textbf{F} \in \mathbb{R}^{n \times c},\textbf{M} \in \mathbb{R}^{n \times n},\textbf{P} \in \mathbb{R}^{lsd \times c}$ and $\textbf{Q} \in \mathbb{R}^{n \times lsd}$, $g\left(\textbf{M}, \textbf{b}, \textbf{F},\t

Figures (9)

  • Figure 1: Overview of the consistent Sparse Graph learning for Multi-label semi-supervised Feature Selection (SGMFS). First, the predicted labels (soft labels) $\textbf{F}$ and the sparse graph matrix $\textbf{M}$ are initialized. The shared label subspace is then obtained by subspace learning. Next, the sparse graph matrix $\textbf{M}$ is updated by space consistency learning between the label subspace and the original label space, and the updated $\textbf{M}$ is used to further update the soft labels $\textbf{F}$ and the shared labels $\textbf{Q}$. After iterating until convergence, the optimal feature weight matrix $\textbf{W}$ is obtained for semi-supervised feature selection.
  • Figure 2: The process of sparse sample reconstruction with space consistency preservation in SGMFS: (i) The shared label subspace is generated from the original feature and label spaces. (ii) The most suitable samples $B$ and $C$ are adaptively selected by sparse learning to reconstruct sample $A$. Note that the samples chosen to reconstruct a given target sample are the same in the label space and in the subspace.
  • Figure 3: Comparison of SGMFS with other feature selection algorithms (CSFS, FSNM, LSDF, MIFS, SCFS, SFSS, and SFS-BLL) using four metrics: Hamming Loss, Ranking Loss, Macro Average, and Micro Average.
  • Figure 4: Spider web diagrams comparing the stability of SGMFS (red hexagon) with seven state-of-the-art feature selection algorithms across seven multi-label datasets, evaluated using four metrics: Hamming Loss, Ranking Loss, Macro Average, and Micro Average.
  • Figure 5: Triangular heat maps of the label-correlation distributions learned by SGMFS on the Emotions dataset with 30% labeled samples (a, b, c, d) and 40% labeled samples (e, f, g, h). The panels compare the label correlations in the predicted label matrix $F$ (a, e), the feature weight matrix $W$ (b, f), the ground-truth labels $Y$ of the labeled samples (c, g), and the shared labels $Q$ (d, h).
  • ...and 4 more figures

Theorems & Definitions (14)

  • Proposition 4.1
  • proof
  • Proposition 4.2
  • proof
  • Proposition 4.3
  • proof
  • Theorem 4.4
  • proof
  • Proposition 4.5
  • proof
  • ...and 4 more