Bi-Level Unsupervised Feature Selection

Jingjing Liu, Xiansen Ju, Xianchao Xiu, Wanquan Liu

TL;DR

This work tackles unsupervised feature selection (UFS) by introducing BLUFS, a bi-level framework that combines a clustering level using continuous pseudo-labels with a feature level that imposes a strict $\ell_{2,0}$ sparsity constraint on the projection matrix. The model is solved with a proximal alternating minimization scheme that updates $P$, $W$, and $Y$ and comes with convergence guarantees. Experiments on two synthetic and eight real datasets show BLUFS achieving superior clustering accuracy and NMI, as well as improved downstream classification, especially in high-dimensional settings such as gisette. The approach highlights the value of integrating structure-aware clustering with strict feature sparsity to improve interpretability and performance in UFS, while offering scalable computation and robust behavior across parameter settings.

Abstract

Unsupervised feature selection (UFS) is an important task in data engineering. However, most UFS methods construct models from a single perspective and often fail to simultaneously evaluate feature importance and preserve their inherent data structure, thus limiting their performance. To address this challenge, we propose a novel bi-level unsupervised feature selection (BLUFS) method, including a clustering level and a feature level. Specifically, at the clustering level, spectral clustering is used to generate pseudo-labels for representing the data structure, while a continuous linear regression model is developed to learn the projection matrix. At the feature level, the $\ell_{2,0}$-norm constraint is imposed on the projection matrix for more effectively selecting features. To the best of our knowledge, this is the first work to combine a bi-level framework with the $\ell_{2,0}$-norm. To solve the proposed bi-level model, we design an efficient proximal alternating minimization (PAM) algorithm, whose subproblems either have explicit solutions or can be computed by fast solvers. Furthermore, we establish the convergence result and computational complexity. Finally, extensive experiments on two synthetic datasets and eight real datasets demonstrate the superiority of BLUFS in clustering and classification tasks.
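As a rough illustration of the feature-level constraint in the abstract, the sketch below shows the standard Euclidean projection onto the set $\{W : \|W\|_{2,0} \le s\}$, which keeps the $s$ rows of $W$ with the largest $\ell_2$ norm and zeroes the rest. It is not the authors' implementation: the function name `project_row_sparse`, the parameter `s`, and the toy matrix are hypothetical, and how BLUFS embeds such a step in its PAM updates is not stated in this summary.

```python
import numpy as np


def project_row_sparse(W, s):
    """Keep the s rows of W with the largest l2 norm and zero out the others.

    Hypothetical helper for illustration only; each row of W is assumed to
    correspond to one input feature.
    """
    if not 0 < s <= W.shape[0]:
        raise ValueError("s must satisfy 0 < s <= number of rows of W")
    row_norms = np.linalg.norm(W, axis=1)   # l2 norm of every row
    keep = np.argsort(row_norms)[-s:]       # indices of the s largest norms
    W_proj = np.zeros_like(W)
    W_proj[keep] = W[keep]                  # copy only the retained rows
    return W_proj, np.sort(keep)


# Toy projection matrix: 4 features (rows), 2 output dimensions (columns).
W = np.array([[0.1, 0.0],
              [3.0, 4.0],
              [0.5, 0.5],
              [0.0, 2.0]])
W_sparse, selected = project_row_sparse(W, s=2)
```

Here `selected` holds the indices of the retained rows, which can be read as the selected features. Unlike an $\ell_{2,1}$ penalty, which only encourages row sparsity and needs a ranking step afterwards, this hard constraint fixes the number of nonzero rows at no more than `s`.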

Paper Structure

This paper contains 36 sections, 2 theorems, 46 equations, 11 figures, 8 tables, 2 algorithms.

Key Result

Lemma 1

Assume that $\{Q^k\}_{k \in \mathbb{N}}$ is generated by Algorithm 1. Then the following inequality holds, where $\tau = \min\{\tau_1, \tau_2, \tau_3\}$.

Figures (11)

  • Figure 1: Comparisons of BLSFS [hu2024bi] and our proposed BLUFS. BLSFS is composed of classification and feature levels. In the classification level, discrete pseudo-labels are obtained through spectral clustering, and then a projection matrix is learned via classifier training. In the feature level, a similarity matrix is obtained using an adaptive graph learning model. Finally, by applying the $\ell_{2,1}$-norm to $W$, features with higher rankings are obtained. BLUFS is composed of clustering and feature levels. Unlike BLSFS, in the clustering level, continuous pseudo-labels representing the clustering structure are obtained through spectral clustering, and a projection matrix is obtained via a linear transformation model. In the feature level, after obtaining the similarity matrix, the number of features is strictly limited and sufficient sparsity is ensured by imposing the $\ell_{2,0}$-norm constraint on $W$.
  • Figure 2: Heatmap visualization of learned sparse matrices by $\ell_{2,1}$-norm and $\ell_{2,0}$-norm.
  • Figure 3: Visual comparisons on the Dartboard1 dataset, where (a)-(j) are the feature selection results.
  • Figure 4: Visual comparisons on the Banana dataset, where (a)-(j) are the feature selection results.
  • Figure 5: Visual comparisons of the ACC metric under different datasets with different numbers of selected features.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Lemma 1
  • Proof
  • Theorem 1