Bi-Level Unsupervised Feature Selection
Jingjing Liu, Xiansen Ju, Xianchao Xiu, Wanquan Liu
TL;DR
This work tackles unsupervised feature selection (UFS) by introducing BLUFS, a bi-level framework that combines a clustering level, which uses continuous pseudo-labels, with a feature level that imposes strict $\ell_{2,0}$ sparsity on the projection matrix. The model is solved by a proximal alternating minimization scheme that updates $P$, $W$, and $Y$ in turn and comes with convergence guarantees. Experiments on two synthetic and eight real datasets show BLUFS achieving higher clustering accuracy and NMI, as well as better downstream classification, especially in high-dimensional settings such as gisette. The results highlight the value of pairing structure-aware clustering with strict feature sparsity for both interpretability and performance in UFS, with an analyzed computational cost and behavior that is reported to be robust across parameter settings.
Abstract
Unsupervised feature selection (UFS) is an important task in data engineering. However, most UFS methods build their models from a single perspective and often fail to evaluate feature importance while also preserving the inherent structure of the data, which limits their performance. To address this challenge, we propose a novel bi-level unsupervised feature selection (BLUFS) method, comprising a clustering level and a feature level. Specifically, at the clustering level, spectral clustering is used to generate pseudo-labels that represent the data structure, while a continuous linear regression model is developed to learn the projection matrix. At the feature level, the $\ell_{2,0}$-norm constraint is imposed on the projection matrix to select features more effectively. To the best of our knowledge, this is the first work to combine a bi-level framework with the $\ell_{2,0}$-norm. To solve the proposed bi-level model, we design an efficient proximal alternating minimization (PAM) algorithm, whose subproblems either have explicit solutions or can be computed by fast solvers. Furthermore, we establish its convergence and analyze its computational complexity. Finally, extensive experiments on two synthetic datasets and eight real datasets demonstrate the superiority of BLUFS in clustering and classification tasks.
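To make the feature-level constraint concrete, the following is a minimal NumPy sketch of what an $\ell_{2,0}$-norm constraint does to a projection matrix. It is not the authors' PAM algorithm. It assumes the projection matrix has one row per feature, so that a row of zeros means the corresponding feature is discarded. The function names `l20_norm` and `project_l20` and the sparsity level `k` are hypothetical and chosen only for illustration. The projection shown (keep the `k` rows with the largest $\ell_2$ norm, zero the rest) is the standard Euclidean projection onto the set of matrices with at most `k` nonzero rows.

```python
import numpy as np


def l20_norm(W, tol=0.0):
    """Return the l_{2,0} 'norm' of W: the number of rows with l2 norm above tol."""
    return int(np.sum(np.linalg.norm(W, axis=1) > tol))


def project_l20(W, k):
    """Project W onto {V : ||V||_{2,0} <= k}.

    Keeps the k rows of W with the largest l2 norm and sets all other
    rows to zero. Returns the projected matrix and the sorted indices
    of the kept rows (the selected features, under the assumption that
    each row of W corresponds to one feature).
    """
    row_norms = np.linalg.norm(W, axis=1)
    # Stable sort on negated norms: largest norms first, ties broken by index.
    keep = np.argsort(-row_norms, kind="stable")[:k]
    projected = np.zeros_like(W)
    projected[keep] = W[keep]
    return projected, np.sort(keep)


if __name__ == "__main__":
    # Toy projection matrix: 4 features (rows), 2 latent dimensions (columns).
    W = np.array([[3.0, 4.0],
                  [0.1, 0.0],
                  [0.0, 2.0],
                  [1.0, 1.0]])
    W_sparse, selected = project_l20(W, k=2)
    print(l20_norm(W))         # 4 nonzero rows before projection
    print(l20_norm(W_sparse))  # 2 nonzero rows after projection
    print(selected)            # [0 2]
```

Unlike a penalty such as the $\ell_{2,1}$-norm, which only shrinks row norms, this hard constraint fixes the number of selected features exactly at `k`, so no post-hoc ranking or thresholding of rows is needed.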
