Enhancing Unsupervised Feature Selection via Double Sparsity Constrained Optimization

Xianchao Xiu; Anning Yang; Chenyi Huang; Xinrong Li; Wanquan Liu

TL;DR

DSCOFS embeds double sparsity into a PCA-based unsupervised feature selection (UFS) framework by combining the $\ell_{2,0}$-norm (row-wise structural sparsity) with the $\ell_0$-norm (element-wise sparsity). The resulting nonconvex problem is solved by proximal alternating minimization with an exact penalty, and the generated sequence is proved to converge globally to a stationary point under the Kurdyka-Łojasiewicz (KL) property. Experiments on three synthetic and eight real-world datasets show consistent ACC and NMI gains over state-of-the-art methods (average increases of at least $3.34\%$ and $3.02\%$, respectively), supported by an ablation study and statistical tests. The results indicate that combining global and local sparsity yields more discriminative feature subsets and greater robustness to noise, with practical implications for high-dimensional UFS and potential extensions to deeper or distributed settings.
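
For reference, the two sparsity measures named above have the following standard definitions. The row notation $x^i$ and entry notation $x_{ij}$ are assumptions made for this illustration, since the paper's Notations section is not reproduced here.

$$
\|X\|_{2,0} = \bigl|\{\, i : \|x^i\|_2 \neq 0 \,\}\bigr|, \qquad \|X\|_0 = \bigl|\{\, (i,j) : x_{ij} \neq 0 \,\}\bigr|.
$$

Every nonzero row contains at least one nonzero entry, so $\|X\|_{2,0} \le \|X\|_0$ always holds. A matrix can therefore have few nonzero rows and still be dense inside those rows, which is the gap the element-wise term addresses.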

Abstract

Unsupervised feature selection (UFS) is widely applied in machine learning and pattern recognition. However, most existing methods impose only a single type of sparsity, which makes it difficult to select valuable and discriminative feature subsets from the original high-dimensional feature set. In this paper, we propose a new UFS method called DSCOFS, which embeds double sparsity constrained optimization into the classical principal component analysis (PCA) framework. Double sparsity means constraining the variables with the $\ell_{2,0}$-norm and the $\ell_0$-norm simultaneously; combining these two types of sparsity improves the accuracy of identifying differential features. The core idea is that the $\ell_{2,0}$-norm removes irrelevant and redundant features, while the $\ell_0$-norm filters out irregular noisy features, thereby complementing the $\ell_{2,0}$-norm to improve discrimination. An effective proximal alternating minimization method is proposed to solve the resulting nonconvex nonsmooth model. Theoretically, we rigorously prove that the sequence generated by our method converges globally to a stationary point. Numerical experiments on three synthetic datasets and eight real-world datasets demonstrate the effectiveness, stability, and convergence of the proposed method. In particular, the average clustering accuracy (ACC) and normalized mutual information (NMI) are improved by at least 3.34% and 3.02%, respectively, compared with state-of-the-art methods. More importantly, two common statistical tests and a new feature similarity metric verify the advantages of double sparsity. All results suggest that the proposed DSCOFS provides a new perspective for feature selection.
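
To make the distinction between the two sparsity types concrete, the following is a minimal NumPy sketch. It counts nonzero rows and nonzero entries and applies the usual hard-thresholding steps that enforce each kind of sparsity budget. The function names and the budgets `s` and `t` are hypothetical, and this is not the paper's update rule for $X$, $Y$, or $Z$, which is not reproduced in this summary.

```python
import numpy as np

def l20_norm(X):
    """Row-wise sparsity: number of rows of X that are not entirely zero."""
    return int(np.sum(np.linalg.norm(X, axis=1) > 0))

def l0_norm(X):
    """Element-wise sparsity: number of nonzero entries of X."""
    return int(np.count_nonzero(X))

def keep_top_rows(X, s):
    """Keep the s rows with the largest Euclidean norm and zero out the rest.
    (Illustrative projection onto {X : ||X||_{2,0} <= s}; s is a hypothetical budget.)"""
    norms = np.linalg.norm(X, axis=1)
    keep = np.argsort(-norms, kind="stable")[:s]
    out = np.zeros_like(X)
    out[keep] = X[keep]
    return out

def keep_top_entries(X, t):
    """Keep the t entries with the largest magnitude and zero out the rest.
    (Illustrative projection onto {X : ||X||_0 <= t}; t is a hypothetical budget.)"""
    flat = np.abs(X).ravel()
    keep = np.argsort(-flat, kind="stable")[:t]
    out = np.zeros_like(X)
    out.flat[keep] = X.flat[keep]
    return out

# Toy matrix: 4 candidate features (rows), 2 components (columns).
X = np.array([[3.0, 0.1],
              [0.0, 0.0],
              [0.2, -2.0],
              [0.05, 0.02]])

rows_only = keep_top_rows(X, 2)          # drops whole features (rows)
both = keep_top_entries(rows_only, 2)    # additionally drops small entries

print(l20_norm(X), l0_norm(X))
print(l20_norm(rows_only), l0_norm(rows_only))
print(l20_norm(both), l0_norm(both))
```

In this toy example, row thresholding alone keeps two features but leaves the small entries 0.1 and 0.2 in place; the element-wise step removes them, which mirrors the complementary roles the abstract assigns to the two norms.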

Paper Structure (29 sections, 1 theorem, 45 equations, 11 figures, 6 tables, 2 algorithms)

Introduction
Preliminaries
Notations
SPCA Basics
The Proposed Method
New Formulation
Optimization Algorithm
Update $X^{k+1}$
Update $Y^{k+1}$
Update $Z^{k+1}$
Convergence Analysis
Numerical Experiments
Experimental Setup
Dataset Description
Parameter Setting
...and 14 more sections

Key Result

Theorem 3.2

Suppose that $\beta\geq \max\{2(\lambda_0+\lambda_1),2m\lambda_2\}$. Let $\{(X^k, Y^k, Z^k)\}$ be the sequence generated by Algorithm am1 for solving problem dsco-2. Then the following properties hold:

Figures (11)

Figure 1: The examples of results obtained by different sparsity constraints, where white means the value of the element is 0.
Figure 2: The ACC performance of our proposed DSCOFS and other UFS methods, including LapScore, UDFS, SOGFS, RNE, FSPCA, SPCAFS, SPCA-PSD, under time on the Isolet dataset.
Figure 3: The flowchart of feature selection and clustering of our proposed DSCOFS, where $\|X\|_{2,0}$ captures the global structural sparsity, and $\|X\|_0$ captures the local element-wise sparsity.
Figure 4: The original distribution and feature selection results on the synthetic datasets. (a), (f) and (k) are the selected synthetic datasets; (b)-(e) are the feature selection results on 2Spiral; (g)-(j) are the feature selection results on Banana; (l)-(o) are the feature selection results on Dartboard. The top two features are used to show the results of feature selection and the correct features are feature 4 and feature 5.
Figure 5: The ACC (%) curves of compared methods on eight real-world datasets. All methods select features based on the given parameters and only the mean results of ACC (%) are plotted.
...and 6 more figures

Theorems & Definitions (3)

Remark 3.1
Theorem 3.2
Remark 3.3
