Unsupervised Feature Selection via Robust Autoencoder and Adaptive Graph Learning
Feng Yu, MD Saifur Rahman Mazumder, Ying Su, Oscar Contreras Velasco
TL;DR
This work tackles unsupervised feature selection in high-dimensional data by combining a Robust Subspace Recovery Autoencoder with adaptive graph learning. The proposed RAEUFS framework learns nonlinear feature representations, explicitly separates outliers through a robust latent subspace, and jointly optimizes feature selection, pseudo-label clustering, and local geometry. Trained by alternating optimization, the method achieves higher clustering accuracy and normalized mutual information than state-of-the-art UFS approaches on multiple benchmarks and maintains performance under substantial outlier contamination. A real-world migration dataset illustrates its practical interpretability.
Abstract
Effective feature selection is essential for high-dimensional data analysis and machine learning. Unsupervised feature selection (UFS) aims to simultaneously cluster data and identify the most discriminative features. Most existing UFS methods linearly project features into a pseudo-label space for clustering, but they suffer from two critical limitations: (1) an oversimplified linear mapping that fails to capture complex feature relationships, and (2) an assumption of uniform cluster distributions, ignoring outliers prevalent in real-world data. To address these issues, we propose the Robust Autoencoder-based Unsupervised Feature Selection (RAEUFS) model, which leverages a deep autoencoder to learn nonlinear feature representations while inherently improving robustness to outliers. We further develop an efficient optimization algorithm for RAEUFS. Extensive experiments demonstrate that our method outperforms state-of-the-art UFS approaches in both clean and outlier-contaminated data settings.
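To make the first limitation concrete, the following is a minimal sketch of the kind of linear projection into a pseudo-label space that the abstract criticizes. It is not the RAEUFS model. The ridge-regression formulation, the function name `linear_ufs_scores`, the parameter `reg`, and the synthetic data are illustrative assumptions, and the pseudo-labels are taken as given rather than learned jointly.

```python
import numpy as np

def linear_ufs_scores(X, F, reg=1.0):
    """Score features by the row norms of a ridge projection W mapping X to pseudo-labels F.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    d = X.shape[1]
    # Closed-form ridge solution: W = (X^T X + reg * I)^{-1} X^T F, shape (d, c)
    W = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ F)
    # A feature whose row of W has a large norm contributes strongly to the projection
    return np.linalg.norm(W, axis=1)

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)  # two clusters of 50 samples each

# Synthetic data (assumption): two features separate the clusters, three are pure noise
informative = rng.normal(size=(100, 2)) * 0.3 + np.where(labels[:, None] == 0, -2.0, 2.0)
noise = rng.normal(size=(100, 3))
X = np.hstack([informative, noise])
X = X - X.mean(axis=0)  # center the features

F = np.eye(2)[labels]  # one-hot pseudo-labels, assumed known here
scores = linear_ufs_scores(X, F)
top2 = np.argsort(scores)[::-1][:2]  # indices of the two highest-scoring features
print(sorted(top2.tolist()))
```

Because the map from features to pseudo-labels is a single matrix, a scheme like this can only rank features by their linear contribution, and every sample, outlier or not, enters the least-squares fit with equal weight. These are the two limitations that motivate replacing the linear map with a robust autoencoder.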
