Unsupervised Feature Selection via Robust Autoencoder and Adaptive Graph Learning

Feng Yu, MD Saifur Rahman Mazumder, Ying Su, Oscar Contreras Velasco

TL;DR

This work tackles unsupervised feature selection in high-dimensional data by combining a Robust Subspace Recovery Autoencoder with adaptive graph learning. The proposed RAEUFS framework learns nonlinear feature representations while explicitly separating outliers through a robust latent subspace, and it jointly optimizes feature selection, pseudo-label clustering, and local geometry. Trained by alternating optimization, the method achieves higher clustering accuracy and normalized mutual information than competing approaches on multiple benchmarks and maintains performance under substantial outlier contamination. A real-world migration dataset illustrates the interpretability of the selected features in practice.
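The two evaluation metrics named above, clustering accuracy and normalized mutual information, can be sketched in a few lines. This is a minimal illustration and not the authors' evaluation code: the function names are hypothetical, the accuracy search brute-forces label permutations (practical only for a handful of clusters; the Hungarian algorithm is the usual choice at scale), and the NMI normalization by the geometric mean of the two entropies is an assumption, since the page does not say which convention the paper uses.

```python
from collections import Counter
from itertools import permutations
from math import log


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one mappings from cluster ids to class labels."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad the shorter side with None so the mapping stays one-to-one.
    size = max(len(classes), len(clusters))
    classes += [None] * (size - len(classes))
    clusters += [None] * (size - len(clusters))
    pair_counts = Counter(zip(y_pred, y_true))
    best = 0
    for perm in permutations(classes):
        # Count samples whose cluster is mapped to their true class.
        hits = sum(pair_counts[(c, k)] for c, k in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)


def normalized_mutual_information(y_true, y_pred):
    """Mutual information divided by sqrt(H(true) * H(pred)) (assumed convention)."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    n_true = Counter(y_true)
    n_pred = Counter(y_pred)
    mi = sum(
        (c / n) * log(c * n / (n_true[a] * n_pred[b]))
        for (a, b), c in joint.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in n_true.values())
    h_pred = -sum((c / n) * log(c / n) for c in n_pred.values())
    denom = (h_true * h_pred) ** 0.5
    return mi / denom if denom > 0 else 0.0


# Toy labels: cluster ids are arbitrary, so they must be matched to classes.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [1, 1, 1, 2, 2, 0, 0, 0, 0]
print(clustering_accuracy(y_true, y_pred))
print(normalized_mutual_information(y_true, y_pred))
```

Both scores are invariant to how cluster ids are numbered, which is why they suit settings like this one where the pseudo-labels carry no fixed meaning.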

Abstract

Effective feature selection is essential for high-dimensional data analysis and machine learning. Unsupervised feature selection (UFS) aims to simultaneously cluster data and identify the most discriminative features. Most existing UFS methods linearly project features into a pseudo-label space for clustering, but they suffer from two critical limitations: (1) an oversimplified linear mapping that fails to capture complex feature relationships, and (2) an assumption of uniform cluster distributions, ignoring outliers prevalent in real-world data. To address these issues, we propose the Robust Autoencoder-based Unsupervised Feature Selection (RAEUFS) model, which leverages a deep autoencoder to learn nonlinear feature representations while inherently improving robustness to outliers. We further develop an efficient optimization algorithm for RAEUFS. Extensive experiments demonstrate that our method outperforms state-of-the-art UFS approaches in both clean and outlier-contaminated data settings.
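For orientation, the linear baseline that the abstract criticizes can be sketched as follows. This is a simplified illustration and not RAEUFS or any specific published method: the pseudo-labels are assumed to be given rather than learned jointly, a plain ridge penalty stands in for the sparsity-inducing regularizers such methods typically use, and the function name rank_features_linear and the toy data are hypothetical. Features are ranked by the row norms of the learned projection W, so a feature whose row is near zero contributes little to the pseudo-label space.

```python
import numpy as np


def rank_features_linear(X, Y, ridge=1.0):
    """Score features by the row norms of a ridge-regressed projection W.

    X: (n_samples, n_features) data matrix.
    Y: (n_samples, n_clusters) one-hot pseudo-label matrix (assumed given).
    Returns (scores, order), with order sorted from most to least important.
    """
    Xc = X - X.mean(axis=0)  # center so no intercept term is needed
    d = Xc.shape[1]
    # Closed-form ridge solution: W = (X^T X + ridge * I)^{-1} X^T Y
    W = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(d), Xc.T @ Y)
    scores = np.linalg.norm(W, axis=1)  # one l2 norm per feature (row of W)
    order = np.argsort(-scores)
    return scores, order


# Toy data: two clusters that differ only along features 0 and 1.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 10))
X[:, :2] += np.where(labels[:, None] == 0, 3.0, -3.0)
Y = np.eye(2)[labels]  # one-hot pseudo-labels

scores, order = rank_features_linear(X, Y)
top_two = sorted(order[:2].tolist())
print(top_two)
```

Because the mapping from features to pseudo-labels is a single matrix product, this kind of model can only express linear relationships, which is the first limitation the abstract names; the second is that nothing in the least-squares fit guards against outlying samples.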

Paper Structure

This paper contains 15 sections, 15 equations, 4 figures, 6 tables, 2 algorithms.

Figures (4)

  • Figure 5.1: Dendrogram for hierarchical clustering
  • Figure 5.2: K-means algorithm
  • Figure 7.1: Comparison of the algorithms' performance across different numbers of selected features.
  • Figure 7.2: ACC sensitivity of $\alpha,\beta,\gamma,\eta,\lambda_1,\lambda_2$ on lung.

Theorems & Definitions (1)

  • Remark 1