Marginal Laplacian Score
Guy Hay, Ohad Volk
TL;DR
The paper tackles unsupervised feature selection for high-dimensional imbalanced data by introducing the Marginal Laplacian Score (MLS), a margin-focused modification of the Laplacian Score (LS). MLS uses sample-level weights $u_i$ and interaction-level weights $w_{ij}$ to preserve the margin structure of the data, formalized as $MLS_r = \frac{\sum_{ij \in \mathcal{M}^k} (f_{r_i}-f_{r_j})^2 w_{ij} u_i}{\mathrm{Var}(f_r)}$, and admits a matrix formulation for efficiency. The paper also integrates MLS into the Differentiable Unsupervised Feature Selection (DUFS) framework to yield DUFS-MLS, whose differentiable objective operates on margin-preserving subsets. Empirical results on synthetic data and 14 public datasets show that MLS and DUFS-MLS achieve higher ROC AUC and are robust to noisy features, supporting the margin-based assumption that minority or anomalous samples cluster at feature margins. The work offers a principled margin-focused alternative to LS for imbalanced data and demonstrates practical gains in unsupervised feature selection workflows.
Abstract
High-dimensional imbalanced data poses a challenge for machine learning. When labels are scarce or of low quality, unsupervised feature selection methods are crucial for the success of subsequent algorithms. We therefore introduce the Marginal Laplacian Score (MLS), a modification of the well-known Laplacian Score (LS) tailored to imbalanced data. We assume that minority-class or anomalous samples appear more frequently in the margin of the features. Consequently, MLS aims to preserve the local structure of the dataset's margin. We propose integrating MLS into modern feature selection methods that utilize the Laplacian Score, and we integrate it into Differentiable Unsupervised Feature Selection (DUFS), resulting in DUFS-MLS. The proposed methods demonstrate robust and improved performance on synthetic and public datasets.
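To make the score $MLS_r = \frac{\sum_{ij \in \mathcal{M}^k} (f_{r_i}-f_{r_j})^2 w_{ij} u_i}{\mathrm{Var}(f_r)}$ concrete, the following is a minimal NumPy sketch, not the authors' implementation. The summary above does not specify how the margin set, $w_{ij}$, or $u_i$ are built, so three choices here are assumptions made purely for illustration: the margin set $\mathcal{M}^k$ for feature $r$ is taken to be all pairs among the $k$ smallest and $k$ largest samples of that feature, the interaction weights $w_{ij}$ are an RBF kernel on the full feature vectors, and the sample weights $u_i$ are uniform. The function name `marginal_laplacian_score` and the parameters `k` and `sigma` are likewise hypothetical.

```python
import numpy as np

def marginal_laplacian_score(X, k=10, sigma=1.0):
    """Illustrative per-feature MLS for data X of shape (n_samples, n_features).

    Assumptions (not taken from the paper): margin set, RBF weights w_ij,
    and uniform sample weights u_i, as described in the accompanying text.
    """
    n, d = X.shape
    if 2 * k > n:
        raise ValueError("k is too large: need 2 * k <= n_samples")

    # Assumed interaction-level weights w_ij: RBF kernel on all features.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Assumed sample-level weights u_i: uniform.
    u = np.ones(n)

    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        order = np.argsort(f)
        # Assumed margin set: the k lowest and k highest samples of feature r.
        margin = np.concatenate([order[:k], order[-k:]])
        diff2 = (f[margin, None] - f[None, margin]) ** 2
        numerator = (diff2 * W[np.ix_(margin, margin)] * u[margin, None]).sum()
        # Var(f_r) must be non-zero; constant features should be dropped first.
        scores[r] = numerator / f.var()
    return scores

# Usage on random data (values are illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
demo_scores = marginal_laplacian_score(X_demo, k=5)
```

In the classic Laplacian Score, smaller values indicate features that better preserve local structure; whether MLS ranks features in the same direction should be checked against the paper itself. The double loop over pairs is replaced by array broadcasting here, which is only loosely related to the matrix formulation the paper mentions.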
