Marginal Laplacian Score

Guy Hay; Ohad Volk

TL;DR

The paper tackles unsupervised feature selection for high-dimensional imbalanced data by introducing the Marginal Laplacian Score (MLS), a margin-focused modification of the Laplacian Score (LS). MLS uses sample-level and interaction-level weights to preserve the margin structure of the data, formalized as $MLS_r = \frac{\sum_{ij \in \mathcal{M}^k} (f_{r_i}-f_{r_j})^2 w_{ij} u_i}{\mathrm{Var}(f_r)}$, and supports a matrix formulation for efficiency. The paper also integrates MLS into the Differentiable Unsupervised Feature Selection (DUFS) framework to yield DUFS-MLS, whose differentiable objective operates on margin-preserving subsets. Empirical results on synthetic data and 14 public datasets show that MLS and DUFS-MLS achieve higher AUC ROC and are robust to noisy features, supporting the margin-based assumption that minority or anomalous samples cluster at feature margins. The work offers a principled margin-focused alternative to LS for imbalanced data and demonstrates practical gains in unsupervised feature selection workflows.
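To make the formula in the TL;DR more concrete, the following is a minimal NumPy sketch of how a score of that shape could be computed. It is not the authors' implementation: the summary does not define $\mathcal{M}^k$, $w_{ij}$, or $u_i$, so the sketch assumes (hypothetically) that the margin consists of samples falling in the lower or upper `q`-quantile tail of at least one feature, that $w_{ij}$ is a heat kernel with temperature `t`, and that $u_i$ is the fraction of features for which sample $i$ lies in a tail. The function name and parameters are illustrative only.

```python
import numpy as np


def marginal_laplacian_score(X, q=0.05, t=1.0):
    """Sketch of MLS_r = sum_{ij in M} (f_ri - f_rj)^2 * w_ij * u_i / Var(f_r).

    ASSUMPTIONS (not taken from the source):
      * margin: samples in the lower/upper q-quantile tail of >= 1 feature
      * w_ij:   heat kernel exp(-||x_i - x_j||^2 / t)
      * u_i:    fraction of features for which sample i lies in a tail
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Per-feature tail thresholds and tail membership, shape (n, d).
    lower = np.quantile(X, q, axis=0)
    upper = np.quantile(X, 1.0 - q, axis=0)
    in_tail = (X <= lower) | (X >= upper)

    # Hypothetical sample-level weights and dataset-margin mask.
    u = in_tail.mean(axis=1)
    in_margin = u > 0

    # Pairwise squared Euclidean distances (O(n^2 * d) memory; small n only).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

    # Interaction-level weights, restricted to pairs inside the margin.
    w = np.exp(-sq_dist / t) * np.outer(in_margin, in_margin)
    np.fill_diagonal(w, 0.0)

    # Combine w_ij with u_i (row-wise scaling).
    wu = w * u[:, None]

    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        diff2 = (f[:, None] - f[None, :]) ** 2
        var = f.var()
        # A constant feature carries no information; give it the worst score.
        scores[r] = (diff2 * wu).sum() / var if var > 0 else np.inf
    return scores


# Usage on random data (values depend on the seed and are not meaningful).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
print(marginal_laplacian_score(X_demo, q=0.1, t=3.0))
```

In the original Laplacian Score, smaller values indicate features that better preserve local structure, and the sketch follows that convention. The explicit pairwise loop is chosen for readability; the matrix formulation mentioned in the summary would presumably replace it for larger datasets.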

Abstract

High-dimensional imbalanced data poses a machine learning challenge. In the absence of sufficient or high-quality labels, unsupervised feature selection methods are crucial for the success of subsequent algorithms. We therefore introduce the Marginal Laplacian Score (MLS), a modification of the well-known Laplacian Score (LS) tailored to better address imbalanced data. We introduce the assumption that minority-class or anomalous samples appear more frequently in the margins of the features. Consequently, MLS aims to preserve the local structure of the dataset's margin. We propose integrating MLS into modern feature selection methods that use the Laplacian Score, and we integrate it into Differentiable Unsupervised Feature Selection (DUFS), resulting in DUFS-MLS. The proposed methods demonstrate robust and improved performance on synthetic and public datasets.

Paper Structure (22 sections, 24 equations, 4 figures, 12 tables)

Introduction
Previous Methods
Method
Preliminaries
Marginal Laplacian Score
A note on non-marginal data
Matrix formulation
Temperature $t$ hyperparameter for large datasets
Differentiable Unsupervised Feature Selection with Marginal Laplacian Score
Experiments
Synthetic Data Experiment
Public Data Experiment
Sample-Level Weight Analysis
Empirical Proof of the Marginal Assumption
Conclusions
...and 7 more sections

Figures (4)

Figure 1: Comparison of unmodified and noisy data results. (A) Unmodified data setup results showing the AUC ROC values. (B) Noisy data setup results showing the AUC ROC values. (C) Predictive accuracy metrics for noisy data setup.
Figure 2: Sample-level weights across the glass dataset for various margin quantiles. Panels (A), (B), and (C) show the distribution of sample-level weights for quantiles 0.025, 0.05, and 0.1, respectively.
Figure 3: Mean Kolmogorov-Smirnov distances and corresponding p-values across various quantiles.
Figure 4: Theoretical illustration of a two-feature distribution with marginal positive samples.

Theorems & Definitions (8)

Definition 3.1: Feature Skewness Set
Definition 3.2: Feature Margin
Definition 3.3: Margin of Interest
Definition 3.4: Dataset Margin
Definition 3.5: Feature Margin Indicator Function
Definition 3.6: Dataset Margin Indicator Function
proof
proof
