Feature Selection for Latent Factor Models
Rittwika Kansabanik, Adrian Barbu
TL;DR
The paper addresses high-dimensional multiclass classification by selecting features separately for each class with latent-factor models (PPCA, LFA, ELF, HeteroPCA) and a signal-to-noise ratio (SNR) criterion. Because features are selected per class, the SNR approach supports scalable, incremental learning: adding a new class does not require retraining a global model. Under certain assumptions, the method also has theoretical guarantees of true feature recovery. On simulations and large-scale datasets (CIFAR-10/100, ImageNet-1k), the proposed methods often outperform standard feature selection methods for linear models (e.g., FSA, TISP), while achieving substantial dimensionality reduction and favorable computation times. The work offers a principled, interpretable framework for class-wise feature selection in latent-factor spaces, with implications for scalable and incremental learning in high-dimensional settings.
Abstract
Feature selection is crucial for identifying relevant features in high-dimensional datasets, mitigating the 'curse of dimensionality,' and improving machine learning performance. Traditional feature selection methods for classification use data from all classes to select the features for each class. This paper explores feature selection methods that select features for each class separately, using class models based on low-rank generative models and introducing a signal-to-noise ratio (SNR) feature selection criterion. This approach has theoretical guarantees of true feature recovery under certain assumptions and is shown to outperform some existing feature selection methods on standard classification datasets.
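The abstract does not define the SNR criterion, so the following is only a minimal sketch of the per-class idea under a stated assumption, not the authors' implementation. It fits a separate PPCA model to each class using the closed-form maximum-likelihood solution of Tipping and Bishop. It then scores feature j by an ASSUMED SNR, ||w_j||^2 / sigma^2: the variance of feature j explained by the latent factors, divided by the isotropic noise variance. Finally, it keeps the top m features for that class. The function names (`fit_ppca`, `snr_scores`, `select_features_per_class`) and the synthetic data are hypothetical. With LFA-style models the noise is diagonal, so the denominator would become a per-feature noise variance.

```python
import numpy as np


def fit_ppca(X, k):
    """Closed-form maximum-likelihood PPCA fit (Tipping and Bishop).

    Returns the loading matrix W (p x k) and the isotropic noise variance sigma2.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]                  # sample covariance (p x p)
    evals, evecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending
    sigma2 = evals[k:].mean()                   # ML noise variance: mean of discarded eigenvalues
    scale = np.sqrt(np.maximum(evals[:k] - sigma2, 0.0))
    W = evecs[:, :k] * scale                    # W = U_k (Lambda_k - sigma2 I)^(1/2)
    return W, sigma2


def snr_scores(W, sigma2):
    # ASSUMED criterion: factor-explained variance of each feature over the noise variance.
    return (W ** 2).sum(axis=1) / sigma2


def select_features_per_class(X, y, k, m):
    """Fit one PPCA per class and keep that class's m highest-SNR features."""
    selected = {}
    for c in np.unique(y):
        W, sigma2 = fit_ppca(X[y == c], k)
        top = np.argsort(snr_scores(W, sigma2))[::-1][:m]
        selected[int(c)] = sorted(int(j) for j in top)
    return selected


# Hypothetical synthetic data: each class loads on its own 5 of 20 features.
rng = np.random.default_rng(0)
n, p, k = 1000, 20, 2


def make_class(relevant):
    W_true = np.zeros((p, k))
    W_true[relevant, 0] = 3.0
    W_true[relevant, 1] = 3.0 * (-1.0) ** np.arange(len(relevant))
    Z = rng.normal(size=(n, k))                 # latent factors
    return Z @ W_true.T + rng.normal(size=(n, p))  # add unit-variance isotropic noise


X = np.vstack([make_class(list(range(0, 5))), make_class(list(range(10, 15)))])
y = np.repeat([0, 1], n)
selected = select_features_per_class(X, y, k=2, m=5)
print(selected)  # {0: [0, 1, 2, 3, 4], 1: [10, 11, 12, 13, 14]}
```

Each class model is fitted only on that class's samples. Under this sketch, adding a new class would therefore take one more `fit_ppca` call and would leave the existing per-class selections unchanged.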
