
Feature Selection for Latent Factor Models

Rittwika Kansabanik, Adrian Barbu

TL;DR

The paper tackles high-dimensional multiclass classification by selecting features separately for each class with latent-factor models (PPCA, LFA, ELF, HeteroPCA) and a signal-to-noise ratio (SNR) criterion. Because each class is modeled on its own, the approach supports scalable, incremental learning: adding a new class does not require retraining a global model. Under certain assumptions, it also comes with theoretical guarantees of true-feature recovery. Experiments on simulated data and on large-scale datasets (CIFAR-10/100, ImageNet-1k) show the proposed methods often outperforming standard linear-model feature selection methods (e.g., FSA, TISP) while substantially reducing dimensionality and keeping computation times favorable. The work offers a principled, interpretable framework for class-wise feature selection in latent-factor spaces, relevant to scalable and incremental learning in high-dimensional settings.
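The per-class idea can be illustrated with a minimal sketch. It assumes a closed-form maximum-likelihood PPCA fit (Tipping and Bishop) for each class and, as an assumption not confirmed by this summary, defines the SNR of feature $j$ as its signal variance $\sum_k W_{jk}^2$ divided by its noise variance $\psi_j$. The class `ClassWiseSelector`, its parameters `q` (latent dimension) and `k` (features kept per class), and the helper functions are hypothetical illustrations, not the authors' code. Adding a class fits only that class's model, so previously selected feature sets are left unchanged.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA (Tipping & Bishop) for one class.

    X is an (n, d) array with d > q. Returns the (d, q) loading matrix W and
    the isotropic noise variance sigma2.
    """
    Xc = X - X.mean(axis=0)                      # center the class data
    S = Xc.T @ Xc / X.shape[0]                   # sample covariance, (d, d)
    evals, evecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]   # reorder to descending
    sigma2 = evals[q:].mean()                    # noise = mean of discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2


def snr_scores(W, psi):
    """Assumed per-feature SNR: signal variance sum_k W[j, k]**2 over noise variance psi[j]."""
    signal = np.sum(W ** 2, axis=1)
    return signal / psi


class ClassWiseSelector:
    """Hypothetical helper that keeps one feature subset per class."""

    def __init__(self, q, k):
        self.q = q            # latent dimension of each class model
        self.k = k            # number of features kept per class
        self.selected = {}    # class label -> sorted indices of selected features

    def add_class(self, label, X):
        # Only this class's samples are used; other classes' selections are untouched.
        W, sigma2 = fit_ppca(X, self.q)
        psi = np.full(X.shape[1], sigma2)        # PPCA noise is the same for every feature
        scores = snr_scores(W, psi)
        top_k = np.argsort(scores)[::-1][: self.k]
        self.selected[label] = np.sort(top_k)
        return self.selected[label]
```

Because PPCA assumes isotropic noise, every $\psi_j$ is equal and the SNR ranking reduces to ranking the rows of $\mathbf{W}$ by squared norm; the noise term only changes the ordering in models with per-feature noise, such as LFA and HeteroPCA.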

Abstract

Feature selection is crucial for pinpointing relevant features in high-dimensional datasets, mitigating the 'curse of dimensionality,' and enhancing machine learning performance. Traditional feature selection methods for classification use data from all classes to select features for each class. This paper explores methods that select features for each class separately, using class models based on low-rank generative methods, and introduces a signal-to-noise ratio (SNR) feature selection criterion. This approach has theoretical guarantees of true-feature recovery under certain assumptions and is shown to outperform some existing feature selection methods on standard classification datasets.

Paper Structure

This paper contains 12 sections, 7 theorem-type results (4 theorems and 3 propositions), 22 equations, 3 figures, 4 tables, and 2 algorithms.

Key Result

Theorem 1

Assume that the data have been centered, and let $\boldsymbol{\beta} = \mathbf{W}^T(\boldsymbol{\Psi} + \mathbf{W}\mathbf{W}^T)^{-1}$. The EM updates of $(\hat{\mathbf{W}}, \hat{\boldsymbol{\Psi}})$ for LFA are:
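Since Theorem 1 is attributed to ghahramani1996algorithm, the sketch below assumes the standard factor-analysis EM updates from that work, written with the sample covariance $\mathbf{S}$ of the centered data: $\hat{\mathbf{W}} \leftarrow \mathbf{S}\boldsymbol{\beta}^T(\mathbf{I} - \boldsymbol{\beta}\mathbf{W} + \boldsymbol{\beta}\mathbf{S}\boldsymbol{\beta}^T)^{-1}$ and $\hat{\boldsymbol{\Psi}} \leftarrow \operatorname{diag}(\mathbf{S} - \hat{\mathbf{W}}\boldsymbol{\beta}\mathbf{S})$. The paper's exact notation may differ. The function name `lfa_em`, the PCA-based initialization, the iteration count, and the variance floor are illustrative choices, not taken from the paper.

```python
import numpy as np


def lfa_em(X, q, n_iter=200, floor=1e-8):
    """EM for latent factor analysis x = W z + e, z ~ N(0, I_q), e ~ N(0, diag(psi)).

    Uses the standard factor-analysis EM updates written in terms of the
    sample covariance S and beta = W^T (Psi + W W^T)^{-1}.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)                  # the updates assume centered data
    S = Xc.T @ Xc / n                        # sample covariance, (d, d)

    # Illustrative initialization from the top-q principal directions.
    evals, evecs = np.linalg.eigh(S)
    W = evecs[:, -q:] * np.sqrt(evals[-q:])
    psi = np.diag(S).copy()
    I_q = np.eye(q)

    for _ in range(n_iter):
        # E-step: beta maps x to the posterior mean E[z | x] = beta x.
        beta = np.linalg.solve(np.diag(psi) + W @ W.T, W).T   # (q, d)
        # Average posterior second moment (1/n) sum_i E[z z^T | x_i].
        Ezz = I_q - beta @ W + beta @ S @ beta.T              # (q, q), symmetric
        # M-step: W <- S beta^T Ezz^{-1}, then Psi <- diag(S - W beta S).
        W = np.linalg.solve(Ezz, (S @ beta.T).T).T
        psi = np.maximum(np.diag(S - W @ beta @ S), floor)   # floor guards against psi -> 0
    return W, psi
```

Unlike PPCA, the returned noise variances differ per feature, so under the same assumed SNR definition, `snr = (W_hat ** 2).sum(axis=1) / psi_hat`, a feature with large raw variance that is mostly noise is ranked low.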

Figures (3)

  • Figure 1: Estimation error vs. number of iterations for (a) the signal variance, $\hat{\boldsymbol{sig}}$, and (b) the noise variance, $\hat{\boldsymbol{\psi}}$, when $n=1000$ and $d=110$; (c) the true SNR (${\boldsymbol{SNR}}^{\ast}$) and the estimated SNRs obtained by the four methods.
  • Figure 2: Estimation error vs. number of observations ($n$) for (a) the signal variance, $\hat{{\boldsymbol{sig}}}$, (b) the noise variance, $\hat{{\boldsymbol{\psi}}}$, and (c) the SNRs, $\hat{{\boldsymbol{SNR}}}$, when $d=110$.
  • Figure 3: Accuracy vs. number of selected features for CIFAR-10 (left), CIFAR-100 (middle), and ImageNet (right).

Theorems & Definitions (7)

  • Theorem 1: due to ghahramani1996algorithm
  • Theorem 2
  • Proposition 1
  • Proposition 2
  • Theorem 3
  • Proposition 3: due to wang2023scalable
  • Theorem 4