Supervised Quadratic Feature Analysis: Information Geometry Approach for Dimensionality Reduction

Daniel Herrera-Esposito, Johannes Burge

TL;DR

The paper introduces Supervised Quadratic Feature Analysis (SQFA), a linear dimensionality-reduction method that learns filters by maximizing Fisher-Rao distances between class-conditional Gaussians, a geometric alternative to traditional statistical dissimilarity measures. SQFA uses exact results for zero-mean Gaussians and a Calvo-Oller bound for general Gaussians. The learned low-dimensional features support good classification with Quadratic Discriminant Analysis (QDA) on three real-world datasets. The approach highlights the utility of information geometry in machine learning and neuroscience, with potential extensions to non-Gaussian and nonlinear regimes. The work also compares the Fisher-Rao, Bhattacharyya, and Hellinger distances to examine multiclass discriminability and robustness.

Abstract

Supervised dimensionality reduction maps labeled data into a low-dimensional feature space while preserving class discriminability. A common approach is to maximize a statistical measure of dissimilarity between classes in the feature space. Information geometry provides an alternative framework for measuring class dissimilarity, with the potential for improved insights and novel applications. Information geometry, which is grounded in Riemannian geometry, uses the Fisher information metric, a local measure of discriminability that induces the Fisher-Rao distance. Here, we present Supervised Quadratic Feature Analysis (SQFA), a linear dimensionality reduction method that maximizes Fisher-Rao distances between class-conditional distributions, under Gaussian assumptions. We motivate the Fisher-Rao distance as a good proxy for discriminability. We show that SQFA features support good classification performance with Quadratic Discriminant Analysis (QDA) on three real-world datasets. SQFA provides a novel framework for supervised dimensionality reduction, motivating future research in applying information geometry to machine learning and neuroscience.

Paper Structure

This paper contains 37 sections, 19 equations, 15 figures.

Figures (15)

  • Figure 1: SQFA learns features using information geometry. Left. SQFA and smSQFA map the $n$-dimensional data into an $m$-dimensional feature space using the linear filters $\mathbf{F}$. In smSQFA, the class-specific second-moment matrices of the features are represented as points in the $\mathrm{SPD(m)}$ manifold (which is an open cone). Fisher-Rao distances in $\mathrm{SPD(m)}$ are used for learning. Right. Each point in $\mathrm{SPD(m)}$ (top) corresponds to a second-moment ellipse (below). As the distance in $\mathrm{SPD(m)}$ increases, the second-order statistics become more different and more discriminable.
  • Figure 2: SQFA vs. LDA vs. PCA. Ellipses show the probability distributions for three classes (colors) in a 6D toy dataset. Each panel shows two dimensions of the data vector $\mathbf{x}$, where the classes are separated by different statistical properties. Classes are distinguished by large differences in the covariances (dimensions 1-2), small differences in the means (dimensions 3-4), or neither (dimensions 5-6). Two filters were learned with each of SQFA, LDA, and PCA, and are shown as arrows. SQFA prefers the most discriminative subspace.
  • Figure 3: SQFA vs. smSQFA. Ellipses show the probability distributions for three classes (colors) in a 4D toy dataset. Classes are distinguished by large differences in the means (dimensions 1-2), and by large differences in the covariances (dimensions 3-4). Two filters were learned with each of SQFA and smSQFA. The SQFA filters select the most discriminative subspace (dimensions 1-2).
  • Figure 4: SQFA extracts useful features using class-conditional second-order statistics. Top. Example images from SVHN. Center. QDA accuracy using the features learned by the different methods. For SQFA variants, the median and interquartile range of 20 different initializations are shown. Left. Filters learned by the methods.
  • Figure 5: SQFA can exploit class-conditional first- and second-order information. Top. Example MNIST images. Center. QDA accuracy using the features learned by the different methods. For SQFA variants, the median and interquartile range of 20 initializations are shown. Left. Filters learned by the methods.
  • ...and 10 more figures
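
The smSQFA construction in the Figure 1 caption (project class-specific second-moment matrices with the filters $\mathbf{F}$, then measure Fisher-Rao distances in $\mathrm{SPD(m)}$) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names, the $(n, m)$ filter layout, the toy matrices, and the choice to sum the pairwise distances into one objective are all assumptions made for illustration. The distance itself is the standard closed form for zero-mean Gaussians, $d(\mathbf{A}, \mathbf{B}) = \sqrt{\tfrac{1}{2}\sum_k \log^2 \lambda_k}$, where $\lambda_k$ are the eigenvalues of $\mathbf{A}^{-1}\mathbf{B}$.

```python
import itertools
import numpy as np


def fisher_rao_zero_mean(A, B):
    """Fisher-Rao distance between N(0, A) and N(0, B) for SPD matrices A, B."""
    L = np.linalg.cholesky(A)                    # A = L L^T
    X = np.linalg.solve(L, B)                    # L^{-1} B
    M = np.linalg.solve(L, X.T)                  # L^{-1} B L^{-T}, same spectrum as A^{-1} B
    M = 0.5 * (M + M.T)                          # remove round-off asymmetry
    lam = np.linalg.eigvalsh(M)                  # positive eigenvalues
    return np.sqrt(0.5 * np.sum(np.log(lam) ** 2))


def smsqfa_objective(F, second_moments):
    """Sum of pairwise distances between projected second-moment matrices.

    F: (n, m) filter matrix (assumed layout).
    second_moments: list of (n, n) SPD matrices, one per class.
    Summing over class pairs is an assumption of this sketch.
    """
    projected = [F.T @ S @ F for S in second_moments]   # each is (m, m)
    return sum(
        fisher_rao_zero_mean(P, Q)
        for P, Q in itertools.combinations(projected, 2)
    )


# Hypothetical 4D toy problem: classes differ only in dimensions 0-1.
classes = [
    np.diag([4.0, 0.25, 1.0, 1.0]),
    np.diag([1.0, 1.0, 1.0, 1.0]),
    np.diag([0.25, 4.0, 1.0, 1.0]),
]
F_disc = np.eye(4)[:, :2]   # filters spanning dimensions 0-1
F_flat = np.eye(4)[:, 2:]   # filters spanning dimensions 2-3

print("discriminative subspace:", smsqfa_objective(F_disc, classes))
print("uninformative subspace: ", smsqfa_objective(F_flat, classes))
```

In this toy case the objective is larger for the filters that span the dimensions where the second-moment matrices differ, and zero for the filters that span the dimensions where they are identical. A learning procedure would adjust $\mathbf{F}$ to increase this quantity; the optimizer and any constraints on $\mathbf{F}$ are outside the scope of this sketch.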