Density-Matrix Spectral Embeddings for Categorical Data: Operator Structure and Stability

Raquel Bosch-Romeu, Antonio Falcó, José-Antonio Rodríguez-Gallego

TL;DR

Introduces a supervised dimensionality reduction method for categorical (and discretized mixed-type) data, built on a density matrix induced by class-conditional frequencies, that yields low-dimensional spectral embeddings from the dominant eigenmodes.

Abstract

We introduce a supervised dimensionality reduction methodology for categorical (and discretized mixed-type) data based on a density-matrix construction induced by class-conditional frequencies. Given a labeled dataset encoded in a one-hot survey space, we assemble a frequency matrix whose columns aggregate feature occurrences within each class, and define a normalized Gram-type operator that satisfies the axioms of a density matrix. The resulting representation admits an intrinsic rank bound controlled by the number of classes, enabling low-dimensional spectral embeddings via dominant eigenmodes. Classification is performed in the reduced space through class-conditional kernel density estimation and a maximum-likelihood decision rule. We establish structural invariances, provide complexity estimates, and validate the approach on synthetic benchmarks probing high cardinality, sparsity, noise, and class imbalance.
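
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. The paper's exact definitions (Definitions 3.1 to 3.4) are not reproduced here, so this sketch assumes that the amplitude lifting is an elementwise square root of the class-conditional counts and that the operator is the Gram matrix of those amplitudes divided by its trace. The function names `one_hot_survey`, `density_matrix`, and `spectral_embedding` are hypothetical, and the kernel density classification step is omitted.

```python
import numpy as np

def one_hot_survey(X, cardinalities):
    """Block-concatenate the one-hot encodings of each categorical feature."""
    n = X.shape[0]
    offsets = np.concatenate(([0], np.cumsum(cardinalities)[:-1]))
    S = np.zeros((n, int(np.sum(cardinalities))))
    for j, off in enumerate(offsets):
        S[np.arange(n), off + X[:, j]] = 1.0
    return S

def density_matrix(S, y, n_classes):
    """Normalized Gram operator built from class-conditional frequencies."""
    # Frequency matrix: column k sums the survey vectors of class k.
    F = np.stack([S[y == k].sum(axis=0) for k in range(n_classes)], axis=1)
    # ASSUMPTION: amplitude lifting taken as an elementwise square root.
    A = np.sqrt(F)
    G = A @ A.T
    # ASSUMPTION: normalization by the trace, so that tr(rho) = 1.
    return G / np.trace(G)

def spectral_embedding(S, rho, r):
    """Project survey vectors onto the r dominant eigenvectors of rho."""
    vals, vecs = np.linalg.eigh(rho)      # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:r]      # indices of the r largest
    return S @ vecs[:, idx], vals[idx]

# Synthetic labeled data: three categorical features, three classes.
rng = np.random.default_rng(0)
cardinalities = [3, 4, 2]
n, K = 200, 3
y = rng.integers(0, K, size=n)
X = np.column_stack([(rng.integers(0, c, size=n) + y) % c for c in cardinalities])

S = one_hot_survey(X, cardinalities)
rho = density_matrix(S, y, K)
Z, top = spectral_embedding(S, rho, r=K)
```

Because the amplitude matrix has one column per class, the resulting operator has rank at most `K`, so keeping `r = K` eigenmodes retains its entire nonzero spectrum under these assumptions.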

Paper Structure

This paper contains 43 sections, 17 theorems, 101 equations, 4 tables, 1 algorithm.

Key Result

Proposition 3.5

$\rho_{\mathcal{D}}$ is symmetric, positive semidefinite, and satisfies $\mathrm{tr}(\rho_{\mathcal{D}})=1$.
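
A short sketch of why these properties hold, under an assumption: the definition of $\rho_{\mathcal{D}}$ is not reproduced on this page, so suppose, in line with the abstract's "normalized Gram-type operator", that $\rho_{\mathcal{D}} = AA^{\top}/\mathrm{tr}(AA^{\top})$ for a nonzero real matrix $A$ with one column per class. The symbols $A$ and $K$ (the number of classes) are introduced here for illustration only. Then, for every vector $x$,

$$
\rho_{\mathcal{D}}^{\top} = \frac{(AA^{\top})^{\top}}{\mathrm{tr}(AA^{\top})} = \rho_{\mathcal{D}},
\qquad
x^{\top}\rho_{\mathcal{D}}\,x = \frac{\lVert A^{\top}x\rVert_2^{2}}{\lVert A\rVert_F^{2}} \ge 0,
\qquad
\mathrm{tr}(\rho_{\mathcal{D}}) = \frac{\mathrm{tr}(AA^{\top})}{\mathrm{tr}(AA^{\top})} = 1.
$$

Under the same assumption, $\mathrm{rank}(\rho_{\mathcal{D}}) = \mathrm{rank}(A) \le K$, which matches the rank bound controlled by the number of classes stated in the abstract.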

Theorems & Definitions (62)

  • Definition 2.2: Categorical state space
  • Example 2.3: Binary case
  • Definition 2.4: Survey vector (block-concatenation)
  • Definition 2.5: Tensor encoding of labeled samples
  • Remark 2.6: Discretized numerical variables
  • Definition 2.7: Training dataset
  • Definition 3.1: Class-conditional frequency vectors
  • Definition 3.2: Frequency matrix
  • Definition 3.3: Amplitude lifting
  • Definition 3.4: Density matrix
  • ...and 52 more