Kernel Density Matrices for Probabilistic Deep Learning

Fabio A. González; Raúl Ramos-Pollán; Joseph A. Gallego-Mejia

TL;DR

This work introduces kernel density matrices (KDMs), RKHS-based extensions of density matrices, to represent joint distributions over discrete and continuous variables in probabilistic deep learning. By defining differentiable projection, inference, and sampling operations in a kernelized framework, KDMs support density estimation, discriminative learning, and generative modeling within end-to-end neural architectures. The paper presents discrete and continuous KDM variants, establishes nonparametric and parametric learning approaches, and demonstrates tasks including conditional generation and learning from label proportions, with competitive results on standard benchmarks. This approach offers a flexible, reversible, and compositional toolkit for modeling uncertainty in complex ML tasks, with an accompanying library for reproducibility.

Abstract

This paper introduces a novel approach to probabilistic deep learning, kernel density matrices, which provide a simpler yet effective mechanism for representing joint probability distributions of both continuous and discrete random variables. In quantum mechanics, a density matrix is the most general way to describe the state of a quantum system. This work extends the concept of density matrices by allowing them to be defined in a reproducing kernel Hilbert space. This abstraction allows the construction of differentiable models for density estimation, inference, and sampling, and enables their integration into end-to-end deep neural models. In doing so, we provide a versatile representation of marginal and joint probability distributions that allows us to develop a differentiable, compositional, and reversible inference procedure that covers a wide range of machine learning tasks, including density estimation, discriminative learning, and generative modeling. The broad applicability of the framework is illustrated by two examples: an image classification model that can be naturally transformed into a conditional generative model, and a model for learning with label proportions that demonstrates the framework's ability to deal with uncertainty in the training samples. The framework is implemented as a library and is available at: https://github.com/fagonzalezo/kdm.
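As a rough illustration of the density estimation and sampling that the abstract refers to, the sketch below treats a continuous KDM as a set of centers, a probability vector, and a Gaussian (RBF) kernel. The projection form $\sum_i p_i\,k^2(\bm{x},\bm{x}^{(i)})$, the normalization constant, and all function names are assumptions made for this example; they are not taken from the text above or from the `kdm` library.

```python
import numpy as np

def rbf_kernel(x, centers, sigma):
    """Gaussian (RBF) kernel k(x, c) = exp(-||x - c||^2 / (2 sigma^2))."""
    sq_dist = np.sum((centers - x) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def kdm_density(x, centers, probs, sigma):
    """Density at x for a continuous KDM (centers, probs, RBF kernel).

    Assumed form: sum_i p_i * k(x, c_i)^2, rescaled to integrate to one.
    """
    n = centers.shape[1]
    # k^2 is an unnormalized Gaussian with variance sigma^2 / 2 per dimension,
    # so its integral over R^n is (pi * sigma^2)^(n / 2).
    norm = (np.pi * sigma ** 2) ** (-n / 2.0)
    return norm * np.dot(probs, rbf_kernel(x, centers, sigma) ** 2)

def kdm_sample(rng, centers, probs, sigma, size):
    """Pick a component according to probs, then add Gaussian noise."""
    idx = rng.choice(len(probs), size=size, p=probs)
    noise = rng.normal(scale=sigma / np.sqrt(2.0), size=(size, centers.shape[1]))
    return centers[idx] + noise

# Hypothetical one-dimensional KDM with three components.
centers = np.array([[-1.0], [0.5], [2.0]])
probs = np.array([0.2, 0.5, 0.3])
sigma = 0.4

# Numerical check that the assumed density integrates to roughly one.
grid = np.linspace(-8.0, 8.0, 4001)
values = np.array([kdm_density(np.array([g]), centers, probs, sigma) for g in grid])
total_mass = float(np.sum(values) * (grid[1] - grid[0]))  # Riemann sum

rng = np.random.default_rng(0)
samples = kdm_sample(rng, centers, probs, sigma, size=1000)
```

In an end-to-end model the centers, probabilities, and bandwidth would be trainable parameters; here they are fixed by hand only to keep the example small.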

Paper Structure (21 sections, 7 theorems, 21 equations, 5 figures, 2 tables, 4 algorithms)

Introduction
Related work
Density matrices and kernel density matrices
Density matrices
Kernel density matrices
Discrete kernel density matrices.
Continuous kernel density matrices.
Density estimation with kernel density matrices
Joint densities with kernel density matrices
Inference with kernel density matrices
Sampling from kernel density matrices
Experiments
Bidirectional classification and generation with kernel density matrices
Classification with label proportions
Conclusions
...and 6 more sections

Key Result

Proposition 1

Let $\rho_{\mathbf{x}}=(\bm{C},\bm{p},k_\mathrm{cos})$ be a KDM over $\mathbb{R}^n$; let $\mathbb{X}=\{\bm{b}^{(1)},\dots,\bm{b}^{(n)} \} \subset \mathbb{R}^n$ be an orthogonal basis of $\mathbb{R}^n$, then $\{f_{\rho}(\bm{b}^{(i)})\}_{i=1,\dots n}$ is a categorical probability distribution for the
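The visible part of Proposition 1 can be checked numerically. The sketch below assumes that $f_{\rho}(\bm{x})=\sum_j p_j\,k_\mathrm{cos}^2(\bm{x},\bm{x}^{(j)})$, where $k_\mathrm{cos}$ is the cosine similarity; this form of $f_\rho$ is an assumption, since the definition does not appear in the text above, and the helper names are hypothetical.

```python
import numpy as np

def cosine_kernel(x, centers):
    """Cosine similarity between x and each row of centers."""
    return centers @ x / (np.linalg.norm(centers, axis=1) * np.linalg.norm(x))

def projection(x, centers, probs):
    """Assumed projection f_rho(x) = sum_j p_j * k_cos(x, c_j)^2."""
    return float(np.dot(probs, cosine_kernel(x, centers) ** 2))

rng = np.random.default_rng(0)
n, m = 4, 6
centers = rng.normal(size=(m, n))       # arbitrary KDM components in R^n
probs = rng.dirichlet(np.ones(m))       # nonnegative weights that sum to one

# Build an orthogonal (not orthonormal) basis: orthonormal columns from QR,
# each rescaled by a different positive factor.
basis, _ = np.linalg.qr(rng.normal(size=(n, n)))
basis = basis * np.array([1.0, 2.0, 0.5, 3.0])

dist = np.array([projection(basis[:, i], centers, probs) for i in range(n)])
```

Under that assumption the values in `dist` are nonnegative and sum to one up to floating-point error, because the squared cosines of any vector against an orthogonal basis sum to one.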

Figures (5)

Figure 1: Classification and generation with KDMs. The top part represents a predictive model that uses an encoder (a) to map input samples into a latent space. The output of the encoder is represented as a KDM $\rho_{\mathbf x}$ (b) with one component, which is used to infer an output probability distribution over labels, represented by a KDM $\rho_{\mathbf y}$ (c), using Eq. \ref{eq:inference-probability}. The classifier has as its parameter a joint distribution of inputs and outputs, represented by a KDM $\rho_{\mathbf x', \mathbf y'}$ (d), which is learned with Algorithm \ref{alg:discriminative-training}. The joint distribution can also be used for conditional generation, as depicted in the bottom part of the diagram. In this case, the input is a distribution over labels represented by the KDM $\rho'_{\mathbf y}$ (e), which, along with $\rho_{\mathbf x', \mathbf y'}$ (d), is used to infer a predicted KDM $\rho'_{\mathbf x}$ using Eq. \ref{eq:inference-probability}; we sample (f) from this KDM to generate samples in the latent space, which are decoded (g) to the original input space.
Figure 2: Conditional image generation for MNIST, Fashion-MNIST, and CIFAR-10 using the KDM conditional generative model. Each row corresponds to a different class.
Figure 3: KDM model for classification with label proportions. During training, the model receives as input bags of instances $\bm{X}^{(i)}=(\bm{x}^{(i)j})_{j=1\dots m_i}$. The training dataset is a set of pairs $\bm{D} = \{(\bm{X}^{(i)},\bm{y}^{(i)})\}_{i=1\dots \ell}$, where each $\bm{y}^{(i)}$ is a vector representing the label proportions of the $i$-th bag. Each input bag is represented by a KDM with $m_i$ components. The algorithm learns a joint KDM $\rho_{\mathbf x', \mathbf y'}$. During prediction, the model receives individual samples $\bm{x}^{(*)}$ (equivalent to bags with only one element) and outputs a KDM $\rho_{\mathbf y}$.
Figure 4: Performance evaluation for the learning with label proportions tasks. Column headers indicate bag size. Values correspond to AUC plus a t-test $99\%$ confidence interval.
Figure 5: Best parameters for the classification with kernel density matrix experiment.
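The inference step described in the captions of Figures 1 and 3 can be sketched as follows. The update rule used here is an assumption about the form of the referenced inference equation, which is not reproduced in this text: each input component is matched against the joint components through the squared kernel, the matches are weighted by the joint probabilities and normalized, and the results are averaged with the input probabilities. The function names, the RBF kernel on inputs, and the one-hot label components are likewise illustrative choices.

```python
import numpy as np

def rbf_kernel_matrix(a, b, sigma):
    """Pairwise RBF kernel between the rows of a and the rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def kdm_infer(in_centers, in_probs, joint_x, joint_probs, sigma):
    """Assumed inference rule: returns new weights over the joint components.

    out[i] = sum_l in_probs[l] * joint_probs[i] * k(x_l, x'_i)^2
                 / sum_j joint_probs[j] * k(x_l, x'_j)^2
    """
    k2 = rbf_kernel_matrix(in_centers, joint_x, sigma) ** 2  # shape (m, m')
    weights = k2 * joint_probs[None, :]
    # Inputs far from every joint component can underflow to a zero row sum;
    # the toy data below avoids that case.
    weights = weights / weights.sum(axis=1, keepdims=True)
    return in_probs @ weights

# Hypothetical joint KDM: four latent points with one-hot labels for two classes.
joint_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
joint_y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
joint_p = np.full(4, 0.25)

# Single-sample input, as in classification: a KDM with one component.
single = kdm_infer(np.array([[0.1, 0.2]]), np.array([1.0]), joint_x, joint_p, 1.0)
single_class = single @ joint_y  # with one-hot labels this gives class probabilities

# Bag input, as in learning with label proportions: one point near each cluster.
bag = kdm_infer(np.array([[0.1, 0.2], [5.1, 5.4]]), np.array([0.5, 0.5]),
                joint_x, joint_p, 1.0)
bag_class = bag @ joint_y  # predicted label proportions for the bag
```

Because every operation is a smooth function of the centers and probabilities, a rule of this kind can in principle be trained by gradient descent, with a loss comparing `bag_class` to the observed label proportions.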

Theorems & Definitions (11)

Definition 1: Kernel density matrix
Proposition 1
Theorem 2: Parzen (1962)
Proposition 3
Proposition 4
Proposition 1
proof
Proposition 3
proof
Proposition 4
...and 1 more
