A Family of Kernelized Matrix Costs for Multiple-Output Mixture Neural Networks

Bo Hu, José C. Príncipe

TL;DR

This work introduces a family of kernelized matrix costs for learning Gaussian-mixture representations within Mixture Density Networks, enabling density approximation via a Hilbert-space framework. By defining scalar, vector-matrix, matrix-matrix (trace of a Schur complement), and SVD-based costs, it leverages closed-form Gaussian mixtures to produce efficient, upper-bounded objectives that guide the network to match a data density $p(X)$ with a model density $q(X)$. Theoretical justification via Schwarz inequalities, Mercer-based kernel decompositions, and a variational view clarifies why these costs bound the data density and how the SVD cost relates to an optimal decomposition of cross-covariance; the multivariate extension offers a scalable way to capture interactions across multiple inputs. Empirically, the SVD cost outperforms alternatives in synthetic and image experiments, and a multivariate kernel-decoder demonstrates strong performance on MNIST and CIFAR-10, indicating practical utility for self-supervised density estimation and generative modeling.

Abstract

Pairwise distance-based costs are crucial for self-supervised and contrastive feature learning. Mixture Density Networks (MDNs) are a widely used approach for generative models and density approximation, using neural networks to produce multiple centers that define a Gaussian mixture. By combining MDNs with contrastive costs, this paper proposes data density approximation using four types of kernelized matrix costs in the Hilbert space: the scalar cost, the vector-matrix cost, the matrix-matrix cost (the trace of Schur complement), and the SVD cost (the nuclear norm), for learning multiple centers required to define a mixture density.
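To make the closed-form idea concrete, the sketch below computes the inner product between two equal-weight isotropic Gaussian mixtures and normalizes it with the Cauchy-Schwarz inequality, so the result cannot exceed 1. This is an illustration only, not the paper's definition of any of its four costs: the function names (`gauss_cross_term`, `mixture_inner`, `normalized_similarity`), the equal weights, the shared variance of 0.1, and the toy data are all assumptions made here.

```python
import numpy as np

def gauss_cross_term(a, b, var_a, var_b):
    """Pairwise integrals of N(x; a_i, var_a*I) * N(x; b_j, var_b*I) over x.

    Uses the identity: the integral equals N(a_i - b_j; 0, (var_a + var_b)*I).
    """
    d = a.shape[1]
    s = var_a + var_b
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * s)) / (2.0 * np.pi * s) ** (d / 2.0)

def mixture_inner(a, b, var_a, var_b):
    """Inner product <p, q> of two equal-weight Gaussian mixtures (hypothetical helper)."""
    return gauss_cross_term(a, b, var_a, var_b).mean()

def normalized_similarity(a, b, var_a, var_b):
    """<p, q> / (||p|| * ||q||); bounded above by 1 by the Cauchy-Schwarz inequality."""
    num = mixture_inner(a, b, var_a, var_b)
    den = np.sqrt(mixture_inner(a, a, var_a, var_a) * mixture_inner(b, b, var_b, var_b))
    return num / den

# Toy data: "data" centers and "model" centers drawn from the same distribution.
rng = np.random.default_rng(0)
data_centers = rng.normal(size=(200, 2))
model_centers = rng.normal(size=(50, 2))

# Shifting the model mixture away from the data lowers the similarity.
for shift in (0.0, 1.0, 3.0):
    sim = normalized_similarity(data_centers, model_centers + shift, 0.1, 0.1)
    print(f"shift={shift}: similarity={sim:.4f}")
```

Because every term is a Gaussian evaluated at a difference of centers, no sampling or numerical integration is needed, which is what makes objectives of this kind cheap to evaluate on the centers produced by a network.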

Paper Structure

This paper contains 6 sections, 5 theorems, 27 equations, 4 figures, 1 table.

Key Result

Corollary 3.1

The $L_p$ norm of a Gaussian mixture for any exponent, regardless of discrete or continuous prior, has a closed form.
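As a minimal check of the simplest case, the sketch below evaluates the squared $L_2$ norm of a one-dimensional Gaussian mixture in closed form and compares it with a numerical integral. It covers only the exponent $p=2$ with a discrete prior, where $\int q^2 = \sum_{i,j} w_i w_j \,\mathcal{N}(m_i - m_j;\, 0,\, v_i + v_j)$; how the corollary treats other exponents or continuous priors is not shown in this summary. The mixture parameters and function names are made up for illustration.

```python
import numpy as np

def l2_norm_sq_closed_form(weights, means, variances):
    """Closed-form integral of q(x)^2 for a 1-D Gaussian mixture q.

    Each pairwise term is N(m_i - m_j; 0, v_i + v_j), weighted by w_i * w_j.
    """
    w = np.asarray(weights, dtype=float)
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    s = v[:, None] + v[None, :]          # summed variances for each pair
    diff = m[:, None] - m[None, :]       # mean differences for each pair
    pair = np.exp(-diff ** 2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)
    return float(w @ pair @ w)

def mixture_pdf(x, weights, means, variances):
    """Evaluate the mixture density on a grid of points x."""
    w = np.asarray(weights, dtype=float)
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    comp = np.exp(-(x[:, None] - m) ** 2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)
    return comp @ w

# Hypothetical mixture parameters (weights sum to 1).
weights = [0.3, 0.5, 0.2]
means = [-2.0, 0.5, 3.0]
variances = [0.4, 1.0, 0.25]

closed = l2_norm_sq_closed_form(weights, means, variances)

# Numerical reference: Riemann sum on a wide, fine grid.
x = np.linspace(-15.0, 15.0, 200001)
dx = x[1] - x[0]
numeric = float(np.sum(mixture_pdf(x, weights, means, variances) ** 2) * dx)

print("closed form:", closed)
print("numerical  :", numeric)
```

The same expansion applies to any integer exponent, since a product of Gaussians is again an unnormalized Gaussian, though the number of terms grows with the exponent.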

Figures (4)

  • Figure 1: Data samples (blue dots) and generated samples (red dots) with MDNs.
  • Figure 2: Comparisons of cost value by shifting density $q$ away from $p$.
  • Figure 3: Illustration of the SVD cost's property of approximating a diagonal function using Gaussian residuals (Eq. \ref{quantity}). (a) to (c): $\mathrm{var}=0.01$. (d): $\mathrm{var}=0.001$, which still approximates an identity function, but less accurately.
  • Figure 4: Visualizations of singular functions for two-moon $q$ and Gaussian $p$, and eigenfunctions when $p$ and $q$ are both two moons.

Theorems & Definitions (5)

  • Corollary 3.1
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • Proposition 7