A Family of Kernelized Matrix Costs for Multiple-Output Mixture Neural Networks
Bo Hu, José C. Príncipe
TL;DR
This work introduces a family of kernelized matrix costs for learning Gaussian-mixture representations within Mixture Density Networks, enabling density approximation via a Hilbert-space framework. By defining scalar, vector-matrix, matrix-matrix (trace of a Schur complement), and SVD-based costs, it leverages closed-form Gaussian mixtures to produce efficient, upper-bounded objectives that guide the network to match a data density $p(X)$ with a model density $q(X)$. Theoretical justification via Schwarz inequalities, Mercer-based kernel decompositions, and a variational view clarifies why these costs bound the data density and how the SVD cost relates to an optimal decomposition of cross-covariance; the multivariate extension offers a scalable way to capture interactions across multiple inputs. Empirically, the SVD cost outperforms alternatives in synthetic and image experiments, and a multivariate kernel-decoder demonstrates strong performance on MNIST and CIFAR-10, indicating practical utility for self-supervised density estimation and generative modeling.
Abstract
Pairwise distance-based costs are crucial for self-supervised and contrastive feature learning. Mixture Density Networks (MDNs) are a widely used approach for generative models and density approximation, using neural networks to produce multiple centers that define a Gaussian mixture. By combining MDNs with contrastive costs, this paper proposes data density approximation using four types of kernelized matrix costs in the Hilbert space: the scalar cost, the vector-matrix cost, the matrix-matrix cost (the trace of Schur complement), and the SVD cost (the nuclear norm), for learning multiple centers required to define a mixture density.
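As a rough illustration of why Gaussian mixtures yield closed-form, upper-bounded costs, the sketch below computes a Schwarz-normalized overlap between two equal-weight isotropic Gaussian mixtures. It is only one plausible reading of a scalar cost, not the paper's exact definition: the function names `gaussian_gram` and `scalar_cost`, the shared variance `var`, and the equal mixture weights are all assumptions made for this example. It relies on the standard identity $\int \mathcal{N}(x; a, \sigma^2 I)\,\mathcal{N}(x; b, \sigma^2 I)\,dx = \mathcal{N}(a - b; 0, 2\sigma^2 I)$.

```python
import numpy as np

def gaussian_gram(A, B, var):
    """Pairwise inner products of isotropic Gaussians centered at rows of A and B.

    Entry (i, j) is N(a_i - b_j; 0, 2 * var * I), the closed-form value of the
    integral of N(x; a_i, var * I) * N(x; b_j, var * I) over x.
    """
    d = A.shape[1]
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    s = 2.0 * var  # variances add under the Gaussian product integral
    return np.exp(-sq_dist / (2.0 * s)) / (2.0 * np.pi * s) ** (d / 2.0)

def scalar_cost(X, C, var):
    """Hypothetical scalar cost: <p, q>^2 / (<p, p> <q, q>).

    p is an equal-weight mixture centered at the samples X (N x d) and q an
    equal-weight mixture centered at C (K x d). By the Schwarz inequality the
    value is at most 1, with equality when p and q coincide.
    """
    pq = gaussian_gram(X, C, var).mean()
    pp = gaussian_gram(X, X, var).mean()
    qq = gaussian_gram(C, C, var).mean()
    return pq ** 2 / (pp * qq)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))   # stand-in for data samples
near = X[:5]                   # centers placed on data points
far = X[:5] + 3.0              # the same centers shifted away from the data
print(scalar_cost(X, near, 0.5), scalar_cost(X, far, 0.5))
```

In an MDN setting the centers `C` would be network outputs and the cost would be maximized by gradient ascent; NumPy is used here only to keep the sketch self-contained.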
