The Representation Jensen-Shannon Divergence

Jhoan K. Hoyos-Osorio, Luis G. Sanchez-Giraldo

TL;DR

This work proposes the representation Jensen-Shannon divergence (RJSD), a novel measure inspired by the traditional Jensen-Shannon divergence, and shows that RJSD is a higher-order extension of the maximum mean discrepancy (MMD), providing a more sensitive measure of distributional differences.

Abstract

Quantifying the difference between probability distributions is crucial in machine learning. However, estimating statistical divergences from empirical samples is challenging due to unknown underlying distributions. This work proposes the representation Jensen-Shannon divergence (RJSD), a novel measure inspired by the traditional Jensen-Shannon divergence. Our approach embeds data into a reproducing kernel Hilbert space (RKHS), representing distributions through uncentered covariance operators. We then compute the Jensen-Shannon divergence between these operators, thereby establishing a proper divergence measure between probability distributions in the input space. We provide estimators based on kernel matrices and empirical covariance matrices using Fourier features. Theoretical analysis reveals that RJSD is a lower bound on the Jensen-Shannon divergence, enabling variational estimation. Additionally, we show that RJSD is a higher-order extension of the maximum mean discrepancy (MMD), providing a more sensitive measure of distributional differences. Our experimental results demonstrate RJSD's superiority in two-sample testing, distribution shift detection, and unsupervised domain adaptation, outperforming state-of-the-art techniques. RJSD's versatility and effectiveness make it a promising tool for machine learning research and applications.
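The kernel-matrix estimator described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel, takes the entropy of an operator to be the von Neumann entropy of the trace-normalized kernel matrix, and weights the two samples by their relative sizes. The function names (`gaussian_kernel`, `von_neumann_entropy`, `rjsd_kernel`) and the `bandwidth` parameter are hypothetical choices made for this example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def von_neumann_entropy(k):
    """Entropy -sum(lambda * log(lambda)) of the trace-normalized matrix k."""
    eig = np.linalg.eigvalsh(k / np.trace(k))
    eig = eig[eig > 1e-12]  # drop numerically zero eigenvalues (assumed tolerance)
    return float(-np.sum(eig * np.log(eig)))

def rjsd_kernel(x, y, bandwidth=1.0):
    """Sketch of a kernel-based divergence between samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    z = np.vstack([x, y])
    k_z = gaussian_kernel(z, z, bandwidth)
    s_mix = von_neumann_entropy(k_z)        # entropy of the pooled sample
    s_x = von_neumann_entropy(k_z[:n, :n])  # entropy of x alone
    s_y = von_neumann_entropy(k_z[n:, n:])  # entropy of y alone
    return s_mix - (n / (n + m)) * s_x - (m / (n + m)) * s_y

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = rng.normal(loc=1.0, size=(50, 2))
print(rjsd_kernel(x, y))
```

Under these assumptions, two identical samples give a value near zero, and two equally sized samples so far apart that the cross-kernel entries vanish give a value near $\log(2)$. The result depends strongly on `bandwidth`, which this sketch leaves as a free parameter.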

Paper Structure (33 sections, 10 theorems, 39 equations, 8 figures, 4 tables)

Key Result

Proposition 2

The empirical kernel-based representation entropy estimator of $X$ is

Figures (8)

  • Figure 1: Comparison of RJSD estimators with a Gaussian kernel while varying the kernel bandwidth. The first row illustrates the divergence between two Cauchy distributions ($d=1$) whose Jensen-Shannon divergence (JSD) is $0.5\times \log(2)$. The second row presents the estimated divergence for two multivariate Gaussians while varying dimensionality.
  • Figure 2: Approximation effect in the entropy terms with the power-series estimator of order $p$.
  • Figure 3: Jensen-Shannon divergence estimation for two sets of samples following Cauchy distributions ($N = 512$). We compare the following estimators: kernel-based RJSD, power-series RJSD-p, Fourier features-based RJSD-FF, neural network-based RJSD-NN, NWJ (Nguyen et al., 2010), InfoNCE (van den Oord et al., 2018), CLUB (Cheng et al., 2020), and MINE (Belghazi et al., 2018). The black line is the closed-form JS divergence between the Cauchy distributions. The parameters of the distributions are changed every 200 epochs to increase the divergence.
  • Figure 4: Test power comparison for different orders of approximation. For the mixture of Gaussians and Galaxy MNIST, we deviate from the null hypothesis for a fixed sample size of $n=m=500$. For CIFAR-10 vs. CIFAR-10.1, we show a boxplot of the distribution of the average test power across different training sets.
  • Figure 5: Test power comparison across different methods.
  • ...and 3 more figures

Theorems & Definitions (13)

  • Definition 1
  • Proposition 2
  • Proposition 3
  • Definition 4
  • Lemma 5
  • Theorem 6
  • Theorem 7
  • Proposition 8
  • Proposition 9
  • Proposition 10
  • ...and 3 more