Generalizing Geometric Partition Entropy for the Estimation of Mutual Information in the Presence of Informative Outliers

C. Tyler Diggans, Abd AlRahman R. AlMomani

TL;DR

Geometric partition entropy is generalized to samples within a bounded (finite-measure) region of a $d$-dimensional vector space. The generalization allows flexibility in how geometry is incorporated, which varies the representation of outlier impact and significantly broadens the applicability of established entropy-based concepts.

Abstract

The recent introduction of geometric partition entropy brought a new viewpoint to non-parametric entropy quantification that incorporated the impacts of informative outliers, but its original formulation was limited to the context of a one-dimensional state space. A generalized definition of geometric partition entropy is now provided for samples within a bounded (finite-measure) region of a $d$-dimensional vector space. The basic definition invokes the concept of a Voronoi diagram, but the computational complexity and reliability of Voronoi diagrams in high dimensions make estimation by direct theoretical computation unreasonable. This leads to the development of approximation schemes that enable estimation that is faster than current methods by orders of magnitude. The partition intersection ($\pi$) approximation, in particular, enables direct estimates of marginal entropy in any context, resulting in an efficient and versatile mutual information estimator. This new measure-based paradigm for data-driven information theory allows flexibility in the incorporation of geometry to vary the representation of outlier impact, which leads to a significant broadening in the applicability of established entropy-based concepts. The incorporation of informative outliers is illustrated through analysis of transient dynamics in the synchronization of coupled chaotic dynamical systems.

Paper Structure

This paper contains 32 sections, 13 equations, 18 figures.

Figures (18)

  • Figure 1: (a) and (b) show two different sets of measurements of the same phenomenon, as recorded by devices with larger and smaller error tolerances, respectively. It is not clear from the known parameters of the measurement devices that the central clusters should be separated into a collection of small macrostates, and in the absence of such information, we must assume that these measurements belong to the same coarse-grained macrostate. (c) illustrates the pre-processing alteration to the dataset that would be allowed in GPE, where the continuous values are binned within known tolerances but left defined on the continuum. Ideally, in GPE for a particular coarse-graining, each macrostate will be equally represented by the data, but this is not always possible, e.g., computers generally operate at machine precision of $\epsilon\approx 10^{-15}$, meaning some level of discretization is always necessary before coarse-graining.
  • Figure 2: A histogram estimate for a probability distribution from a simple data set $(X,Y)\in\mathbb{R}^2$; the joint entropy, $H(X,Y)$, would be computed from this histogram in the joint space using Eq. \ref{eqn:DSE}, whereas the marginal entropies $H(X)$ and $H(Y)$ would be computed using Eq. \ref{eqn:DSE} on the one-dimensional histograms shown on the vertical planes.
  • Figure 3: A representative illustration of how the marginal probability estimates are computed for each data point in the KSG estimator; for example, when $k=2$, a square neighborhood (an open ball under the Chebyshev distance) is defined by the $k$-th nearest neighbor, and the data points that fall within this neighborhood in each of the marginal spaces are counted to define $n_x$ and $n_y$.
  • Figure 4: (a) Given a sample of $N=500$ unique values (within a tolerance of $\epsilon=10^{-5}$) lying within a bounded region $\mathscr{D}$, (b) the Voronoi diagram of the data set intersected with the boundary of the region $\mathscr{D}$ provides the optimal partition of $\mathscr{D}\subset\mathbb{R}^d$ for the choice of $K=N$ regions; coarse-grained estimates of GPE can be obtained for $K=32$ macrostates by fusing nearly equally represented groups obtained by (c) RPFS or (d) CKM into a partition of supercells (indicated by shared color) before computing the local specific density measures and using Eq. \ref{eqn:PE} to estimate the GPE.
  • Figure 5: (a) Given a sample of $N=500$ unique values (within a tolerance of $\epsilon=10^{-5}$) lying within a radially bounded region $\mathscr{D}$, (b) the Voronoi diagram of the data set intersected with the boundary of the region $\mathscr{D}$ provides the optimal partition of $\mathscr{D}\subset\mathbb{R}^d$ for the choice of $K=N$ regions. A coarse-grained estimate of GPE can be obtained for $K=32$ by fusing (nearly) equally represented groups into supercells (indicated by shared color) before computing the local specific density measure of these supercell macrostates; (c) and (d) show two examples of supercell partitions obtained with the RPFS and CKM algorithms, respectively.
  • ...and 13 more figures
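
The histogram-based estimate described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the equation referenced there is the standard discrete Shannon entropy, and the function names, the bin count, and the use of base-2 logarithms are all assumptions made here. Mutual information is then obtained from the identity $I(X;Y)=H(X)+H(Y)-H(X,Y)$.

```python
import numpy as np


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given its raw bin counts."""
    p = np.asarray(counts, dtype=float).ravel()
    p = p[p > 0] / p.sum()  # normalize; empty bins contribute nothing
    return float(-np.sum(p * np.log2(p)))


def histogram_mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X;Y) from an equal-width 2-D histogram.

    `bins` is a hypothetical default chosen for illustration only.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_xy = shannon_entropy(joint)              # joint entropy H(X,Y)
    h_x = shannon_entropy(joint.sum(axis=1))   # marginal histogram of X
    h_y = shannon_entropy(joint.sum(axis=0))   # marginal histogram of Y
    return h_x + h_y - h_xy


# Illustrative synthetic data (not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y_dependent = x + 0.1 * rng.normal(size=2000)
y_independent = rng.normal(size=2000)

mi_dependent = histogram_mutual_information(x, y_dependent)
mi_independent = histogram_mutual_information(x, y_independent)
```

The result depends on the hypothetical `bins` parameter, and for finite samples this plug-in estimate is biased upward, so independent variables yield a small positive value rather than exactly zero.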

Theorems & Definitions (1)

  • Definition 1