Information Theory Measures via Multidimensional Gaussianization

Valero Laparra, J. Emmanuel Johnson, Gustau Camps-Valls, Raul Santos-Rodríguez, Jesus Malo

TL;DR

This paper proposes an indirect way of computing information-theoretic measures based on a multivariate Gaussianization transform. The approach mitigates the difficulty of multivariate density estimation by reducing it to a composition of tractable (marginal) operations and simple linear transformations, which can be interpreted as a particular deep neural network.

Abstract

Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems. It has several desirable properties for real-world applications: it naturally deals with multivariate data, it can handle heterogeneous data types, and the measures can be interpreted in physical units. However, it has not been adopted by a wider audience because obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality. Here we propose an indirect way of computing information based on a multivariate Gaussianization transform. Our proposal mitigates the difficulty of multivariate density estimation by reducing it to a composition of tractable (marginal) operations and simple linear transformations, which can be interpreted as a particular deep neural network. We introduce specific Gaussianization-based methodologies to estimate total correlation, entropy, mutual information and Kullback-Leibler divergence. We compare them to recent estimators, showing their accuracy on synthetic data generated from different multivariate distributions. We made the tools and datasets publicly available to provide a test-bed to analyze future methodologies. Results show that our proposal is superior to previous estimators, particularly in high-dimensional scenarios, and that it leads to interesting insights in neuroscience, geoscience, computer vision, and machine learning.
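
The composition of marginal operations and linear transformations described above can be illustrated with a minimal sketch. It is not the authors' released implementation. It assumes a rank-based marginal Gaussianization, a PCA rotation at every layer, and SciPy's `differential_entropy` as the univariate entropy estimator. The function names `marginal_gaussianize` and `rbig_total_correlation` and the parameter `n_layers` are hypothetical. The sketch relies on two standard facts: an invertible dimension-wise transform leaves total correlation unchanged, and a rotation leaves joint entropy unchanged. The total correlation removed by each layer can therefore be read off from univariate entropies alone.

```python
import numpy as np
from scipy.stats import differential_entropy, norm, rankdata


def marginal_gaussianize(x):
    """Map each column to approximately N(0, 1) through its empirical CDF."""
    n = x.shape[0]
    u = rankdata(x, axis=0) / (n + 1)  # ranks rescaled into the open interval (0, 1)
    return norm.ppf(u)


def rbig_total_correlation(x, n_layers=10):
    """Estimate total correlation (in nats) by summing per-layer reductions.

    Assumption of this sketch: PCA rotations and a fixed number of layers
    instead of a convergence criterion.
    """
    x = np.asarray(x, dtype=float)
    total = 0.0
    for _ in range(n_layers):
        # Marginal step: dimension-wise and invertible, so T is unchanged.
        z = marginal_gaussianize(x)
        # Linear step: rotate onto the principal axes of the Gaussianized data.
        _, rotation = np.linalg.eigh(np.cov(z, rowvar=False))
        y = z @ rotation
        # The rotation preserves joint entropy, so the drop in T equals the
        # change in the sum of marginal (univariate) entropies.
        h_before = differential_entropy(z, axis=0).sum()
        h_after = differential_entropy(y, axis=0).sum()
        total += h_before - h_after
        x = y
    return total
```

For a two-dimensional Gaussian with correlation $\rho$, the exact total correlation is $-\tfrac{1}{2}\log(1-\rho^2)$, which gives a simple sanity check: calling `rbig_total_correlation` on a few thousand samples should return a value close to that number. Using the same univariate estimator before and after the rotation is a design choice of this sketch, intended to let part of the estimator bias cancel in the difference.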

Paper Structure

This paper contains 40 sections, 33 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Conceptual scheme of information theoretic measures. ${\mathbf x} = [x_1,x_2]$ and ${\mathbf y}=[y_1,y_2]$ are two-dimensional random variables. Areas represent amounts of information, and intersections represent shared information among the corresponding variables and dimensions. Examples of entropy, total correlation and mutual information are given.
  • Figure 2: Gaussianization of an arbitrary non-Gaussian dataset using the Rotation-Based Iterative Gaussianization (RBIG) Laparra2011b. This Gaussianization algorithm is a cascade of nonlinear+linear transforms: the marginal Gaussianizations $\psi^{(n)}$ and the rotations ${\mathbf R}^{(n)}$. The dots are colored and a blue line passing through the dataset has been added in order to follow how the dataset is modified in each layer of the network. In the first layer box we explicitly show the specific marginal Gaussianization transforms $\psi^{(0)}_i$ for each dimension $i$ and the specific rotation ${\mathbf R}^{(0)}_i$ directions for the first iteration. Although the toy example is only two-dimensional, a representation of the RBIG network for datasets of generic dimensionality is shown at the bottom. See Section \ref{sec:RBIG} for details.
  • Figure 3: Total correlation estimation results in relative mean absolute error. Results for different distributions are given: Gaussian, uniform and the Student PDFs ($\nu = 3,5,20$ for each row respectively). Each column corresponds to an experiment with a particular number of dimensions $D$. Mean and standard deviation are given for five trials.
  • Figure 4: Efficient Coding Hypothesis in Visual Neuroscience from RBIG estimations of total correlation: redundancy reduction in the human visual system. Left panel shows the PDF of natural images (VanHateren database, VanHateren98) in the luminance/contrast plane. Surfaces at the top-right show the redundancy reduction $\Delta T$ along the considered biological network (see Carandini12, Martinez18, GomezVilla19 for background on these networks) at different points of the image space. Estimations are done with RBIG (center) and with the theoretical reference computed with the analytical Jacobian of the network (Martinez18) via Eq. \ref{deltaT} (right). This represents the efficiency of the visual brain in transmitting information about natural images. The surfaces at the bottom display the uncertainty of the RBIG and reference estimates computed from 10 realizations.
  • Figure 5: Regularities in geoscience data from RBIG estimations of entropy. Left: mean global temperature at each time step. Center: spatial entropy at each time step (see text for details). Right: evolution of the relation between temperature and spatial entropy for 2001-2008. The mean over all studied years is shown as the thicker line, whose colors correspond to the distance between the Earth and the Sun over this period (blue closer, yellow farther away).
  • ...and 10 more figures
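
For reference, the quantities pictured in Figure 1 can be written with their standard textbook definitions. The symbols $H$, $T$ and $I$, the density $p$, and the use of $D$ for the number of dimensions are notation assumed here for illustration. The paper's own equations may use different symbols.

$$
\begin{aligned}
H({\mathbf x}) &= -\int p({\mathbf x}) \log p({\mathbf x})\, \mathrm{d}{\mathbf x}, \\
T({\mathbf x}) &= \sum_{i=1}^{D} H(x_i) - H({\mathbf x}), \\
I({\mathbf x};{\mathbf y}) &= H({\mathbf x}) + H({\mathbf y}) - H([{\mathbf x},{\mathbf y}]).
\end{aligned}
$$

In the area picture of Figure 1, $T({\mathbf x})$ is the information shared among the dimensions of ${\mathbf x}$ (zero exactly when they are independent), and $I({\mathbf x};{\mathbf y})$ is the overlap between the areas of ${\mathbf x}$ and ${\mathbf y}$.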