Nearest-Neighbours Estimators for Conditional Mutual Information
Jake Witter, Conor Houghton
TL;DR
The paper tackles the data-hungry nature of conditional mutual information estimation by introducing a metric-space, Kozachenko-Leonenko–style nearest-neighbour estimator for $I(X,Y|Z)$ that relies on counts of points in local volumes rather than on coordinate-based densities. A bias-correction term $I_b(h)$ is derived from a hypergeometric model, and the smoothing parameter $h$ is chosen to balance bias against variance. Unlike the KSG estimator, the new method is coordinate-free and comes with a practical bias-correction framework. It is demonstrated in simulations on a simple Markov tree and on an XY-model example centred on transfer entropy, where it gives estimates closer to the ground truth with far less data. The approach extends to high-dimensional or non-Euclidean data, offering a model-free tool for information-theoretic analysis and causal inference in data science and beyond.
Abstract
The conditional mutual information quantifies the conditional dependence of two random variables. It has numerous applications; it forms, for example, part of the definition of transfer entropy, a common measure of the causal relationship between time series. It does, however, require a lot of data to estimate accurately and suffers from the curse of dimensionality, which limits its application in machine learning and data science. The Kozachenko-Leonenko approach can address this problem: in this approach, it is possible to define a nearest-neighbour estimator which depends only on the distance between data points and not on the dimension of the data. Furthermore, the bias can be calculated analytically for this estimator. Here this estimator is described and tested on simulated data.
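For reference, the quantities named above can be written out using their standard textbook definitions. This is a sketch of those definitions, not of the paper's estimator: the entropy notation $H$, the source and target series $Y_t$ and $X_t$, and the history lengths $k$ and $l$ are assumptions introduced here for illustration. The conditional mutual information is a combination of joint entropies,
$$I(X,Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z),$$
and the transfer entropy from a series $Y$ to a series $X$ is the conditional mutual information between the next value of $X$ and the recent history of $Y$, given the recent history of $X$:
$$T_{Y \to X} = I\left(X_{t+1},\, (Y_{t-l+1},\dots,Y_t) \,\middle|\, (X_{t-k+1},\dots,X_t)\right).$$
The conditioning variable here has $k$ components and the joint variable has $k+l+1$, which shows how quickly the dimension grows with the history lengths and why the curse of dimensionality is a concern for estimators that depend on it.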
