INFO-SEDD: Continuous Time Markov Chains as Scalable Information Metrics Estimators
Alberto Foresti, Giulio Franzese, Pietro Michiardi
TL;DR
The paper tackles the challenge of estimating information-theoretic quantities for high-dimensional discrete distributions by leveraging a continuous-time Markov chain framework with score-based learning. It uses forward and reverse CTMC dynamics and Dynkin's formula to derive a KL-divergence estimator, replacing intractable likelihood ratios with trainable score networks, enabling a single model to estimate mutual information and entropy. A key innovation is the sparsified, multi-component decomposition of the state space together with an absorbing-state trick that makes marginals derivable from the joint score, yielding scalable estimators. Empirical results on synthetic benchmarks show robustness and superiority over neural embedding-based estimators, and entropy estimation on Ising spin glasses demonstrates practical applicability to complex discrete systems. This approach promises scalable, accurate information-theoretic analysis in domains where discrete high-dimensional dependencies matter.
Abstract
Information-theoretic quantities play a crucial role in understanding non-linear relationships between random variables and are widely used across scientific disciplines. However, estimating these quantities remains an open problem, particularly in the case of high-dimensional discrete distributions. Current approaches typically rely on embedding discrete data into a continuous space and applying neural estimators originally designed for continuous distributions, a process that may not fully capture the discrete nature of the underlying data. We consider Continuous-Time Markov Chains (CTMCs), stochastic processes on discrete state spaces which have gained popularity due to their generative modeling applications. In this work, we introduce INFO-SEDD, a novel method for estimating information-theoretic quantities of discrete data, including mutual information (MI) and entropy. Our approach requires the training of a single parametric model, offering significant computational and memory advantages. Additionally, it integrates seamlessly with pretrained networks, allowing for efficient reuse of pretrained generative models. To evaluate our approach, we construct a challenging synthetic benchmark. Our experiments demonstrate that INFO-SEDD is robust and outperforms neural competitors that rely on embedding techniques. Moreover, we validate our method on a real-world task: estimating the entropy of an Ising model. Overall, INFO-SEDD outperforms competing methods and scales to high-dimensional scenarios, paving the way for new applications where estimating MI between discrete distributions is the focus. The promising results in this complex, high-dimensional scenario highlight INFO-SEDD as a powerful new estimator in the toolkit for information-theoretic analysis.
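
For orientation, the two target quantities named above can both be written as KL divergences, which is why a single KL estimator can serve for both. The sketch below is not the INFO-SEDD estimator: it computes the quantities exactly by enumeration on a hypothetical 2x2 joint table, which is only feasible for tiny state spaces. It relies on the standard identities I(X;Y) = KL(p(x,y) || p(x)p(y)) and H(X) = log|X| - KL(p || uniform). The function names and the toy table are illustrative assumptions, not taken from the paper.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing and are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy in nats, written as log|X| - KL(p || uniform)."""
    n = len(p)
    uniform = [1.0 / n] * n
    return math.log(n) - kl_divergence(p, uniform)

def mutual_information(joint):
    """I(X;Y) = KL(p(x,y) || p(x)p(y)) for a joint table given as a list of rows."""
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    flat_joint = [pxy for row in joint for pxy in row]
    flat_prod = [a * b for a in px for b in py]  # same row-major order
    return kl_divergence(flat_joint, flat_prod)

# Hypothetical toy joint distribution over two correlated binary variables.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
mi = mutual_information(joint)
h_x = entropy([sum(row) for row in joint])
print(f"I(X;Y) = {mi:.4f} nats, H(X) = {h_x:.4f} nats")
```

Exact enumeration like this scales exponentially with the number of variables, which is the regime where a learned estimator becomes necessary.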
