Symbolic Quantitative Information Flow for Probabilistic Programs
Philipp Schröer, Francesca Randone, Raúl Pardo, Andrzej Wąsowski
TL;DR
The paper develops symbolic methods to quantify information leakage in probabilistic programs, building on two semantic frameworks: the weakest pre-expectation (WPE) calculus for discrete programs, and Second Order Gaussian Approximation (SOGA), a Gaussian-mixture semantics for programs that combine discrete and continuous distributions. It gives exact symbolic expressions for discrete programs and Gaussian-mixture approximations with computed bounds for continuous ones, covering entropy, conditional entropy, KL divergence, and mutual information. The approach includes sufficient conditions under which SOGA aligns with the exact semantics, and it is demonstrated on two differential privacy mechanisms: randomized response and the Gaussian mechanism. The case studies show how attacker priors and privacy parameters shape information leakage, and the work offers a scalable, semantics-driven alternative to sampling-based or model-counting methods. This supports worst-case leakage analysis in data-intensive applications and precise privacy guarantees for probabilistic programs.
Abstract
It is of utmost importance to ensure that modern data-intensive systems do not leak sensitive information. In this paper, the authors, who met thanks to Joost-Pieter Katoen, discuss symbolic methods to compute information-theoretic measures of leakage: entropy, conditional entropy, Kullback-Leibler divergence, and mutual information. We build on two semantic frameworks for symbolic execution of probabilistic programs. For discrete programs, we use weakest pre-expectation calculus to compute exact symbolic expressions for the leakage measures. Using Second Order Gaussian Approximation (SOGA), we handle programs that combine discrete and continuous distributions. In the SOGA setting, however, we approximate the exact semantics using Gaussian mixtures and compute bounds for the measures. We demonstrate our methods on two widely used mechanisms for ensuring differential privacy: randomized response and the Gaussian mechanism.
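To make the leakage measures concrete, the following is a minimal numeric sketch for randomized response on a single secret bit. It is not the paper's symbolic method (neither the weakest pre-expectation calculus nor SOGA is used). It simply enumerates the joint distribution of secret and output and evaluates entropy, conditional entropy, and mutual information from their textbook definitions. The parametrization is an assumption made for illustration: with probability `truth_prob` the respondent answers truthfully, and otherwise reports a fair coin flip. The function and parameter names are hypothetical.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def randomized_response_leakage(prior_one, truth_prob):
    """Return (H(S), H(S|O), I(S;O)) in bits for a secret bit S and reported bit O.

    Assumed mechanism (illustrative): answer truthfully with probability
    truth_prob, otherwise report the outcome of a fair coin flip.
    """
    flip = (1 - truth_prob) / 2  # probability of reporting a given bit via the coin
    # channel[s][o] = P(O = o | S = s)
    channel = {
        1: {1: truth_prob + flip, 0: flip},
        0: {1: flip, 0: truth_prob + flip},
    }
    prior = {1: prior_one, 0: 1 - prior_one}

    # Joint distribution P(S = s, O = o) and output marginal P(O = o).
    joint = {(s, o): prior[s] * channel[s][o] for s in prior for o in (0, 1)}
    output = {o: sum(joint[(s, o)] for s in prior) for o in (0, 1)}

    h_secret = entropy(prior)
    h_cond = entropy(joint) - entropy(output)  # chain rule: H(S|O) = H(S,O) - H(O)
    mutual_info = h_secret - h_cond            # I(S;O) = H(S) - H(S|O)
    return h_secret, h_cond, mutual_info


if __name__ == "__main__":
    for q in (0.0, 0.5, 1.0):
        h_s, h_s_given_o, mi = randomized_response_leakage(prior_one=0.5, truth_prob=q)
        print(f"truth_prob={q}: H(S)={h_s:.3f}  H(S|O)={h_s_given_o:.3f}  I(S;O)={mi:.3f}")
```

Varying `prior_one` and `truth_prob` in this sketch gives a rough feel for how the attacker's prior and the mechanism's privacy parameter affect leakage, which is the kind of dependence the symbolic expressions in the paper capture exactly.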
