Refuting Equivalence in Probabilistic Programs with Conditioning
Krishnendu Chatterjee, Ehsan Kafshdar Goharshady, Petr Novotný, Đorđe Žikelić
TL;DR
The paper addresses equivalence refutation for probabilistic programs (PPs) with conditioning, i.e., with observe and score statements. It introduces weighted restarting, which translates a program with conditioning into an output-equivalent program without conditioning: a variable $W$ tracks the weight of a run, and the program restarts with probability $1 - W/M$, where $M$ bounds $W$ from above so that this is a valid probability. This reduces the problem to the conditioning-free setting, where existing automated and provably correct refutation methods apply. The authors establish a sound and complete proof rule based on upper expectation supermartingales (UESMs) and lower expectation submartingales (LESMs) under OST-soundness, and show that their polynomial-template synthesis algorithm is semi-complete. An empirical evaluation on standard inference benchmarks demonstrates improved refutation coverage and yields lower bounds on the Kantorovich distance for similarity refutation. The method is fully automated and applies to infinite-state, discrete, and continuous PPs with conditioning.
Abstract
We consider the problem of refuting equivalence of probabilistic programs, i.e., the problem of proving that two probabilistic programs induce different output distributions. We study this problem in the context of programs with conditioning (i.e., with observe and score statements), where the output distribution is conditioned on the event that all observe statements along a run evaluate to true, and where the probability densities of different runs may be updated via score statements. Building on recent work on programs without conditioning, we present a new equivalence refutation method for programs with conditioning. Our method is based on weighted restarting, a novel transformation, introduced in this work, of probabilistic programs with conditioning into output-equivalent probabilistic programs without conditioning. Our method is the first that is both (a) fully automated and (b) provably correct in the answers it provides. We demonstrate the applicability of our method on a set of programs from the probabilistic inference literature.
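To make the idea of weighted restarting concrete, the following is a minimal Python sketch on a toy program. It is an illustration under stated assumptions, not the authors' implementation. The toy program (sample x uniformly from {0, 1, 2}, then observe(x != 0), then score(x + 1)), the bound M = 3, and all function names are hypothetical choices made for this example. The sketch assumes that a run is accepted with probability W/M and restarted otherwise, where W is the weight of the run, and checks exactly that the restarted program without conditioning has the same output distribution as the normalized weighted program.

```python
import random
from fractions import Fraction

M = 3  # assumed constant with W <= M on every run of the toy program


def weight(x):
    """Weight W accumulated by one run of the toy program that sampled x."""
    w = Fraction(1)
    if x == 0:          # observe(x != 0) fails, so the run gets weight 0
        w = Fraction(0)
    return w * (x + 1)  # score(x + 1) multiplies the weight


def normalized_posterior():
    """Exact output distribution of the toy program with conditioning."""
    mass = {x: Fraction(1, 3) * weight(x) for x in (0, 1, 2)}
    total = sum(mass.values())
    return {x: m / total for x, m in mass.items() if m > 0}


def restarted_distribution():
    """Exact output distribution of the restarted, conditioning-free program."""
    accept = {x: Fraction(1, 3) * weight(x) / M for x in (0, 1, 2)}
    p_stop = sum(accept.values())  # probability that one attempt terminates
    # Summing the geometric series over restarts gives accept[x] / p_stop.
    return {x: a / p_stop for x, a in accept.items() if a > 0}


def run_restarted(rng):
    """Sample the restarted program: no observe/score, only a restart loop."""
    while True:
        x = rng.choice([0, 1, 2])
        if rng.random() < float(weight(x) / M):  # terminate with prob. W/M
            return x
        # otherwise restart, which happens with probability 1 - W/M


if __name__ == "__main__":
    print(normalized_posterior() == restarted_distribution())
    print(run_restarted(random.Random(0)))
```

In this toy case both distributions assign probability 2/5 to output 1 and 3/5 to output 2. Any M that bounds the weight works here; a larger M only makes restarts more frequent without changing the output distribution.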
