Random Variables, Conditional Independence and Categories of Abstract Sample Spaces
Dario Stein
TL;DR
The paper brings together Simpson's probability-sheaf approach and Markov-category probability in a synthetic setting. From any suitable Markov category C it constructs a category of probability spaces P(C) and a category of abstract sample spaces S(C). It develops an intrinsic notion of conditional independence via independent pullbacks, and shows that probability sheaves of random elements (RE(V)) arise as atomic sheaves on S(C), recovering Simpson's examples in a uniform, purely synthetic framework. The independence structure is shown to satisfy the axioms IP1–IP5, and relative products provide canonical independent pullbacks, which supports a theory of random elements and laws within a topos of atomic sheaves. The construction is instantiated in FinStoch, BorelStoch, SetMulti, Gauss, and StrongName, and connects to nominal techniques through the Gauss and Nom example categories, suggesting future research directions in probabilistic separation logic and logical foundations. Overall, probability spaces, sample spaces, and random variables are treated together in a single categorical setting with synthetic Bayesian inversion and a shared independence calculus.
Abstract
Two high-level "pictures" of probability theory have emerged: one that takes as central the notion of random variable, and one that focuses on distributions and probability channels (Markov kernels). While the channel-based picture has been successfully axiomatized, and widely generalized, using the notion of Markov category, the categorical semantics of the random variable picture remain less clear. Simpson's probability sheaves are a recent approach, in which probabilistic concepts like random variables are allowed to vary over a site of sample spaces. Simpson has identified rich structure on these sites, most notably an abstract notion of conditional independence, and given examples ranging from probability over databases to nominal sets. We aim to bring this development together with the generality and abstraction of Markov categories: we show that for any suitable Markov category, a category of sample spaces can be defined which satisfies Simpson's axioms, and that a theory of probability sheaves can be developed purely synthetically in this setting. We recover Simpson's examples in a uniform fashion from well-known Markov categories, and consider further generalizations.
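
As a rough illustration of the two pictures contrasted above, the following sketch works in the finite discrete setting (the kind of data FinStoch describes). It is not taken from the paper: the sample space `omega`, the random variables `first` and `heads`, the channel `noisy_parity`, and the helpers `law` and `push` are all hypothetical names chosen for the example. In the random variable picture one fixes a sample space and studies functions on it; in the channel picture one composes distributions with Markov kernels.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical sample space: two fair coin flips, with its probability measure.
omega = {(a, b): Fraction(1, 4) for a in "HT" for b in "HT"}

def law(rv, space):
    """Law of a random variable: pushforward of the measure on `space` along `rv`."""
    out = defaultdict(Fraction)
    for w, p in space.items():
        out[rv(w)] += p
    return dict(out)

def push(dist, kernel):
    """Compose a distribution with a channel (a function x -> {y: probability})."""
    out = defaultdict(Fraction)
    for x, p in dist.items():
        for y, q in kernel(x).items():
            out[y] += p * q
    return dict(out)

# Random variable picture: functions on the sample space.
def first(w):
    return w[0]

def heads(w):
    return sum(1 for c in w if c == "H")

heads_law = law(heads, omega)

# Channel picture: a noisy observation of the parity of the number of heads
# (illustrative kernel; it reports the true parity with probability 9/10).
def noisy_parity(n):
    true, other = ("even", "odd") if n % 2 == 0 else ("odd", "even")
    return {true: Fraction(9, 10), other: Fraction(1, 10)}

observed = push(heads_law, noisy_parity)

# Independence of the two flips: the joint law factors as the product of marginals.
joint = law(lambda w: (w[0], w[1]), omega)
m1, m2 = law(first, omega), law(lambda w: w[1], omega)
flips_independent = all(joint[(a, b)] == m1[a] * m2[b] for a in "HT" for b in "HT")
```

Here `law` only ever refers to a fixed sample space, whereas `push` never mentions one; relating these two viewpoints in general Markov categories is the subject of the paper.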
