
Causal models in string diagrams

Robin Lorenz, Sean Tull

TL;DR

The paper develops a diagrammatic reformulation of causal models within cd-categories, mapping directed acyclic graphs $G$ to network diagrams and treating mechanisms as channels. It shows that causal interventions, conditioning via a normalisation box, open causal models, and counterfactuals can all be handled diagrammatically, and extends the framework to latent-variable ADMGs through rootification. By unifying structural causal models and causal Bayesian networks under a single graphical calculus, the work clarifies identifiability results and provides a pedagogical, compositional language for causal reasoning with broad potential applications in ML and quantum causality. This diagrammatic perspective offers a foundational and intuitive toolkit for causal reasoning that scales across disciplines and computational paradigms.

Abstract

The framework of causal models provides a principled approach to causal reasoning, applied today across many scientific domains. Here we present this framework in the language of string diagrams, interpreted formally using category theory. A class of string diagrams, called network diagrams, are in 1-to-1 correspondence with directed acyclic graphs. A causal model is given by such a diagram with its components interpreted as stochastic maps, functions, or general channels in a symmetric monoidal category with a 'copy-discard' structure (cd-category), turning a model into a single mathematical object that can be reasoned with intuitively and yet rigorously. Building on prior works by Fong and Jacobs, Kissinger and Zanasi, as well as Fritz and Klingler, we present diagrammatic definitions of causal models and functional causal models in a cd-category, generalising causal Bayesian networks and structural causal models, respectively. We formalise general interventions on a model, including but beyond do-interventions, and present the natural notion of an open causal model with inputs. We also give an approach to conditioning based on a normalisation box, allowing for causal inference calculations to be done fully diagrammatically. We define counterfactuals in this setup, and treat the problems of the identifiability of causal effects and counterfactuals fully diagrammatically. The benefits of such a presentation of causal models lie in foundational questions in causal reasoning and in their clarificatory role and pedagogical value. This work aims to be accessible to different communities, from causal model practitioners to researchers in applied category theory, and discusses many examples from the literature for illustration. Overall, we argue and demonstrate that causal reasoning according to the causal model framework is most naturally and intuitively done as diagrammatic reasoning.

Paper Structure (7 sections, 1 figure)


Figures (1)

  • Figure 1: DAG $G$ in (a) with the encircling of vertices indicating the subset of 'observed' variables; in (b) the corresponding network diagram with $\circ$ representing a copy map; in (c) an equivalent diagram with the wire of $B$ 'marginalised over' by discarding it.

Now the data that is really required to specify an intervention is often not a full causal model, but only a subset of its mechanisms. This naturally leads one to consider the notion of an \emph{open causal model}, essentially a 'model with inputs', which we introduce in Section~\ref{sec:openCMs}. An example is $P(Y|\mathop{\mathrm{do}}\nolimits(X))$ when understood as the map that sends $x$ to $P(Y|\mathop{\mathrm{do}}\nolimits(X=x))$. For the above example this allows one to consider the following: \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (20) at (0, 0) {$P(T,L ; \Do(S) )$}; \node [style=none] (21) at (1, 0.5) {}; \node [style=none] (22) at (1, 1.75) {}; \node [style=none] (23) at (-1, 0.5) {}; \node [style=none] (24) at (-1, 1.75) {}; \node [style=label] (27) at (-0.5, 1.5) {$T$}; \node [style=label] (28) at (1.5, 1.5) {$L$}; \node [style=none] (31) at (0, -2) {}; \node [style=none] (32) at (0, -0.25) {}; \node [style=label] (34) at (0.5, -1.75) {$S$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (22.center) to (21.center); \draw (24.center) to (23.center); \draw [in=90, out=-90] (32.center) to (31.center); \end{pgfonlayer} \end{tikzpicture} \space = \space \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (1) at (0, -2) {$c_S$}; \node [style=whitedot] (2) at (1, -3.25) {}; \node [style=none] (3) at (2.25, -2.25) {}; \node [style=none] (4) at (0, -2.25) {}; \node [style=none] (5) at (1, -4.25) {}; \node [style=none] (6) at (1.5, 4) {}; \node [style=map] (7) at (1, -5) {$c_B$}; \node [style=none] (8) at (0, -1) {}; \node [style=label] (10) at (1.5, -4) {$B$}; \node 
[style=whitedot] (12) at (0.25, 3) {}; \node [style=none] (13) at (1, 4) {}; \node [style=none] (14) at (-0.75, 4) {}; \node [style=none] (15) at (0.25, 1.5) {}; \node [style=whitedot] (16) at (-0.75, 0.5) {}; \node [style=none] (17) at (0.25, 1.5) {}; \node [style=none] (18) at (-2, 1.5) {}; \node [style=map] (19) at (0.25, 1.75) {$c_T$}; \node [style=map] (20) at (1.25, 4.25) {$c_L$}; \node [style=none] (21) at (1.25, 4.25) {}; \node [style=none] (22) at (1.25, 5.5) {}; \node [style=none] (23) at (-0.75, 4) {}; \node [style=none] (24) at (-0.75, 5.5) {}; \node [style=none] (25) at (-2, 1.5) {}; \node [style=none] (26) at (-2, 3.5) {}; \node [style=label] (27) at (-0.25, 5.25) {$T$}; \node [style=label] (28) at (1.75, 5.25) {$L$}; \node [style=upground] (30) at (-2, 3.65) {}; \node [style=none] (31) at (-1.75, -6.5) {}; \node [style=none] (32) at (-0.75, 0.5) {}; \node [style=upground] (33) at (0, -0.85) {}; \node [style=label] (34) at (-1.25, -6.25) {$S$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2) to (5.center); \draw [bend right=45] (2) to (3.center); \draw [bend right=45] (4.center) to (2); \draw [in=90, out=-90, looseness=1.50] (6.center) to (3.center); \draw (5.center) to (7); \draw (8.center) to (4.center); \draw (12) to (15.center); \draw [in=-90, out=15, looseness=1.25] (12) to (13.center); \draw [bend right=45] (14.center) to (12); \draw [bend right=45] (16) to (17.center); \draw [bend right=45] (18.center) to (16); \draw (22.center) to (21.center); \draw (24.center) to (23.center); \draw (26.center) to (25.center); \draw [in=90, out=-90] (32.center) to (31.center); \end{pgfonlayer} \end{tikzpicture} \space = \space \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=none] (5) at (1.25, 1) {}; \node [style=map] (7) at (1.75, -1.5) {$c_B$}; \node [style=whitedot] (12) at (0, 0) {}; \node [style=none] (13) at (0.75, 1) {}; \node [style=none] (14) at (-1, 1) {}; \node [style=none] (15) at (0, -1.5) {}; \node [style=map] 
(19) at (0, -1.25) {$c_T$}; \node [style=map] (20) at (1, 1.25) {$c_L$}; \node [style=none] (21) at (1, 1.25) {}; \node [style=none] (22) at (1, 2.5) {}; \node [style=none] (23) at (-1, 1) {}; \node [style=none] (24) at (-1, 2.5) {}; \node [style=label] (27) at (-0.5, 2.25) {$T$}; \node [style=label] (28) at (1.5, 2.25) {$L$}; \node [style=none] (31) at (0, -3.25) {}; \node [style=none] (32) at (0, -1.5) {}; \node [style=label] (34) at (0.5, -3) {$S$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw [in=90, out=-90, looseness=1.25] (5.center) to (7); \draw (12) to (15.center); \draw [in=-90, out=15, looseness=1.25] (12) to (13.center); \draw [bend right=45] (14.center) to (12); \draw (22.center) to (21.center); \draw (24.center) to (23.center); \draw [in=90, out=-90] (32.center) to (31.center); \end{pgfonlayer} \end{tikzpicture}

The notion of an open model also allows one to formalise the composition of causal models, in parallel and sequentially, as well as transformations between models more general than interventions. This is made formal in Section~\ref{Sec_ComposingOpenModels}, where we show that open causal models themselves form a symmetric monoidal category. Open models build on ideas related to the \emph{conditional DAGs} from Ref.~\cite{RichardsonEtAl_2022_NestedMarkovForADMGs} and the \emph{open graphs} from Refs.~\cite{OpenGraph1,OpenGraph2}.

The remainder of the article turns to applications of the framework. In such applications, one does not always have 'full' knowledge of the causal structure, but at times only the knowledge that some variables may share a latent common cause. Such scenarios are depicted with an \emph{acyclic directed mixed graph} (ADMG), a DAG which may also have bi-directed edges representing latent causal structure. In Section~\ref{Sec_TreatmentLatentvariables} we discuss ADMGs and the notion of a latent projection from a DAG to an ADMG.
We then introduce the \emph{rootification} of an ADMG $G$, a principled way to map $G$ to a DAG that has $G$ as its latent projection, by introducing root nodes. The ADMG in Fig.~\ref{Fig_Intro_Example_ADMG} and the DAG in Fig.~\ref{Fig_Intro_example_DAG} stand in this relationship to each other. The network diagrams of rootified ADMGs are the terms in which identifiability problems can then be discussed and, importantly, without loss of generality.

A major example of an identifiability problem is the problem of \emph{identifiability of causal effects}: when and how one can uniquely determine the distribution resulting from an intervention from the observational data and the assumed causal structure alone, the latter specified by an ADMG. Section~\ref{Sec_CE_Identifiability} briefly recaps the problem and then explores its diagrammatic treatment. We first restate a result by Jacobs \textit{et al.}\ from Ref.~\cite{JacobsEtAl_2021_CausalInferenceByDiagramSurgery} that casts the c-component condition by Tian and Pearl~\cite{TianEtAl_2002_GeneralIdentificationConditionForCausalEffects}, at least for a special case in which that condition is applicable, in diagrammatic terms. Here our setup allows this result to be extended in the obvious way to interventions more general than atomic ones. Second, we discuss several pedagogical examples from the literature, showing how easy these concrete examples become when represented in string diagrammatic language. We also pay attention to how the diagrammatic normalisation box makes explicit the interplay between concerns of identifiability and the need to condition on events with vanishing probability.
A final central aspect of the causal model framework is that of \emph{counterfactuals}, questions such as \textit{would Mary have had a headache, had she taken an aspirin?} In Section~\ref{Sec_CF_TheNotion} we give a general formal definition of counterfactuals in diagrammatic terms, which allows for arbitrarily many parallel worlds in the counterfactual statement. While it is in keeping with the literature (see, e.g., Refs.~\cite{Pearl_Causality,ShpitserEtAl_2008_CompleteIdentificationMethodCausalHierarchy,Pearl_2011_AlgorithmizationOfCounterfactuals}), we argue that the diagrammatic presentation and our precise conditions on the data that define a counterfactual have a clarificatory value. In particular, they make clear the difference between counterfactual questions and questions of the 'what if one were to intervene' kind. We find that it is easy and natural to treat counterfactuals diagrammatically, as demonstrated with an example in Sec.~\ref{Sec_CF_id_example}. This leads to the statement of an algorithm, \texttt{simplify-cf}, in Sec.~\ref{Sec_CF_makeCG_algorithm_1}, which simplifies the diagram of a counterfactual by applying rewrite rules in a way that makes its output correspond to that of the \texttt{make-cg} algorithm from Ref.~\cite{ShpitserEtAl_2008_CompleteIdentificationMethodCausalHierarchy}, but is arguably much more straightforward. Sec.~\ref{Sec_CF_id_criteria} then presents an algorithm, \texttt{id-cf}, which on the basis of the output of \texttt{simplify-cf} assesses the identifiability of a counterfactual and outputs the corresponding identifying expression in diagrams. This can be seen as the translation of the main ideas of the algorithmic solution from Ref.~\cite{ShpitserEtAl_2008_CompleteIdentificationMethodCausalHierarchy} into diagrammatic terms. We argue that the algorithm is sound and discuss completeness. Finally, Sec.~\ref{Sec_CF_Further_Generalisation} touches on a generalisation of the notion of counterfactuals that becomes natural in this work's setup, namely a notion where worlds may be defined by changes in causal structure more general than do-interventions and where one may condition upon fuzzy facts.

Section~\ref{Sec_Conclusions} closes with a list of promising directions for future work.

We will begin by introducing the category-theoretic approach to probability theory developed by numerous authors, see, e.g., Refs.~\cite{CoeckeEtAl_2012_PicturingBayesianInference,ChoEtAl_2019_DisintegrationViaStringDiagrams,fritz2020synthetic}. Formally, this involves working in a 'symmetric monoidal category' or more specifically a 'cd-category'. In practice this amounts to working with intuitive but formal diagrams known as \emph{string diagrams}~\cite{selinger2011survey}, which describe probabilistic (and here causal) processes. For the purposes of this article it suffices to consider the category $\mathbf{Mat}_{\mathbb{R}^+}$ of $\mathbb{R}^+$-valued finite matrices, described in Example~\ref{ex:MatR+} below, but the categorical approach is much more general.

Let us start then with the basic notion of symmetric monoidal categories, as well as the string diagrammatic language used for their representation. Recall that a \emph{category} $\mathbf{C}$ consists of a collection of objects $X, Y, \dots$ and morphisms $f \colon X \to Y$ between them, which we can compose in sequence. Such a morphism is also referred to as a \emph{process} $f$ from $X$ to $Y$. An object $X$ is depicted as a wire labelled by $X$, and a morphism $f \colon X\to Y$ as a box with lower input $X$ and upper output $Y$, read from bottom to top.
\begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (0, 0) {$f$}; \node [style=none] (1) at (0, 1) {}; \node [style=none] (2) at (0, -1) {}; \node [style=label] (3) at (0, 1.5) {$Y$}; \node [style=label] (4) at (0, -1.5) {$X$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \end{pgfonlayer} \end{tikzpicture} Given another morphism g \colon Y \to Z we can form their sequential composite g \circ f \colon X \to Z, depicted as follows. \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (0) at (3.5, -1) {}; \node [style=none] (1) at (3.5, 1) {}; \node [style=map] (2) at (3.5, 0) {$g \circ f$}; \node [style=none] (3) at (5.25, 0) {$=$}; \node [style=none] (4) at (6.75, -2) {}; \node [style=none] (5) at (6.75, 2) {}; \node [style=label] (6) at (3.5, -1.5) {$X$}; \node [style=label] (7) at (6.75, -2.5) {$X$}; \node [style=label] (8) at (6.75, 2.25) {$Z$}; \node [style=label] (9) at (3.5, 1.25) {$Z$}; \node [style=map] (10) at (6.75, -1) {$f$}; \node [style=map] (11) at (6.75, 1) {$g$}; \node [style=label] (12) at (7.25, 0) {$Y$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (0.center); \draw (5.center) to (4.center); \end{pgfonlayer} \end{tikzpicture} Note that we may only compose morphisms sequentially when their types match in this way. 
Each object $X$ in a category also comes with an \emph{identity} morphism $\mathrm{id}_{X} \colon X \to X$ depicted as a blank wire: \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (0) at (0.5, -1) {}; \node [style=none] (1) at (0.5, 1) {}; \node [style=map] (2) at (0.5, 0) {$\id{X}$}; \node [style=none] (3) at (2, 0) {$=$}; \node [style=none] (4) at (3.25, -1) {}; \node [style=none] (5) at (3.25, 1) {}; \node [style=label] (6) at (0.5, -1.5) {$X$}; \node [style=label] (7) at (3.25, -1.5) {$X$}; \node [style=label] (8) at (3.25, 1.25) {$X$}; \node [style=label] (9) at (0.5, 1.25) {$X$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (0.center); \draw (5.center) to (4.center); \end{pgfonlayer} \end{tikzpicture} which acts as a unit for composition: $\mathrm{id}_{Y} \circ f = f = f \circ \mathrm{id}_{X}$ for any $f \colon X \to Y$. Note that this rule is trivial in the graphical language.

Formally, a \emph{symmetric monoidal category} $(\mathbf{C}, \otimes, I)$ is a category $\mathbf{C}$ coming with a functor $\otimes \colon \mathbf{C} \times \mathbf{C} \to \mathbf{C}$, a distinguished object $I$ and natural transformations which express that $\otimes$ is suitably associative and symmetric, with $I$ as a unit~\cite{coecke2006introducing}. These features are most naturally expressed in string diagrams, as follows. Firstly, for any pair of objects $X, Y$ we can form their parallel composite or 'tensor' $X \otimes Y$, depicted by placing their wires side-by-side.
\begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (0) at (0, -1) {}; \node [style=none] (1) at (0, 1) {}; \node [style=label] (2) at (0, 1.5) {$X \otimes Y$}; \node [style=label] (3) at (0, -1.5) {$X \otimes Y$}; \node [style=none] (4) at (2, 0) {$=$}; \node [style=none] (5) at (4, -1) {}; \node [style=none] (6) at (4, 1) {}; \node [style=label] (7) at (4, 1.5) {$X$}; \node [style=label] (8) at (4, -1.5) {$X$}; \node [style=none] (9) at (6, -1) {}; \node [style=none] (10) at (6, 1) {}; \node [style=label] (11) at (6, 1.5) {$Y$}; \node [style=label] (12) at (6, -1.5) {$Y$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (0.center); \draw (6.center) to (5.center); \draw (10.center) to (9.center); \end{pgfonlayer} \end{tikzpicture} Similarly, given morphisms f \colon X \to W and g \colon Y \to Z we can form their parallel composite f \otimes g \colon X \otimes Y \to W \otimes Z, depicted as below. \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (0) at (11.75, -1) {}; \node [style=none] (1) at (11.75, 1) {}; \node [style=map] (2) at (11.75, 0) {$f \otimes g$}; \node [style=label] (3) at (11.75, -1.5) {$X \otimes Y$}; \node [style=label] (4) at (11.75, 1.5) {$W \otimes Z$}; \node [style=none] (5) at (13.5, 0) {$=$}; \node [style=none] (6) at (15, -1) {}; \node [style=none] (7) at (15, 1) {}; \node [style=map] (8) at (15, 0) {$f$}; \node [style=label] (9) at (15, -1.5) {$X$}; \node [style=label] (10) at (15, 1.25) {$W$}; \node [style=none] (11) at (16.5, -1) {}; \node [style=none] (12) at (16.5, 1) {}; \node [style=map] (13) at (16.5, 0) {$g$}; \node [style=label] (14) at (16.5, -1.5) {$Y$}; \node [style=label] (15) at (16.5, 1.25) {$Z$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (0.center); \draw (7.center) to (6.center); \draw (12.center) to (11.center); \end{pgfonlayer} \end{tikzpicture} The tensor is symmetric meaning that we can also 'swap' pairs of wires past each other, such that 
swapping twice leaves the wires alone, and boxes can move along the swaps as below. \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (0.5, 0.5) {$f$}; \node [style=map] (1) at (2, 0.5) {$g$}; \node [style=none] (2) at (0.5, 1.5) {}; \node [style=none] (3) at (2, 1.5) {}; \node [style=none] (4) at (0.5, 0) {}; \node [style=none] (5) at (2, 0) {}; \node [style=none] (6) at (2, -1.5) {}; \node [style=none] (7) at (2, 0) {}; \node [style=none] (8) at (0.5, -1.5) {}; \node [style=label] (9) at (2, -2) {$X$}; \node [style=label] (10) at (2, 2) {$Z$}; \node [style=label] (11) at (0.5, 2) {$W$}; \node [style=label] (12) at (0.5, -2) {$Y$}; \node [style=none] (13) at (3.5, 0) {$=$}; \node [style=map] (14) at (6.25, -0.75) {$f$}; \node [style=map] (15) at (4.75, -0.75) {$g$}; \node [style=none] (16) at (6.25, -1.5) {}; \node [style=none] (17) at (4.75, -1.5) {}; \node [style=none] (18) at (6.25, 0) {}; \node [style=none] (19) at (4.75, 0) {}; \node [style=none] (20) at (4.75, 1.5) {}; \node [style=none] (21) at (4.75, 0) {}; \node [style=none] (22) at (6.25, 1.5) {}; \node [style=label] (23) at (4.75, 2) {$W$}; \node [style=label] (24) at (4.75, -2) {$Y$}; \node [style=label] (25) at (6.25, -2) {$X$}; \node [style=label] (26) at (6.25, 2) {$Z$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (4.center) to (2.center); \draw (5.center) to (3.center); \draw [in=90, out=-90] (4.center) to (6.center); \draw [in=90, out=-90] (7.center) to (8.center); \draw (18.center) to (16.center); \draw (19.center) to (17.center); \draw [in=-90, out=90] (18.center) to (20.center); \draw [in=-90, out=90] (21.center) to (22.center); \end{pgfonlayer} \end{tikzpicture} A monoidal category also comes with a distinguished \emph{unit object} I, with (the identity on) I depicted simply as empty space. 
\begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (1) at (0, 1) {}; \node [style=none] (2) at (0, -1) {}; \node [style=label] (3) at (0, 1.5) {$I$}; \node [style=label] (4) at (0, -1.5) {$I$}; \node [style=none] (5) at (2, 0) {$=$}; \node [style=none] (6) at (4, 1) {}; \node [style=none] (7) at (6, 1) {}; \node [style=none] (8) at (4, -1) {}; \node [style=none] (9) at (6, -1) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \draw [style=dashed] (8.center) to (9.center); \draw [style=dashed] (9.center) to (7.center); \draw [style=dashed] (7.center) to (6.center); \draw [style=dashed] (6.center) to (8.center); \end{pgfonlayer} \end{tikzpicture} \ \ \quad = \ \ \quad 1 Intuitively, tensoring any object by I simply leaves it invariant. Formally this is expressed via coherence isomorphisms X \otimes I \simeq X \simeq I \otimes X. The unit object allows us to give meaning to morphisms without inputs and/or outputs. A morphism \omega \colon I \to X is called a \emph{state} of X and is depicted with 'no input'. Similarly an \emph{effect} is a morphism of the form e \colon X \to I, and is depicted with no output. \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (0, 0) {$\omega$}; \node [style=none] (1) at (0, 1) {}; \node [style=label] (2) at (0, 1.5) {$X$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (0); \end{pgfonlayer} \end{tikzpicture} \qquad \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (0, 1) {$e$}; \node [style=none] (1) at (0, 0) {}; \node [style=label] (2) at (0, -0.5) {$X$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (0); \end{pgfonlayer} \end{tikzpicture} A morphism r \colon I \to I is called a \emph{scalar} and is depicted with no inputs or outputs. 
Scalars can move 'freely' around diagrams, and can also be multiplied together via $r \cdot s = r \otimes s = r \circ s = s \cdot r$, i.e.: \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=scalar] (0) at (-0.25, 0) {$r$}; \node [style=scalar] (1) at (1, 0) {$s$}; \node [style=none] (2) at (2.25, 0) {$=$}; \node [style=scalar] (3) at (4, 0) {$r \cdot s$}; \end{pgfonlayer} \end{tikzpicture} We denote the 'empty space' scalar in \ref{eq:empty-space} by $1 = \mathrm{id}_{I}$, satisfying $1 \circ r = r$ for all scalars $r$. The composition operations $\circ$, $\otimes$ in a (monoidal) category satisfy numerous axioms which we omit here but which are self-evident in the graphical language. For example, in any category associativity of composition $(h \circ g) \circ f = h \circ g \circ f = h \circ (g \circ f)$ is automatic in diagrams, as below. \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (1, 1) {$h \circ g$}; \node [style=map] (1) at (1, -1) {$f$}; \node [style=none] (2) at (1, -2.5) {}; \node [style=none] (3) at (1, 2.5) {}; \node [style=label] (4) at (1, 3) {$Z$}; \node [style=label] (5) at (1, -3) {$W$}; \node [style=none] (6) at (3, 0) {$=$}; \node [style=map] (7) at (5.5, 0) {$g$}; \node [style=map] (8) at (5.5, -2) {$f$}; \node [style=none] (9) at (5.5, -3.5) {}; \node [style=none] (10) at (5.5, 3.5) {}; \node [style=label] (11) at (5.5, 4) {$Z$}; \node [style=label] (12) at (5.5, -4) {$W$}; \node [style=map] (13) at (5.5, 2) {$h$}; \node [style=label] (14) at (5, 1) {$Y$}; \node [style=label] (15) at (5, -1) {$X$}; \node [style=none] (16) at (7.5, 0) {$=$}; \node [style=map] (17) at (10, 1) {$h$}; \node [style=map] (18) at (10, -1) {$g \circ f$}; \node [style=none] (19) at (10, -2.5) {}; \node [style=none] (20) at (10, 2.5) {}; \node [style=label] (21) at (10, 3) {$Z$}; \node [style=label] (22) at (10, -3) {$W$}; \node [style=label] (23) at (0, 0) {$X$}; \node [style=label] (24) at (11, 0) {$Y$}; \end{pgfonlayer} 
\begin{pgfonlayer}{edgelayer} \draw (3.center) to (2.center); \draw (10.center) to (9.center); \draw (20.center) to (19.center); \end{pgfonlayer} \end{tikzpicture} Functoriality of $\otimes$ means that $(f \otimes g) \circ (f' \otimes g') = (f \circ f') \otimes (g \circ g')$, which is also automatic from the diagrams (left below). A consequence is the 'interchange law' $(f \otimes \mathrm{id}) \circ (\mathrm{id} \otimes g) = f \otimes g = (\mathrm{id} \otimes g) \circ (f \otimes \mathrm{id})$, which lets us freely slide boxes along wires (right below). \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (-1, 1) {$f$}; \node [style=none] (1) at (-1, -2) {}; \node [style=none] (2) at (-1, 2) {}; \node [style=map] (5) at (1, -1) {$g'$}; \node [style=none] (6) at (1, -2) {}; \node [style=none] (7) at (1, 2) {}; \node [style=map] (10) at (-1, -1) {$f'$}; \node [style=map] (11) at (1, 1) {$g$}; \node [style=none] (12) at (-2, 0) {}; \node [style=none] (13) at (2, 0) {}; \node [style=none] (14) at (3.5, 0) {$=$}; \node [style=map] (15) at (5, 1) {$f$}; \node [style=none] (16) at (5, -2) {}; \node [style=none] (17) at (5, 2) {}; \node [style=map] (20) at (7, -1) {$g'$}; \node [style=none] (21) at (7, -2) {}; \node [style=none] (22) at (7, 2) {}; \node [style=map] (25) at (5, -1) {$f'$}; \node [style=map] (26) at (7, 1) {$g$}; \node [style=none] (27) at (6, 2) {}; \node [style=none] (28) at (6, -2) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \draw (7.center) to (6.center); \draw [style=dashed] (12.center) to (13.center); \draw (17.center) to (16.center); \draw (22.center) to (21.center); \draw [style=dashed] (27.center) to (28.center); \end{pgfonlayer} \end{tikzpicture} \qquad \qquad \qquad \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=map] (1) at (-5.5, 0.5) {$f$}; \node [style=none] (2) at (-5.5, -1.75) {}; \node [style=none] (3) at (-5.5, 1.75) {}; \node [style=map] (26) at (-4, -0.5) {$g$}; \node [style=none] (27) at (-4, -1.75) {}; \node [style=none] 
(28) at (-4, 1.75) {}; \node [style=none] (31) at (-2.25, 0) {$=$}; \node [style=map] (32) at (-0.5, 0) {$f$}; \node [style=none] (33) at (-0.5, -1.75) {}; \node [style=none] (34) at (-0.5, 1.75) {}; \node [style=map] (37) at (1, 0) {$g$}; \node [style=none] (38) at (1, -1.75) {}; \node [style=none] (39) at (1, 1.75) {}; \node [style=none] (42) at (2.25, 0) {$=$}; \node [style=map] (43) at (4, -0.5) {$f$}; \node [style=none] (44) at (4, -1.75) {}; \node [style=none] (45) at (4, 1.75) {}; \node [style=map] (48) at (5.5, 0.5) {$g$}; \node [style=none] (49) at (5.5, -1.75) {}; \node [style=none] (50) at (5.5, 1.75) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (3.center) to (2.center); \draw (28.center) to (27.center); \draw (34.center) to (33.center); \draw (39.center) to (38.center); \draw (45.center) to (44.center); \draw (50.center) to (49.center); \end{pgfonlayer} \end{tikzpicture} At times we omit labelling certain wires (objects) in diagrams, as in the above. Let us now introduce our primary example category in this article.

In the category $\mathbf{Mat}_{\mathbb{R}^+}$ of positive matrices, the objects are finite sets $X,Y,\dots$ and the morphisms $M \colon X \to Y$ are functions $M \colon X \times Y \to \mathbb{R}^+$ where $\mathbb{R}^+ := \{r \in \mathbb{R} \mid r \geq 0 \}$. We think of such a function as an '$X \times Y$ matrix' with entries $M(y \mid x) := M(x,y) \in \mathbb{R}^+$ for $x \in X$, $y \in Y$: $M \quad :: \ (x,y) \ \mapsto \ M(y \mid x)$. We compose morphisms $M \colon X \to Y$ and $N \colon Y \to Z$ to $N \circ M$ via the matrix product, given by summation over internal wires: $N \circ M \quad :: \ (x,z) \ \mapsto \ \sum_{y \in Y} N(z \mid y) M(y \mid x)$. The tensor $\otimes$ is given on objects by $X \otimes Y = X \times Y$, and on morphisms $M \colon X \to W$, $N \colon Y \to Z$ by the Kronecker product: $M \otimes N \quad :: \ ((x,y),(w,z)) \ \mapsto \ M(w \mid x)N(z \mid y)$. The symmetry is simply the obvious isomorphism $X \times Y \simeq Y \times X$. The unit object is the singleton set $I=\{\star\}$.
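As a small numerical illustration of the definitions above, the following Python sketch implements sequential composition and the tensor in $\mathbf{Mat}_{\mathbb{R}^+}$ on nested lists and checks functoriality of $\otimes$ on one example. The helper names `compose` and `tensor`, the `[output][input]` index convention, and the four integer-valued matrices are assumptions made for this illustration only and do not come from the paper.

```python
from itertools import product

def compose(n, m):
    """Sequential composite n after m.

    Matrices are nested lists indexed [output][input], so entry [z][x]
    of the result is sum over y of n(z|y) * m(y|x).
    """
    inner = len(m)  # size of the shared middle object Y
    return [[sum(n[z][y] * m[y][x] for y in range(inner))
             for x in range(len(m[0]))]
            for z in range(len(n))]

def tensor(m, n):
    """Parallel composite as a Kronecker product.

    Entry [(w, z)][(x, y)] of the result is m(w|x) * n(z|y).
    """
    rows = list(product(range(len(m)), range(len(n))))        # output pairs (w, z)
    cols = list(product(range(len(m[0])), range(len(n[0]))))  # input pairs (x, y)
    return [[m[w][x] * n[z][y] for (x, y) in cols] for (w, z) in rows]

# Hypothetical morphisms with integer entries, so that equality is exact.
f_prime = [[1, 2], [0, 1], [3, 0]]   # from a 2-element set to a 3-element set
f = [[1, 0, 2], [0, 1, 1]]           # from a 3-element set to a 2-element set
g_prime = [[1, 1], [2, 0]]           # from a 2-element set to a 2-element set
g = [[0, 1], [1, 3], [2, 2]]         # from a 2-element set to a 3-element set

# Functoriality: (f (x) g) o (f' (x) g') == (f o f') (x) (g o g')
lhs = compose(tensor(f, g), tensor(f_prime, g_prime))
rhs = tensor(compose(f, f_prime), compose(g, g_prime))
print(lhs == rhs)  # prints True
```

Storing $M(y \mid x)$ at position `[y][x]` makes `compose` coincide with the ordinary matrix product, which is why the check reduces to the mixed-product property of the Kronecker product.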
A state $\omega$ of $X$ is then equivalent to a positive function on $X$, $\omega \quad :: \ x \ \mapsto \ \omega(x)$, where $\omega(x) := \omega(x \mid \star)$. Special cases, discussed later, are probability distributions over $X$. In just the same way, an effect $e$ on $X$ is also equivalent to a positive function on $X$, $e \quad :: \ x \ \mapsto \ e(x)$, via $e(x) := e(\star \mid x)$. A scalar $r \colon I \to I$ is precisely a positive real $r \in \mathbb{R}^+$, and composing scalars amounts to multiplication $r \otimes s = r \circ s = r \cdot s$ in $\mathbb{R}^+$.

Categories such as $\mathbf{Mat}_{\mathbb{R}^+}$ come with further structure, which allows one to describe many aspects of probability theory entirely diagrammatically~\cite{ChoEtAl_2019_DisintegrationViaStringDiagrams,fritz2020synthetic}.

Following Ref.~\cite{ChoEtAl_2019_DisintegrationViaStringDiagrams}, a \emph{cd-category} (copy-discard category) is a symmetric monoidal category in which each object $X$ comes with a specified pair of morphisms $X \to X \otimes X$ and $X \to I$, called \emph{copying} and \emph{discarding}, respectively, which satisfy coassociativity, counitality and cocommutativity axioms. Formally, these say that copying and discarding form a commutative comonoid. The choice of these morphisms is moreover 'natural' in that, for all objects $X, Y$, copying on $X \otimes Y$ amounts to copying $X$ and $Y$ separately and then swapping the two middle wires, and discarding on $X \otimes Y$ amounts to discarding $X$ and $Y$ separately. Thanks to the axioms for copying, we can unambiguously define a copying morphism with $n$ output legs, for any $n \geq 1$, as follows.
\begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=whitedot] (0) at (20, 0) {}; \node [style=none] (1) at (18.75, 1.25) {}; \node [style=none] (2) at (21.25, 1.25) {}; \node [style=none] (3) at (20, -1) {}; \node [style=none] (4) at (20, 1) {$\dots$}; \node [style=none] (5) at (22.25, 0) {$:=$}; \node [style=whitedot] (6) at (24.25, -0.75) {}; \node [style=none] (7) at (23.5, 0) {}; \node [style=none] (8) at (25, 0) {}; \node [style=none] (9) at (24.25, -1.75) {}; \node [style=whitedot] (10) at (25, 0) {}; \node [style=none] (11) at (24.25, 0.5) {}; \node [style=none] (12) at (25.75, 0.5) {}; \node [style=none] (13) at (25.5, 1.25) {$\dots$}; \node [style=none] (14) at (26.25, 1.75) {}; \node [style=none] (15) at (26.25, 2.25) {}; \node [style=whitedot] (16) at (26.25, 2.25) {}; \node [style=none] (17) at (25.5, 3) {}; \node [style=none] (18) at (27, 3) {}; \node [style=whitedot] (19) at (31.5, -0.75) {}; \node [style=none] (20) at (32.25, 0) {}; \node [style=none] (21) at (30.75, 0) {}; \node [style=none] (22) at (31.5, -1.75) {}; \node [style=whitedot] (23) at (30.75, 0) {}; \node [style=none] (24) at (31.5, 0.5) {}; \node [style=none] (25) at (30, 0.5) {}; \node [style=none] (26) at (30, 1.25) {$\dots$}; \node [style=none] (27) at (29.5, 1.75) {}; \node [style=none] (28) at (29.5, 2.25) {}; \node [style=whitedot] (29) at (29.5, 2.25) {}; \node [style=none] (30) at (30.25, 3) {}; \node [style=none] (31) at (28.75, 3) {}; \node [style=none] (32) at (28, 0) {$=$}; \node [style=none] (33) at (23.5, 0.75) {}; \node [style=none] (34) at (24.25, 1.25) {}; \node [style=none] (35) at (31.5, 1.25) {}; \node [style=none] (36) at (32.25, 0.5) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (3.center) to (0); \draw [bend left=45] (0) to (1.center); \draw [bend right=45] (0) to (2.center); \draw (9.center) to (6); \draw [bend left=45] (6) to (7.center); \draw [bend right=45] (6) to (8.center); \draw [bend left=45] (10) to (11.center); \draw [bend 
right=45] (10) to (12.center); \draw [bend left=45] (16) to (17.center); \draw [bend right=45] (16) to (18.center); \draw (14.center) to (16); \draw (10) to (8.center); \draw (22.center) to (19); \draw [bend right=45] (19) to (20.center); \draw [bend left=45] (19) to (21.center); \draw [bend right=45] (23) to (24.center); \draw [bend left=45] (23) to (25.center); \draw [bend right=45] (29) to (30.center); \draw [bend left=45] (29) to (31.center); \draw (27.center) to (29); \draw (23) to (21.center); \draw (11.center) to (34.center); \draw (7.center) to (33.center); \draw (36.center) to (20.center); \draw (24.center) to (35.center); \end{pgfonlayer} \end{tikzpicture} Formally we also define the copying morphism with $n=0$ output legs to be the discarding effect $\discard{}$. In a cd-category the processes of a truly `probabilistic' or `stochastic' nature are those satisfying the following. \begin{definition} \label{Def_Channel} A morphism $f$ is a \emph{channel} when it preserves discarding: \[ \tikzfig{causal2} \] In particular, we call a state $\omega$ \emph{normalised} when the following holds. \[ \tikzfig{state-norm} \] A cd-category in which every morphism is a channel, or equivalently $\discard{}$ is the unique effect on any object, is called a \emph{Markov category} \cite{fritz2020synthetic}. Given any cd-category $\catC$, its subcategory $\catC_\channel$ of channels always forms a Markov category. \end{definition} A useful rule, which follows from \eqref{eq:nat-rules}, is that any channel $f$ with multiple inputs satisfies: \[ \tikzfig{mech-disc} \] The presence of discarding allows one to `ignore' certain outputs of morphisms. Given any morphism $f \colon X \to Y \otimes Z$, its \emph{marginal} $X \to Y$ is the following morphism.
\[ \tikzfig{marginal} \] \begin{example} $\MatR$ is a cd-category. The copy map on $X$ is given by $\tinymultflip[whitedot](y,z \mid x) = \delta_{x,y,z}$, with value $1$ iff $x=y=z$ and $0$ otherwise. Discarding $\discard{}$ on $X$ is given by the function with $x \mapsto 1$ for all $x \in X$. A state $\omega$ is normalised precisely when it forms a probability distribution over $X$. \[ \tikzfig{normal-state2} \ \ \text{normalised} \quad \iff \quad \sum_{x \in X} \omega(x) = 1 \] More generally, $M \colon X \to Y$ is a channel precisely when it forms a \emph{probability channel}, or equivalently the matrix is \emph{stochastic}, meaning that it sends each $x \in X$ to a normalised distribution. \[ \tikzfig{mat-mor} \ \ \text{a channel} \iff \sum_{y \in Y} M(y \mid x) = 1 \ \ \forall x \in X. \] Such a probability channel from $X$ to $Y$ is often also referred to as a \emph{conditional probability distribution} and called $P$ or $P(Y \mid X)$, with values denoted $P(y \mid x) := P(Y=y \mid X =x)$ for $x \in X$, $y \in Y$. The subcategory of channels in $\MatR$ is the Markov category $\FStoch$ of finite stochastic matrices, consisting of probability channels between finite sets. Often we will restrict attention to this subcategory. Diagrams can now help us express a few basic features of probability theory. For example, given probability distributions $\omega, \sigma$ over $X, Y$, their tensor corresponds to the resulting independent distribution over $X \times Y$. \[ \tikzfig{prod-state} \] An arbitrary normalised state on $X \otimes Y$ corresponds to a joint distribution over $X, Y$ (below left). Similarly, a channel as shown below right represents a probability channel $P(Y_1,\dots,Y_m \mid X_1,\dots,X_n)$. \[ \tikzfig{joint-state} \qquad \qquad \qquad \tikzfig{MXY} \] Given any state over $X$ and $Y$ in $\MatR$, the marginal on $X$ corresponds to taking the marginal in the usual probabilistic sense, by summation over $Y$.
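As a concrete instance (the numbers are hypothetical, chosen only for illustration), take $X = Y = \{0,1\}$ and the normalised state $\omega$ on $X \otimes Y$ with
\[ \omega(0,0) = 0.1, \quad \omega(0,1) = 0.3, \quad \omega(1,0) = 0.2, \quad \omega(1,1) = 0.4 . \]
Discarding $Y$ then yields the state on $X$ with values
\[ \omega(0,0) + \omega(0,1) = 0.4, \qquad \omega(1,0) + \omega(1,1) = 0.6 , \]
which again sums to $1$.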
In general, marginalisation of any morphism is given by summation over the discarded object. \[ \tikzfig{marginal2} :: x \mapsto \sum_{y \in Y} \omega(x,y) \qquad \qquad \qquad \tikzfig{margM} :: (x,y) \mapsto \sum_{z \in Z} M(y,z \mid x) \] Finally, observe that for any effect $e \colon X \to \mathbb{R}^+$ and normalised state $\omega$ the scalar $e \circ \omega$ corresponds to the expectation value of $e$ according to the probability distribution $\omega$. \[ \tikzfig{exp-value} \ \ = \ \ \sum_{x \in X} e(x) \omega(x) \] \end{example} When defining causal models we are largely concerned with channels and so will work in the Markov category $\FStoch$ of finite probabilistic processes, the subcategory of channels in $\MatR$. However, for causal reasoning (such as in the calculations in Sections \ref{Sec_CE_Identifiability} and \ref{Sec_Counterfactuals}) it can be helpful to work in a broader category, in particular one containing non-trivial effects, such as $\MatR$. \subsection{Deterministic processes and caps} Amongst general processes of a `probabilistic' nature it can be helpful to identify those processes which behave `deterministically', like (partial) functions. In fact these are precisely those that respect copying, as follows. \begin{definition} \label{def:deterministic} A morphism $f$ is called \emph{deterministic} when the following holds. \[ \tikzfig{deterministic} \] \end{definition} An important special case in practice is that of deterministic states, which here we also call \emph{sharp}. We depict a sharp state $x$ of $X$ as: \[ \tikzfig{sharp-state} \] By definition these are the states which are directly copied by the copy morphisms: \begin{equation} \label{eq:copy-points} \tikzfig{copy-points} \end{equation} A feature we will find useful in causal reasoning later is a correspondence between deterministic states and effects. For this we require an extra property of our cd-category.
\begin{definition} We say that a cd-category $\catC$ has \emph{caps} when each object $X$ comes with a distinguished effect on $X \otimes X$ depicted as $\tinycap$ and satisfying the following: \begin{equation}\label{eq:mult-map} \tikzfig{cap-sym} \qquad \qquad \ \ \ \tikzfig{cap-2} \ \ \ \qquad \qquad \tikzfig{mult-map2} \end{equation} The choice of caps is moreover `natural' in that the following holds for all objects $X, Y$. \[ \tikzfig{capXYrule} \] \end{definition} It follows that each cap is deterministic.\footnote{One may verify that the right-hand morphism in \eqref{eq:mult-map} is commutative and associative, forming a commutative \emph{semigroup} multiplication $\tinymult[whitedot]$ in $\catC$.} Now in any cd-category with caps, for any sharp state $x$ there is a corresponding deterministic effect denoted $x^\dagger$ and depicted as `flipping $x$ upside-down': \[ \tikzfig{cap-3} \] We call an effect \emph{sharp} when it is of the form $x^\dagger$. One may verify that for any sharp state $x$, the effect $x^\dagger$ is the unique effect satisfying the following. \begin{equation} \label{eq:sharp-state-eff} \tikzfig{sharp-1} \qquad \qquad \tikzfig{sharp-2} \end{equation} Caps are particularly useful in diagrammatic reasoning when they are \emph{cancellative}, meaning that: \[ \tikzfig{capcancel} \label{eq:cancel-caps} \] for all morphisms $f, g$. \begin{example} $\MatR$ has cancellative caps. Each point $x \in X$ corresponds to a normalised sharp state on $X$ which we again denote by $x$, given by the point probability distribution $\delta_x$ on $X$. The sharp effect $x^\dagger$ is then given by the same function $\delta_x$ on $X$. \[ \tikzfig{sharp-state-x} \ \ , \ \ \tikzfig{sharp-effect-x} \quad :: \ y \mapsto \begin{cases} 1 & x=y \\ 0 & \text{otherwise} \end{cases} \] The cap is given by $\tinycap(x,y) = \delta_{x,y}$. Hence, for all $x, y \in X$ the following holds.
\[ \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=none] (0) at (-2, -0.25) {}; \node [style=none] (1) at (-0.25, -0.25) {}; \node [style=sharpstate] (2) at (-2, -0.5) {$x$}; \node [style=sharpstate] (3) at (-0.25, -0.5) {$y$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw [bend left=90, looseness=1.75] (0.center) to (1.center); \end{pgfonlayer} \end{tikzpicture} \quad = \quad \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=none] (0) at (4.75, -0.75) {}; \node [style=sharpeffect] (1) at (4.75, 1) {$y$}; \node [style=sharpstate] (2) at (4.75, -0.75) {$x$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1) to (0.center); \end{pgfonlayer} \end{tikzpicture} \ \ = \ \ \begin{cases} 1 & x = y \\ 0 & \text{otherwise} \end{cases} \] Every sharp state on $X$ is of the above form for some $x \in X$, or else given by the \emph{zero state} $0$ defined by $0(x) = 0$ for all $x \in X$. Then $0^\dagger$ is given by the constant zero function on $X$ also. In particular the only sharp scalars are $0$ and $1$. We note the useful observation that for any morphism $M \colon X \to Y$ its scalar values $M(y \mid x) \in \mathbb{R}^+$ can be given diagrammatically by composing with $x$ and $y^\dagger$ as below.
\begin{equation*} \ M(y \mid x) \ \ \ = \ \ \ \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=sharpeffect] (0) at (0.5, 1.5) {$y$}; \node [style=none] (1) at (0.5, 1.5) {}; \node [style=sharpstate] (2) at (0.5, -1.5) {$x$}; \node [style=map] (3) at (0.5, 0) {$M$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2) to (1.center); \end{pgfonlayer} \end{tikzpicture} \end{equation*} Since $\tinymultflip[whitedot](y,z \mid x) = \delta_{x,y,z}$ it is indeed the case that $\tinymultflip[whitedot]$ copies each state given by $x \in X$ as in \eqref{eq:copy-points}. In contrast, a general state $\omega$ is not copyable.
\begin{equation*} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=whitedot] (0) at (0, 0) {}; \node [style=none] (1) at (-1, 1) {}; \node [style=none] (2) at (1, 1) {}; \node [style=none] (3) at (0, -1.25) {}; \node [style=label] (4) at (-1, 1.5) {$X$}; \node [style=label] (5) at (1, 1.5) {$X$}; \node [style=map] (6) at (0, -1.5) {$\omega$}; \node [style=none] (7) at (2.5, 0) {$\neq$}; \node [style=map] (8) at (4.25, -0.75) {$\omega$}; \node [style=map] (9) at (5.75, -0.75) {$\omega$}; \node [style=none] (10) at (4.25, 0.5) {}; \node [style=none] (11) at (5.75, 0.5) {}; \node [style=label] (12) at (4.25, 1) {$X$}; \node [style=label] (13) at (5.75, 1) {$X$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw [bend right=45] (1.center) to (0); \draw [bend right=45] (0) to (2.center); \draw (3.center) to (0); \draw (11.center) to (9); \draw (10.center) to (8); \end{pgfonlayer} \end{tikzpicture} \end{equation*} The left-hand side above is given by $(x,y) \mapsto \omega(x) \delta_{x,y}$, while the right is $(x,y) \mapsto \omega(x) \omega(y)$, which differ unless $\omega$ is zero or $\omega = x$ for some $x \in X$. The left-hand side defines a distribution in which the two copies of $X$ are perfectly correlated, while the right-hand side is a product distribution in which the two copies of $X$ are independent of each other. Deterministic morphisms in $\mathbf{Mat}_{\mathbb{R}^+}$ correspond to (partial) functions. Any partial function $f \colon X \to Y$ defines a morphism $X \to Y$ which we again denote by $f$, given on sharp states $x \in X$ by $f \circ x = f(x)$ whenever $f(x)$ is defined, and $f \circ x = 0$ otherwise.
Equivalently, \begin{equation*} \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (0, 0) {$f$}; \node [style=none] (1) at (0, 1.5) {}; \node [style=none] (2) at (0, -1.5) {}; \node [style=label] (3) at (-0.5, 1) {\small $Y$}; \node [style=sharpstate] (4) at (0, -1.75) {$x$}; \node [style=sharpeffect] (5) at (0, 1.75) {$y$}; \node [style=label] (6) at (-0.5, -1) {\small $X$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \end{pgfonlayer} \end{tikzpicture} \ = \ \begin{cases} 1 & f(x) \text{ defined and } y = f(x) \\ 0 & \text{otherwise} \end{cases} \end{equation*} \end{example} The framework of cd-categories is much more general than simply the finite discrete setting of $\mathbf{Mat}_{\mathbb{R}^+}$. In Appendix \ref{sec:measure-theory} we detail a cd-category for the treatment of full measure-theoretic probability, though this is not required to read the remainder of the paper. The remainder of this section introduces further properties of a cd-category useful for causal reasoning, which, while not essential for defining causal models as such, will be very useful in Sections \ref{Sec_CE_Identifiability} and \ref{Sec_Counterfactuals}. \subsection{Normalisation} In causal reasoning one often needs to normalise states and processes.
Intuitively, for a general state $\omega$, its normalisation should be another state $\mathop{\mathrm{norm}}\nolimits(\omega)$, depicted with a dashed box around $\omega$, such that the following holds: \begin{equation} \label{eq:state-norm} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (1.5, 0) {$\omega$}; \node [style=none] (1) at (1.5, 1.5) {}; \node [style=none] (2) at (0.5, 1) {}; \node [style=none] (3) at (2.5, 1) {}; \node [style=none] (4) at (2.5, -1) {}; \node [style=none] (5) at (0.5, -1) {}; \node [style=none] (6) at (0.5, 1) {}; \node [style=map] (7) at (-4.75, 0) {$\omega$}; \node [style=none] (8) at (-4.75, 1.5) {}; \node [style=none] (14) at (-2.75, 0) {$=$}; \node [style=map] (15) at (-1, 0) {$\omega$}; \node [style=upground] (16) at (-1, 1.5) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (0); \draw [norm] (6.center) to (5.center); \draw [norm] (5.center) to (4.center); \draw [norm] (4.center) to (3.center); \draw [norm] (3.center) to (6.center); \draw (8.center) to (7); \draw (16) to (15); \end{pgfonlayer} \end{tikzpicture} \end{equation} In $\mathbf{Mat}_{\mathbb{R}^+}$ this corresponds to writing a state as a product of a non-negative constant and a `correctly normalised' probability distribution. More generally, for any process $f$ its normalisation $\mathop{\mathrm{norm}}\nolimits(f)$ is a process depicted: \begin{equation} \label{eq:norm-box} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=map] (4) at (2.25, 0) {$f$}; \node [style=none] (5) at (2.25, 1.5) {}; \node [style=none] (6) at (2.25, -1.5) {}; \node [style=none] (7) at (1.25, 1) {}; \node [style=none] (8) at (3.25, 1) {}; \node [style=none] (9) at (3.25, -1) {}; \node [style=none] (10) at (1.25, -1) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (5.center) to (6.center); \draw [norm, in=270, out=90] (10.center) to (7.center); \draw [norm] (7.center) to (8.center); \draw [norm] (8.center) to (9.center); \draw [norm] (9.center) to
(10.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} which intuitively is given on each sharp state input $x$ by normalising $f \circ x$. \begin{equation} \label{eq:norm-on-states} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (-3, 0) {$f$}; \node [style=none] (1) at (-3, 1.5) {}; \node [style=sharpstate] (2) at (-3, -2) {$x$}; \node [style=none] (3) at (-2, 1) {}; \node [style=none] (4) at (-4, 1) {}; \node [style=none] (5) at (-4, -1) {}; \node [style=none] (6) at (-2, -1) {}; \node [style=map] (7) at (2.25, 0) {$f$}; \node [style=none] (8) at (2.25, 1.5) {}; \node [style=sharpstate] (9) at (2.25, -1.25) {$x$}; \node [style=none] (10) at (3.25, 1) {}; \node [style=none] (11) at (1.25, 1) {}; \node [style=none] (12) at (1.25, -2) {}; \node [style=none] (13) at (3.25, -2) {}; \node [style=none] (14) at (-0.25, 0) {$=$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (2); \draw [norm, in=-90, out=90] (6.center) to (3.center); \draw [norm] (3.center) to (4.center); \draw [norm] (4.center) to (5.center); \draw [norm] (5.center) to (6.center); \draw (8.center) to (9); \draw [norm, in=-90, out=90] (13.center) to (10.center); \draw [norm] (10.center) to (11.center); \draw [norm] (11.center) to (12.center); \draw [norm] (12.center) to (13.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} Another way to express the latter without reference to states is the following.
\begin{equation} \label{eq:norm-sup-cond} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=whitedot] (0) at (4.75, -1) {}; \node [style=map] (1) at (0.5, 0) {$f$}; \node [style=none] (2) at (0.5, 1) {}; \node [style=none] (3) at (0.5, -1) {}; \node [style=none] (4) at (2, 0) {$=$}; \node [style=none] (5) at (3.75, 0) {}; \node [style=none] (6) at (5.75, 0) {}; \node [style=map] (7) at (3.75, 0.25) {$f$}; \node [style=upground] (8) at (3.75, 1.5) {}; \node [style=none] (9) at (4.75, -1.75) {}; \node [style=none] (10) at (5.75, 1.5) {}; \node [style=map] (11) at (5.75, 0.25) {$f$}; \node [style=none] (12) at (5, 1) {}; \node [style=none] (13) at (6.5, 1) {}; \node [style=none] (14) at (6.5, -0.5) {}; \node [style=none] (15) at (5, -0.5) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (3.center) to (2.center); \draw [bend right=45] (5.center) to (0); \draw [bend right=45] (0) to (6.center); \draw (8) to (7); \draw (9.center) to (0); \draw (10.center) to (6.center); \draw [norm] (15.center) to (14.center); \draw [norm] (14.center) to (13.center); \draw [norm] (13.center) to (12.center); \draw [norm] (12.center) to (15.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} Plugging a sharp state $x$ into \eqref{eq:norm-sup-cond} shows that \eqref{eq:state-norm} holds for $\omega = f \circ x$, with $\mathop{\mathrm{norm}}\nolimits(\omega) = \mathop{\mathrm{norm}}\nolimits(f) \circ x$. When $f = \omega$ is already a state, \eqref{eq:norm-sup-cond} reduces to \eqref{eq:state-norm}. One can often think of a normalised process as a channel, so that for a state $\omega$ its normalisation $\mathop{\mathrm{norm}}\nolimits(\omega)$ is normalised in our earlier sense. But this is not always possible. For example, in $\mathbf{Mat}_{\mathbb{R}^+}$, if $\omega$ is the zero state then $\mathop{\mathrm{norm}}\nolimits(\omega) = 0$. More generally, if $f(x) = 0$ for some input $x$, then $\mathop{\mathrm{norm}}\nolimits(f)(x) = 0$ also.
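As a small illustration in $\mathbf{Mat}_{\mathbb{R}^+}$ (the matrix $M$ is a hypothetical choice, and we assume, as the intuition above suggests, that normalisation rescales each non-zero column to sum to one), take $X = Y = \{0,1\}$ and $M \colon X \to Y$ with
\[ M(0 \mid 0) = 2, \quad M(1 \mid 0) = 6, \quad M(0 \mid 1) = 0, \quad M(1 \mid 1) = 0 . \]
Then
\[ \mathop{\mathrm{norm}}\nolimits(M)(0 \mid 0) = \tfrac{2}{8} = \tfrac{1}{4}, \quad \mathop{\mathrm{norm}}\nolimits(M)(1 \mid 0) = \tfrac{6}{8} = \tfrac{3}{4}, \quad \mathop{\mathrm{norm}}\nolimits(M)(0 \mid 1) = \mathop{\mathrm{norm}}\nolimits(M)(1 \mid 1) = 0 , \]
so discarding the output of $\mathop{\mathrm{norm}}\nolimits(M)$ gives $1$ on the input $0$ but $0$ on the input $1$, and $\mathop{\mathrm{norm}}\nolimits(M)$ is not a channel. The state-free equation above can still be checked entrywise: $M(y \mid x) = \big(\sum_{y' \in Y} M(y' \mid x)\big) \cdot \mathop{\mathrm{norm}}\nolimits(M)(y \mid x)$, for instance $2 = 8 \cdot \tfrac{1}{4}$ and $0 = 0 \cdot 0$.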
Instead, a normalisation $g = \mathop{\mathrm{norm}}\nolimits(f)$ will in general only be a \emph{partial channel}, meaning that the following holds. \begin{equation*} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=whitedot] (0) at (4.75, -1) {}; \node [style=map] (1) at (0.5, 0) {$g$}; \node [style=none] (2) at (0.5, 1) {}; \node [style=none] (3) at (0.5, -1) {}; \node [style=none] (4) at (2, 0) {$=$}; \node [style=none] (5) at (3.75, 0) {}; \node [style=none] (6) at (5.75, 0) {}; \node [style=map] (7) at (3.75, 0.25) {$g$}; \node [style=upground] (8) at (3.75, 1.5) {}; \node [style=none] (9) at (4.75, -1.75) {}; \node [style=none] (10) at (5.75, 1.5) {}; \node [style=map] (11) at (5.75, 0.25) {$g$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (3.center) to (2.center); \draw [bend right=45] (5.center) to (0); \draw [bend right=45] (0) to (6.center); \draw (8) to (7); \draw (9.center) to (0); \draw (10.center) to (6.center); \end{pgfonlayer} \end{tikzpicture} \end{equation*} We can now define normalisation formally, while stating some of its essential properties. \begin{definition} Let $\mathbf{C}$ be a cd-category. We say $\mathbf{C}$ has \emph{normalisation} when for every morphism $f$ there is a distinguished partial channel $\mathop{\mathrm{norm}}\nolimits(f)$, depicted with a dashed box as in \eqref{eq:norm-box}, such that \eqref{eq:norm-sup-cond} holds and: \begin{enumerate} \item Whenever $f$ is a partial channel we have $\mathop{\mathrm{norm}}\nolimits(f) = f$.
\item For all morphisms $f, g$: \begin{equation} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (1.25, 0) {$f$}; \node [style=none] (1) at (1.25, 1.5) {}; \node [style=none] (2) at (1.25, -1.5) {}; \node [style=none] (3) at (3.75, 1) {}; \node [style=none] (4) at (0.25, 1) {}; \node [style=none] (5) at (0.25, -1) {}; \node [style=none] (6) at (3.75, -1) {}; \node [style=map] (7) at (2.75, 0) {$g$}; \node [style=none] (8) at (2.75, 1.5) {}; \node [style=none] (9) at (2.75, -1.5) {}; \node [style=none] (10) at (5.25, 0) {$=$}; \node [style=map] (11) at (7.5, 0) {$f$}; \node [style=none] (12) at (7.5, 1.5) {}; \node [style=none] (13) at (7.5, -1.5) {}; \node [style=none] (14) at (8.5, 1) {}; \node [style=none] (15) at (6.5, 1) {}; \node [style=none] (16) at (6.5, -1) {}; \node [style=none] (17) at (8.5, -1) {}; \node [style=map] (18) at (10.25, 0) {$g$}; \node [style=none] (19) at (10.25, 1.5) {}; \node [style=none] (20) at (10.25, -1.5) {}; \node [style=none] (21) at (11.25, 1) {}; \node [style=none] (22) at (9.25, 1) {}; \node [style=none] (23) at (9.25, -1) {}; \node [style=none] (24) at (11.25, -1) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (2.center); \draw [norm, in=-90, out=90] (6.center) to (3.center); \draw [norm] (3.center) to (4.center); \draw [norm] (4.center) to (5.center); \draw [norm] (5.center) to (6.center); \draw (8.center) to (9.center); \draw (12.center) to (13.center); \draw [norm, in=-90, out=90] (17.center) to (14.center); \draw [norm] (14.center) to (15.center); \draw [norm] (15.center) to (16.center); \draw [norm] (16.center) to (17.center); \draw (19.center) to (20.center); \draw [norm, in=-90, out=90] (24.center) to (21.center); \draw [norm] (21.center) to (22.center); \draw [norm] (22.center) to (23.center); \draw [norm] (23.center) to (24.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} \item For all morphisms $f$: \begin{equation} \begin{tikzpicture} 
\begin{pgfonlayer}{nodelayer} \node [style=none] (10) at (5.75, 0) {$=$}; \node [style=map] (11) at (8.25, 0) {$f$}; \node [style=upground] (12) at (8.25, 1.5) {}; \node [style=none] (13) at (8.25, -1.5) {}; \node [style=none] (14) at (9.25, 1) {}; \node [style=none] (15) at (7.25, 1) {}; \node [style=none] (16) at (7.25, -1) {}; \node [style=none] (17) at (9.25, -1) {}; \node [style=map] (18) at (3.25, 0) {$f$}; \node [style=upground] (19) at (3.25, 1.5) {}; \node [style=none] (20) at (3.25, -1.5) {}; \node [style=none] (21) at (4.25, 2.25) {}; \node [style=none] (22) at (2.25, 2.25) {}; \node [style=none] (23) at (2.25, -1) {}; \node [style=none] (24) at (4.25, -1) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (12) to (13.center); \draw [norm, in=-90, out=90] (17.center) to (14.center); \draw [norm] (14.center) to (15.center); \draw [norm] (15.center) to (16.center); \draw [norm] (16.center) to (17.center); \draw (19) to (20.center); \draw [norm, in=-90, out=90] (24.center) to (21.center); \draw [norm] (21.center) to (22.center); \draw [norm] (22.center) to (23.center); \draw [norm] (23.center) to (24.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} \item For all morphisms $f$: \begin{equation} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (5) at (-1.5, 1.25) {}; \node [style=none] (6) at (1.5, 1.25) {}; \node [style=none] (7) at (1.5, -2.25) {}; \node [style=none] (8) at (-1.5, -2.25) {}; \node [style=whitedot] (9) at (0, -1.75) {}; \node [style=none] (10) at (-1, -0.75) {}; \node [style=none] (11) at (1, -0.75) {}; \node [style=medium map] (12) at (0, 0.25) {$f$}; \node [style=none] (13) at (0, 0.75) {}; \node [style=none] (14) at (0, 1.75) {}; \node [style=none] (15) at (0, -2.75) {}; \node [style=none] (16) at (3, 0) {$=$}; \node [style=none] (17) at (4.75, 1.25) {}; \node [style=none] (18) at (7.75, 1.25) {}; \node [style=none] (19) at (7.75, -0.75) {}; \node [style=none] (20) at (4.75, -0.75) {}; \node 
[style=whitedot] (21) at (6.25, -1.75) {}; \node [style=none] (22) at (5.25, -0.75) {}; \node [style=none] (23) at (7.25, -0.75) {}; \node [style=medium map] (24) at (6.25, 0.25) {$f$}; \node [style=none] (25) at (6.25, 0.75) {}; \node [style=none] (26) at (6.25, 1.75) {}; \node [style=none] (27) at (6.25, -2.75) {}; \node [style=none] (28) at (5.25, 0) {}; \node [style=none] (29) at (7.25, 0) {}; \node [style=none] (30) at (-1, 0) {}; \node [style=none] (31) at (1, 0) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw [norm, in=270, out=90] (8.center) to (5.center); \draw [norm] (5.center) to (6.center); \draw [norm] (6.center) to (7.center); \draw [norm] (7.center) to (8.center); \draw [bend right=45] (10.center) to (9); \draw [bend left=45] (11.center) to (9); \draw (14.center) to (13.center); \draw (15.center) to (9); \draw [norm, in=270, out=90] (20.center) to (17.center); \draw [norm] (17.center) to (18.center); \draw [norm] (18.center) to (19.center); \draw [norm] (19.center) to (20.center); \draw [bend right=45] (22.center) to (21); \draw [bend left=45] (23.center) to (21); \draw (26.center) to (25.center); \draw (27.center) to (21); \draw (29.center) to (23.center); \draw (28.center) to (22.center); \draw (30.center) to (10.center); \draw (11.center) to (31.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} \end{enumerate} \end{definition} In Appendix \ref{sec:normalisation-appendix} we in fact prove that a normalisation structure on a cd-category $\mathbf{C}$ is unique whenever it exists, being totally fixed by the above axioms. We also \textcolor{black}{establish in App.~}\ref{sec:normalisation-appendix} further conditions, which ensure that a cd-category has normalisation, and the following further properties of normalisation. \begin{lemma} Let $\mathbf{C}$ be a cd-category with normalisation. 
\begin{enumerate} \item Whenever $f$ is a channel, for all morphisms $g$ we have: \begin{equation*} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (4) at (0.25, 0) {$=$}; \node [style=map] (6) at (-2, -0.75) {$g$}; \node [style=map] (7) at (-2, 1) {$f$}; \node [style=none] (8) at (-2, -2.5) {}; \node [style=none] (9) at (-2, 2.5) {}; \node [style=none] (13) at (-3, 2) {}; \node [style=none] (14) at (-1, 2) {}; \node [style=none] (15) at (-1, -1.5) {}; \node [style=none] (16) at (-3, -1.5) {}; \node [style=map] (24) at (2.75, -0.75) {$g$}; \node [style=map] (25) at (2.75, 1) {$f$}; \node [style=none] (26) at (2.75, -2.5) {}; \node [style=none] (27) at (2.75, 2.5) {}; \node [style=none] (28) at (1.75, 0.25) {}; \node [style=none] (29) at (3.75, 0.25) {}; \node [style=none] (30) at (3.75, -1.75) {}; \node [style=none] (31) at (1.75, -1.75) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (9.center) to (8.center); \draw [norm, in=270, out=90] (16.center) to (13.center); \draw [norm] (13.center) to (14.center); \draw [norm] (14.center) to (15.center); \draw [norm] (15.center) to (16.center); \draw (27.center) to (26.center); \draw [norm, in=270, out=90] (31.center) to (28.center); \draw [norm] (28.center) to (29.center); \draw [norm] (29.center) to (30.center); \draw [norm] (30.center) to (31.center); \end{pgfonlayer} \end{tikzpicture} \end{equation*} \item For all morphisms $f$ we have: \begin{equation*} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (5) at (-1.75, 1.25) {}; \node [style=none] (6) at (4, 1.25) {}; \node [style=none] (7) at (4, -2) {}; \node [style=none] (8) at (-1.75, -2) {}; \node [style=whitedot] (9) at (2, -1.5) {}; \node [style=none] (10) at (1, -0.5) {}; \node [style=none] (11) at (3, -0.5) {}; \node [style=medium map] (12) at (0, 0.25) {$f$}; \node [style=none] (13) at (0, 0.75) {}; \node [style=none] (14) at (0, 2) {}; \node [style=none] (15) at (2, -2.75) {}; \node [style=none] (30) at (1, 0) 
{}; \node [style=none] (31) at (3, 2) {}; \node [style=none] (32) at (-1, 0) {}; \node [style=none] (33) at (-1, -2.75) {}; \node [style=none] (34) at (5.5, 0) {$=$}; \node [style=none] (35) at (6.5, 1.25) {}; \node [style=none] (36) at (10, 1.25) {}; \node [style=none] (37) at (10, -0.75) {}; \node [style=none] (38) at (6.5, -0.75) {}; \node [style=whitedot] (39) at (10.25, -1.5) {}; \node [style=none] (40) at (9.25, -0.5) {}; \node [style=none] (41) at (11.25, -0.5) {}; \node [style=medium map] (42) at (8.25, 0.25) {$f$}; \node [style=none] (43) at (8.25, 0.75) {}; \node [style=none] (44) at (8.25, 2) {}; \node [style=none] (45) at (10.25, -2.75) {}; \node [style=none] (46) at (9.25, 0) {}; \node [style=none] (47) at (11.25, 2) {}; \node [style=none] (48) at (7.25, 0) {}; \node [style=none] (49) at (7.25, -2.75) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw [norm, in=270, out=90] (8.center) to (5.center); \draw [norm] (5.center) to (6.center); \draw [norm] (6.center) to (7.center); \draw [norm] (7.center) to (8.center); \draw [bend right=45] (10.center) to (9); \draw [bend left=45] (11.center) to (9); \draw (14.center) to (13.center); \draw (15.center) to (9); \draw (30.center) to (10.center); \draw (11.center) to (31.center); \draw (33.center) to (32.center); \draw [norm, in=270, out=90] (38.center) to (35.center); \draw [norm] (35.center) to (36.center); \draw [norm] (36.center) to (37.center); \draw [norm] (37.center) to (38.center); \draw [bend right=45] (40.center) to (39); \draw [bend left=45] (41.center) to (39); \draw (44.center) to (43.center); \draw (45.center) to (39); \draw (46.center) to (40.center); \draw (41.center) to (47.center); \draw (49.center) to (48.center); \end{pgfonlayer} \end{tikzpicture} \end{equation*} \item If $\mathbf{C}$ has caps then \ref{eq:norm-on-states} holds and more generally for every morphism $f$ and sharp state $x$ we have: \begin{equation} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=medium 
map] (0) at (-2.75, 0) {$f$}; \node [style=none] (1) at (-2, 0) {}; \node [style=sharpstate] (2) at (-2, -1.5) {$x$}; \node [style=none] (3) at (-4.25, 1) {}; \node [style=none] (4) at (-1.25, 1) {}; \node [style=none] (5) at (-1.25, -0.75) {}; \node [style=none] (6) at (-4.25, -0.75) {}; \node [style=none] (14) at (0, 0) {$=$}; \node [style=none] (15) at (-3.5, 0) {}; \node [style=none] (16) at (-3.5, -2.75) {}; \node [style=none] (17) at (-2.75, 0.5) {}; \node [style=none] (18) at (-2.75, 2) {}; \node [style=medium map] (19) at (2.75, 0) {$f$}; \node [style=none] (20) at (3.5, 0) {}; \node [style=sharpstate] (21) at (3.5, -1.25) {$x$}; \node [style=none] (22) at (1.25, 1) {}; \node [style=none] (23) at (4.25, 1) {}; \node [style=none] (24) at (4.25, -2) {}; \node [style=none] (25) at (1.25, -2) {}; \node [style=none] (26) at (2, 0) {}; \node [style=none] (27) at (2, -2.75) {}; \node [style=none] (28) at (2.75, 0.5) {}; \node [style=none] (29) at (2.75, 2) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (2); \draw [norm, in=-90, out=90] (6.center) to (3.center); \draw [norm] (3.center) to (4.center); \draw [norm] (4.center) to (5.center); \draw [norm] (5.center) to (6.center); \draw (16.center) to (15.center); \draw (18.center) to (17.center); \draw (20.center) to (21); \draw [norm, in=-90, out=90] (25.center) to (22.center); \draw [norm] (22.center) to (23.center); \draw [norm] (23.center) to (24.center); \draw [norm] (24.center) to (25.center); \draw (27.center) to (26.center); \draw (29.center) to (28.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} \end{enumerate} \end{lemma} \begin{example} $\mathbf{Mat}_{\mathbb{R}^+}$ has normalisation. On each object $X$ the zero state $0$, given by $0(x) = 0$ for all $x \in X$, satisfies $\mathop{\mathrm{norm}}\nolimits(0) = 0$. 
For a non-zero state $\omega$ we have \begin{equation*} \mathop{\mathrm{norm}}\nolimits(\omega)(x) = \frac{\omega(x)}{\sum_{x' \in X} \omega(x')} . \end{equation*} For a general morphism $M \colon X \to Y$ the normalisation $\mathop{\mathrm{norm}}\nolimits(M)$ sends $x \in X$ to $\mathop{\mathrm{norm}}\nolimits(M(x))$. That is: \begin{equation*} \mathop{\mathrm{norm}}\nolimits(M)(y \mid x) = \begin{cases} \frac{M(y \mid x)}{\sum_{y' \in Y} M(y' \mid x)} & \text{if } \sum_{y' \in Y} M(y' \mid x) \neq 0 \\ 0 & \text{otherwise} \end{cases} \end{equation*} \end{example} \subsection{Conditioning and conditional independence} For our purposes the main use of normalisation will be to define probabilistic conditioning. Here we give a diagrammatic account of conditioning which extends previous treatments by Cho and Jacobs \citation{ChoEtAl_2019_DisintegrationViaStringDiagrams} and Fritz \citation{Fritz_2020_SyntheticApproachToMarkovKernels} from channels to general processes, in the presence of normalisation and well-behaved caps. \begin{definition} We say a cd-category $\mathbf{C}$ has \emph{effect conditioning} when it has normalisation and cancellative caps. 
Then for any morphism $f \colon X \to Y \otimes Z$ we define the conditional $f|_Z \colon X \otimes Z\to Y$ to be the partial channel \begin{equation} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=medium map] (0) at (0, 0) {$f|_Z$}; \node [style=none] (1) at (0, 0.25) {}; \node [style=none] (2) at (0.75, -1.25) {}; \node [style=none] (3) at (0, 1.25) {}; \node [style=none] (4) at (0.75, -0.25) {}; \node [style=none] (5) at (-0.75, -0.25) {}; \node [style=label] (6) at (0, 1.75) {$Y$}; \node [style=none] (7) at (-0.75, -1.25) {}; \node [style=label] (8) at (-0.75, -1.75) {$X$}; \node [style=label] (9) at (0.75, -1.75) {$Z$}; \node [style=none] (10) at (2.5, 0) {$=$}; \node [style=medium map] (11) at (5.5, 0) {$f$}; \node [style=none] (12) at (5, 0.25) {}; \node [style=none] (13) at (6, 0.25) {}; \node [style=none] (14) at (5, 1.75) {}; \node [style=none] (15) at (6, 0.5) {}; \node [style=none] (16) at (5.5, -0.25) {}; \node [style=label] (17) at (5, 2.25) {$Y$}; \node [style=none] (18) at (5.5, -1.75) {}; \node [style=label] (19) at (5.5, -2.25) {$X$}; \node [style=label] (20) at (7, -2.25) {$Z$}; \node [style=none] (21) at (7, 0.5) {}; \node [style=none] (22) at (7, -1.75) {}; \node [style=none] (23) at (4, 1.25) {}; \node [style=none] (24) at (7.5, 1.25) {}; \node [style=none] (25) at (4, -1) {}; \node [style=none] (26) at (7.5, -1) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (3.center); \draw (4.center) to (2.center); \draw (7.center) to (5.center); \draw (12.center) to (14.center); \draw (15.center) to (13.center); \draw (18.center) to (16.center); \draw [bend left=90, looseness=1.75] (15.center) to (21.center); \draw (22.center) to (21.center); \draw [norm] (24.center) to (23.center); \draw [norm] (26.center) to (24.center); \draw [norm] (25.center) to (26.center); \draw [norm] (25.center) to (23.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} For any deterministic state $z$ of $Z$ we then define $f|_z 
\colon X \to Y$ by composing with $z$, so that \begin{equation} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (3.5, 0) {$f|_z$}; \node [style=none] (1) at (3.5, 1) {}; \node [style=none] (2) at (3.5, -1) {}; \node [style=label] (3) at (3.5, -1.5) {$X$}; \node [style=label] (4) at (3.5, 1.5) {$Y$}; \node [style=none] (5) at (5.25, 0) {$:=$}; \node [style=label] (57) at (10.25, -1.75) {$Z$}; \node [style=medium map] (58) at (8, 0) {$f$}; \node [style=none] (59) at (7.5, 0.25) {}; \node [style=none] (60) at (8.5, 0.25) {}; \node [style=none] (61) at (7.5, 1.75) {}; \node [style=none] (62) at (8.5, 0.5) {}; \node [style=none] (63) at (8, -0.25) {}; \node [style=label] (64) at (7.5, 2.25) {$Y$}; \node [style=none] (65) at (8, -2) {}; \node [style=label] (66) at (8, -2.5) {$X$}; \node [style=none] (67) at (9.5, 0.5) {}; \node [style=none] (68) at (9.5, -2) {}; \node [style=none] (69) at (6.5, 1.25) {}; \node [style=none] (70) at (10, 1.25) {}; \node [style=none] (71) at (6.5, -1) {}; \node [style=none] (72) at (10, -1) {}; \node [style=sharpstate] (73) at (9.5, -2.25) {$z$}; \node [style=none] (92) at (11.75, 0) {$=$}; \node [style=medium map] (93) at (14.75, 0) {$f$}; \node [style=none] (94) at (14, 0.25) {}; \node [style=none] (95) at (15.5, 0.25) {}; \node [style=none] (96) at (14, 3.75) {}; \node [style=sharpeffect] (97) at (15.5, 2) {$z$}; \node [style=none] (98) at (14.75, -0.5) {}; \node [style=none] (99) at (14.75, -1.75) {}; \node [style=label] (100) at (14, 4.25) {$Y$}; \node [style=label] (101) at (16, 1) {$Z$}; \node [style=label] (102) at (14.75, -2.25) {$X$}; \node [style=none] (103) at (13, 3) {}; \node [style=none] (104) at (16.5, 3) {}; \node [style=none] (105) at (13, -1) {}; \node [style=none] (106) at (16.5, -1) {}; \node [style=label] (107) at (11.75, -0.75) {$\eqref{eq:norm-on-states-wide}$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \draw (59.center) to (61.center); \draw (62.center) 
to (60.center); \draw (65.center) to (63.center); \draw [bend left=90, looseness=1.75] (62.center) to (67.center); \draw (68.center) to (67.center); \draw [norm] (70.center) to (69.center); \draw [norm] (72.center) to (70.center); \draw [norm] (71.center) to (72.center); \draw [norm] (71.center) to (69.center); \draw (94.center) to (96.center); \draw (97) to (95.center); \draw (99.center) to (98.center); \draw [norm] (105.center) to (103.center); \draw [norm] (103.center) to (104.center); \draw [norm] (104.center) to (106.center); \draw [norm] (106.center) to (105.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} \end{definition} The following useful property generalises the \emph{chain rule} or \emph{product rule} \textcolor{black}{$P(x,y) = P(y | x) P(x)$} in probability theory, and for the case where $\omega$ is normalised has been referred to by Cho and Jacobs as the \emph{disintegration} of a joint state \citation{ChoEtAl_2019_DisintegrationViaStringDiagrams}. \begin{lemma}[Disintegration] If $\mathbf{C}$ has effect conditioning, any state $\omega$ of $X \otimes Y$ satisfies \begin{equation} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=medium map] (0) at (-5, 0) {$\omega$}; \node [style=none] (1) at (-5.75, 0.25) {}; \node [style=none] (2) at (-4.25, 0.25) {}; \node [style=none] (3) at (-5.75, 1.25) {}; \node [style=none] (4) at (-4.25, 1.25) {}; \node [style=label] (7) at (-5.75, 1.75) {$X$}; \node [style=label] (9) at (-4.25, 1.75) {$Y$}; \node [style=none] (10) at (-2.5, 0) {$=$}; \node [style=whitedot] (12) at (-0.5, 0.25) {}; \node [style=map] (13) at (0.5, 1.5) {$\omega|_X$}; \node [style=none] (14) at (0.5, 1.75) {}; \node [style=none] (15) at (0.5, 1) {}; \node [style=none] (16) at (0.5, 2.75) {}; \node [style=none] (17) at (0.5, 1.25) {}; \node [style=label] (20) at (0.5, 3.25) {$Y$}; \node [style=label] (22) at (-1.5, 3.25) {$X$}; \node [style=none] (23) at (-1.5, 1) {}; \node [style=none] (24) at (-1.5, 2.75) {}; \node 
[style=medium map] (27) at (0, -1.5) {$\omega$}; \node [style=none] (29) at (0.5, -1.25) {}; \node [style=upground] (30) at (0.5, -0.25) {}; \node [style=none] (31) at (-0.5, -1.25) {}; \node [style=label] (32) at (1.25, -0.5) {$Y$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (3.center); \draw (4.center) to (2.center); \draw (14.center) to (16.center); \draw (17.center) to (15.center); \draw [in=-90, out=0] (12) to (15.center); \draw [in=180, out=-90] (23.center) to (12); \draw (24.center) to (23.center); \draw (30) to (29.center); \draw (31.center) to (12); \end{pgfonlayer} \end{tikzpicture} \end{equation} \end{lemma} We prove an extension of this result in Appendix \ref{sec:normalisation-appendix}, which establishes in what sense our definition of $f|_Z$ satisfies the categorical definition of a `conditional' from Refs.~\citation{ChoEtAl_2019_DisintegrationViaStringDiagrams, Fritz_2020_SyntheticApproachToMarkovKernels}. Note that this only holds when cancellative caps are present. The approaches of those references provide a more general account of conditioning (for channels) which we do not use here. \begin{example} $\mathbf{Mat}_{\mathbb{R}^+}$ has effect conditioning as we have seen. For a morphism $M \colon X \to Y \otimes Z$ the partial channel $M|_Z$ is given by \begin{equation*} M|_Z(y \mid x, z) = \frac{M(y, z \mid x)}{\sum_{y' \in Y} M(y', z \mid x)} \end{equation*} whenever the sum in the denominator is non-zero, and $M|_Z(y \mid x, z) = 0$ for all $y$ otherwise. In standard probability theory notation, a channel $M \colon X \to Y \otimes Z$ is described by a density $P(Y, Z \mid X)$, and $M|_Z$ corresponds to the conditional density $P(Y \mid X, Z)$. 
\begin{equation*} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=medium map] (0) at (0, 0) {$P(Y \mid X, Z)$}; \node [style=none] (1) at (0, 0.25) {}; \node [style=none] (2) at (0.75, -1.5) {}; \node [style=none] (3) at (0, 1.5) {}; \node [style=none] (4) at (0.75, -0.25) {}; \node [style=none] (5) at (-0.75, -0.25) {}; \node [style=label] (6) at (0, 2) {$Y$}; \node [style=none] (7) at (-0.75, -1.5) {}; \node [style=label] (8) at (-0.75, -2) {$X$}; \node [style=label] (9) at (0.75, -2) {$Z$}; \node [style=none] (10) at (3, 0) {$=$}; \node [style=medium map] (11) at (6.5, 0) {$P(Y, Z \mid X)$}; \node [style=none] (12) at (5.5, 0.25) {}; \node [style=none] (13) at (7.5, 0.25) {}; \node [style=none] (14) at (5.5, 2.25) {}; \node [style=none] (15) at (7.5, 0.5) {}; \node [style=none] (16) at (6.5, -0.25) {}; \node [style=label] (17) at (5.5, 2.75) {$Y$}; \node [style=none] (18) at (6.5, -2) {}; \node [style=label] (19) at (6.5, -2.5) {$X$}; \node [style=label] (20) at (9, -2.5) {$Z$}; \node [style=none] (21) at (9, 0.5) {}; \node [style=none] (22) at (9, -2) {}; \node [style=none] (23) at (4.25, 1.75) {}; \node [style=none] (24) at (9.75, 1.75) {}; \node [style=none] (25) at (4.25, -1.25) {}; \node [style=none] (26) at (9.75, -1.25) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (3.center); \draw (4.center) to (2.center); \draw (7.center) to (5.center); \draw (12.center) to (14.center); \draw (15.center) to (13.center); \draw (18.center) to (16.center); \draw [bend left=90, looseness=1.75] (15.center) to (21.center); \draw (22.center) to (21.center); \draw [norm] (24.center) to (23.center); \draw [norm] (26.center) to (24.center); \draw [norm] (25.center) to (26.center); \draw [norm] (25.center) to (23.center); \end{pgfonlayer} \end{tikzpicture} \end{equation*} For any $z \in Z$, the partial channel $M|_z$ corresponds to the conditional density $P(Y \mid X, Z=z)$. 
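As a small worked instance (the numbers are illustrative assumptions chosen here, not taken from the literature), let $X$ be trivial, $Y = \{0,1\}$, $Z = \{0,1,2\}$, and consider the state $P$ of $Y \otimes Z$ with \begin{equation*} P(0,0) = 0.1 , \quad P(1,0) = 0.3 , \quad P(0,1) = 0 , \quad P(1,1) = 0.6 , \quad P(0,2) = P(1,2) = 0 . \end{equation*} Normalising over the output $Y$ for each fixed $z$, as in the formula for $\mathop{\mathrm{norm}}\nolimits(M)$ above, gives \begin{equation*} P(Y \mid Z=0) = \left( \tfrac{0.1}{0.4}, \tfrac{0.3}{0.4} \right) = (0.25, 0.75) , \qquad P(Y \mid Z=1) = \left( \tfrac{0}{0.6}, \tfrac{0.6}{0.6} \right) = (0, 1) , \qquad P(Y \mid Z=2) = (0, 0) , \end{equation*} where the last conditional is the zero state because $P(Z=2) = 0$. In diagrams, the general correspondence between $M|_z$ and $P(Y \mid X, Z=z)$ reads as follows.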
\begin{equation*} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=medium map] (0) at (0.25, 0) {$P(Y \mid X, Z=z)$}; \node [style=none] (1) at (0.25, 0.25) {}; \node [style=none] (3) at (0.25, 1.5) {}; \node [style=none] (5) at (0.25, -0.25) {}; \node [style=label] (6) at (0.25, 2) {$Y$}; \node [style=none] (7) at (0.25, -1.5) {}; \node [style=label] (8) at (0.25, -2) {$X$}; \node [style=none] (28) at (3.75, 0) {$=$}; \node [style=medium map] (29) at (7.25, 0) {$P(Y, Z \mid X)$}; \node [style=none] (30) at (6.25, 0.25) {}; \node [style=none] (31) at (8.25, 0.25) {}; \node [style=none] (32) at (6.25, 3) {}; \node [style=none] (33) at (8.25, 1.75) {}; \node [style=none] (34) at (7.25, -0.25) {}; \node [style=label] (35) at (6.25, 3.5) {$Y$}; \node [style=none] (36) at (7.25, -2) {}; \node [style=label] (37) at (7.25, -2.5) {$X$}; \node [style=label] (38) at (8.75, 1) {$Z$}; \node [style=none] (41) at (5, 2.5) {}; \node [style=none] (42) at (9.5, 2.5) {}; \node [style=none] (43) at (5, -1.25) {}; \node [style=none] (44) at (9.5, -1.25) {}; \node [style=sharpeffect] (45) at (8.25, 1.75) {$z$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (1.center) to (3.center); \draw (7.center) to (5.center); \draw (30.center) to (32.center); \draw (33.center) to (31.center); \draw (36.center) to (34.center); \draw [norm] (42.center) to (41.center); \draw [norm] (44.center) to (42.center); \draw [norm] (43.center) to (44.center); \draw [norm] (43.center) to (41.center); \end{pgfonlayer} \end{tikzpicture} \end{equation*} To see the relation of \ref{eq:chainrule} to the product rule in probability theory, observe that for any joint distribution $P(X,Y)$ we have the following. 
\begin{equation*} P(x,y) \ \ = \ \ \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=medium map] (0) at (-5.25, 0) {$P$}; \node [style=none] (1) at (-6, 0.25) {}; \node [style=none] (2) at (-4.5, 0.25) {}; \node [style=none] (3) at (-6, 1.25) {}; \node [style=none] (4) at (-4.5, 1.25) {}; \node [style=none] (10) at (-3, 0) {$=$}; \node [style=whitedot] (12) at (-0.5, 0.25) {}; \node [style=map] (13) at (0.75, 1.75) {$P(Y | X)$}; \node [style=none] (14) at (0.75, 2) {}; \node [style=none] (15) at (0.75, 1.25) {}; \node [style=none] (16) at (0.75, 3) {}; \node [style=none] (17) at (0.75, 1.5) {}; \node [style=none] (23) at (-1.75, 1.25) {}; \node [style=none] (24) at (-1.75, 3) {}; \node [style=medium map] (27) at (0, -1.5) {$P$}; \node [style=none] (29) at (0.5, -1.25) {}; \node [style=upground] (30) at (0.5, -0.25) {}; \node [style=none] (31) at (-0.5, -1.25) {}; \node [style=label] (32) at (1.25, -0.5) {$Y$}; \node [style=sharpeffect] (33) at (-6, 1.5) {$x$}; \node [style=sharpeffect] (34) at (-4.5, 1.5) {$y$}; \node [style=sharpeffect] (36) at (-1.75, 3) {$x$}; \node [style=sharpeffect] (37) at (0.75, 3) {$y$}; \node [style=none] (38) at (2.75, 0) {$=$}; \node [style=map] (40) at (8.5, 0) {$P(Y | X)$}; \node [style=none] (41) at (8.5, 0.25) {}; \node [style=none] (42) at (8.5, -1.25) {}; \node [style=none] (43) at (8.5, 1.25) {}; \node [style=none] (44) at (8.5, -0.25) {}; \node [style=none] (45) at (5, 0.25) {}; \node [style=none] (46) at (5, 1.25) {}; \node [style=medium map] (47) at (5.5, 0) {$P$}; \node [style=none] (48) at (6, 0.25) {}; \node [style=upground] (49) at (6, 1.25) {}; \node [style=none] (50) at (5, 0.25) {}; \node [style=label] (51) at (6.75, 1) {$Y$}; \node [style=sharpeffect] (52) at (5, 1.5) {$x$}; \node [style=sharpeffect] (53) at (8.5, 1.25) {$y$}; \node [style=sharpstate] (54) at (8.5, -1.25) {$x$}; \node [style=none] (55) at (2.75, -0.75) {$ \eqref{eq:sharp-state-eff}$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw 
(1.center) to (3.center); \draw (4.center) to (2.center); \draw (14.center) to (16.center); \draw (17.center) to (15.center); \draw [in=-90, out=0] (12) to (15.center); \draw [in=180, out=-90] (23.center) to (12); \draw (24.center) to (23.center); \draw (30) to (29.center); \draw (31.center) to (12); \draw (41.center) to (43.center); \draw (44.center) to (42.center); \draw (46.center) to (45.center); \draw (49) to (48.center); \end{pgfonlayer} \end{tikzpicture} \ \ \ \ = \ \ P(x) P(y \mid x) \end{equation*} \end{example} \color{black} \begin{remark} Using partial channels in the definition of normalisation allows our conditionals to be defined even for distributions without full support, which are important for various aspects of causal reasoning. As above, in $\mathbf{Mat}_{\mathbb{R}^+}$ a conditional $P(X | Z=z)$ outputs the zero state whenever $P(Z=z) = 0$, interpreted as an `undefined' distribution over $X$, and otherwise yields the expected conditional distribution. \end{remark} We will now establish that conditionals behave in keeping with our intuitions, with all proofs found in Appendix \ref{sec:cond-appendix}. \begin{proposition} Let $\mathbf{C}$ be a cd-category with effect conditioning and $\omega$ a state of $X \otimes Y \otimes Z$. 
Then \begin{equation} \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=medium map] (0) at (1.25, 0) {$ \ \omega |_{Z \otimes Y} \ $}; \node [style=none] (1) at (1.25, 0.5) {}; \node [style=none] (2) at (1.25, 1.25) {}; \node [style=none] (3) at (4.25, 0) {$=$}; \node [style=label] (4) at (1.25, 1.75) {$X$}; \node [style=none] (5) at (0.5, -0.5) {}; \node [style=none] (6) at (0.5, -1.25) {}; \node [style=label] (7) at (0.5, -1.75) {$Z$}; \node [style=none] (8) at (2, -0.5) {}; \node [style=none] (9) at (2, -1.25) {}; \node [style=label] (10) at (2, -1.75) {$Y$}; \node [style=medium map] (11) at (7, 0) {$ \ (\omega |_{ Z}) |_{Y} \ $}; \node [style=none] (12) at (7, 0.5) {}; \node [style=none] (13) at (7, 1.25) {}; \node [style=label] (14) at (7, 1.75) {$X$}; \node [style=none] (15) at (6.25, -0.5) {}; \node [style=none] (16) at (6.25, -1.25) {}; \node [style=none] (17) at (7.75, -0.5) {}; \node [style=none] (18) at (7.75, -1.25) {}; \node [style=label] (19) at (6.25, -1.75) {$Z$}; \node [style=label] (20) at (7.75, -1.75) {$Y$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \draw (6.center) to (5.center); \draw (9.center) to (8.center); \draw (13.center) to (12.center); \draw (16.center) to (15.center); \draw (18.center) to (17.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} \end{proposition} This, together with~\ref{eq:disc-cond}, means that relative to a given state $\omega$ one can drop the labels in boxes of conditionals -- their identity is unambiguously determined by the input and output types. Whenever the context makes the reference state $\omega$ clear, it is therefore convenient to employ the following notation. 
Given a state $\omega$ of $X_1 \otimes \dots \otimes X_n$ for any disjoint subsets $S,T \subseteq \{ X_1, ..., X_n\}$ and $R:= \{ X_1, ..., X_n\} \setminus (S \cup T)$ we write: \begin{equation} \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (-4, 0) {$ \hspace*{1.0cm} $}; \node [style=none] (7) at (-1.25, 0) {$:=$}; \node [style=none] (34) at (-4, -1) {$\dots$}; \node [style=none] (35) at (-4.75, -0.5) {}; \node [style=none] (36) at (-4.75, -1.25) {}; \node [style=none] (37) at (-3.25, -0.5) {}; \node [style=none] (38) at (-3.25, -1.25) {}; \node [style=none] (39) at (-4, 1) {$\dots$}; \node [style=none] (40) at (-4.75, 1.25) {}; \node [style=none] (41) at (-4.75, 0.5) {}; \node [style=none] (42) at (-3.25, 1.25) {}; \node [style=none] (43) at (-3.25, 0.5) {}; \node [style=map] (44) at (2.75, 0) {$ \hspace*{0.9cm} \omega |_{S} \hspace*{0.9cm} $}; \node [style=none] (45) at (2.75, -1) {$\dots$}; \node [style=none] (46) at (2, -0.5) {}; \node [style=none] (47) at (2, -1.25) {}; \node [style=none] (48) at (3.5, -0.5) {}; \node [style=none] (49) at (3.5, -1.25) {}; \node [style=none] (50) at (1.5, 1.5) {$\dots$}; \node [style=none] (51) at (0.75, 1.75) {}; \node [style=none] (52) at (0.75, 0.5) {}; \node [style=none] (53) at (2.25, 1.75) {}; \node [style=none] (54) at (2.25, 0.5) {}; \node [style=none] (55) at (4, 0.75) {$\dots$}; \node [style=upground] (60) at (3.25, 1.25) {}; \node [style=none] (62) at (3.25, 0.5) {}; \node [style=upground] (63) at (4.75, 1.25) {}; \node [style=none] (64) at (4.75, 0.5) {}; \node [style=none] (65) at (1.5, 2.5) {$\overbrace{\hspace*{1.0cm}}^{T}$}; \node [style=none] (66) at (-4, 2) {$\overbrace{\hspace*{1.0cm}}^{T}$}; \node [style=none] (67) at (2.75, -2) {$\underbrace{\hspace*{1.0cm}}_{S}$}; \node [style=none] (68) at (-4, -2) {$\underbrace{\hspace*{1.0cm}}_{S}$}; \node [style=none] (69) at (4, 2) {$\overbrace{\hspace*{1.0cm}}^{R}$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw 
(36.center) to (35.center); \draw (38.center) to (37.center); \draw (41.center) to (40.center); \draw (43.center) to (42.center); \draw (47.center) to (46.center); \draw (49.center) to (48.center); \draw (52.center) to (51.center); \draw (54.center) to (53.center); \draw (62.center) to (60); \draw (64.center) to (63); \end{pgfonlayer} \end{tikzpicture} \end{equation} In particular we say that $\omega$ has \emph{full support} over $S \subseteq \{ X_1,\dots,X_n\}$ when we have: \begin{equation*} \begin{tikzpicture}[baseline=-0.5em]{\node[draw=red,font=\color{red},fill=red!10!white] {\textit{full-supps}};}\end{tikzpicture} \end{equation*} \textcolor{black}{Even when $\omega$ lacks full support on $S$, this box induces a morphism on $S$ that leaves all conditionals invariant under composition, as follows.} \begin{lemma} Let $\mathbf{C}$ be a cd-category with effect conditioning and $\omega$ a state of $X \otimes Y$. Then the following hold. \begin{center} (1) $\begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=none] (0) at (-2, -1.5) {$X$}; \node [style=none] (1) at (-2, 2) {}; \node [style=none] (2) at (-2, 0) {}; \node [style=none] (3) at (-1, 1) {}; \node [style=none] (4) at (-2, 0) {}; \node [style=whitedot] (5) at (-2, 0) {}; \node [style=none] (6) at (-2, 0) {}; \node [style=none] (7) at (-2, -1) {}; \node [style=map] (8) at (-1, 1.5) {$ \ $}; \node [style=none] (9) at (-2, 2.5) {$X$}; \node [style=none] (10) at (0.25, 0) {$=$}; \node [style=none] (11) at (2, -2.5) {$X$}; \node [style=none] (12) at (2, 3) {}; \node [style=none] (13) at (2, -1) {}; \node [style=none] (14) at (3, 0) {}; \node [style=none] (15) at (2, -1) {}; \node [style=whitedot] (16) at (2, -1) {}; \node [style=none] (17) at (2, -1) {}; \node [style=none] (18) at (2, -2) {}; \node [style=map] (19) at (3, 0.5) {$ \ $}; \node [style=none] (20) at (2, 3.5) {$X$}; \node [style=none] (21) at (2, 1) {}; \node [style=none] (22) at (3, 2) {}; \node [style=none] (23) at (2, 1) {}; 
\node [style=whitedot] (24) at (2, 1) {}; \node [style=none] (25) at (2, 1) {}; \node [style=map] (26) at (3, 2.5) {$ \ $}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \draw [in=-90, out=60] (4.center) to (3.center); \draw (7.center) to (6.center); \draw (13.center) to (12.center); \draw [in=-90, out=60] (15.center) to (14.center); \draw (18.center) to (17.center); \draw [in=-90, out=60] (23.center) to (22.center); \end{pgfonlayer} \end{tikzpicture}$ \qquad (2) $\begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (5) at (4, 0.5) {}; \node [style=none] (6) at (4, 1.25) {}; \node [style=none] (10) at (2.75, 0) {$=$}; \node [style=none] (28) at (4, -1.75) {$X$}; \node [style=none] (71) at (4, -0.5) {}; \node [style=none] (72) at (4, -1.25) {}; \node [style=label] (73) at (4, 1.75) {$Y$}; \node [style=none] (75) at (0, 1) {}; \node [style=none] (76) at (0, 1.75) {}; \node [style=none] (77) at (0, -2.25) {$X$}; \node [style=none] (78) at (0, 0) {}; \node [style=none] (79) at (0, -1) {}; \node [style=label] (80) at (0, 2.25) {$Y$}; \node [style=none] (81) at (1.5, 0) {}; \node [style=none] (82) at (0, -1) {}; \node [style=whitedot] (83) at (0, -1) {}; \node [style=none] (85) at (0, -1) {}; \node [style=none] (86) at (0, -1.75) {}; \node [style=map] (89) at (0, 0.5) {$ \ $}; \node [style=map] (90) at (1.5, 0.5) {$ \ $}; \node [style=map] (91) at (4, 0) {$ \ $}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (6.center) to (5.center); \draw (72.center) to (71.center); \draw (76.center) to (75.center); \draw (79.center) to (78.center); \draw [in=-90, out=60] (82.center) to (81.center); \draw (86.center) to (85.center); \end{pgfonlayer} \end{tikzpicture}$ \qquad (3) $\begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (81) at (-0.5, 0.25) {}; \node [style=none] (82) at (-1, -0.75) {}; \node [style=whitedot] (83) at (-1, -0.75) {}; \node [style=map] (90) at (-0.5, 0.75) {$ \ $}; \node [style=map] (93) 
at (3.75, -1) {$ \hspace*{1cm} $}; \node [style=none] (94) at (1.75, 0) {$=$}; \node [style=none] (95) at (3, 0.25) {}; \node [style=none] (96) at (3, -0.75) {}; \node [style=none] (97) at (3, -0.75) {}; \node [style=none] (98) at (4.5, 0.25) {}; \node [style=none] (99) at (4.5, -0.75) {}; \node [style=none] (100) at (4.5, -0.75) {}; \node [style=none] (101) at (3, 0.75) {$X$}; \node [style=none] (102) at (4.5, 0.75) {$Y$}; \node [style=map] (103) at (-0.25, -2) {$ \hspace*{1cm} $}; \node [style=none] (104) at (-1, -0.75) {}; \node [style=none] (105) at (-1, -1.75) {}; \node [style=none] (106) at (-1, -1.75) {}; \node [style=none] (107) at (0.75, 1.5) {}; \node [style=none] (108) at (0.5, -1.75) {}; \node [style=none] (109) at (0.5, -1.75) {}; \node [style=none] (110) at (-1.5, 2) {$X$}; \node [style=none] (111) at (0.75, 2) {$Y$}; \node [style=none] (112) at (-1.5, 1.5) {}; \node [style=none] (113) at (-1, -0.75) {}; \node [style=none] (114) at (-1, -0.75) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw [in=-90, out=60] (82.center) to (81.center); \draw (96.center) to (95.center); \draw (99.center) to (98.center); \draw (105.center) to (104.center); \draw [in=270, out=90] (108.center) to (107.center); \draw [in=-90, out=120] (113.center) to (112.center); \end{pgfonlayer} \end{tikzpicture}$ . \end{center} \end{lemma} Note that $X$ may of course have the structure of a product over a subset $S$ of objects as above. We can now define conditional independence, expressing the same idea as in Refs.~\citation{ChoEtAl_2019_DisintegrationViaStringDiagrams, Fritz_2020_SyntheticApproachToMarkovKernels} in terms of conditionals in the above sense. \begin{definition}[Conditional independence] Let $\mathbf{C}$ be a cd-category with effect conditioning. 
Given a state $\omega$ of $X \otimes Y \otimes Z$, we say $X$ is independent from $Y$ relative to $Z$, and write $(X \mathrel{\text{$\perp\mkern-10mu\perp$}} Y | Z)_{\omega}$, iff \begin{equation*} \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=whitedot] (16) at (3, -1.5) {}; \node [style=map] (20) at (-2.75, 0) {$ \hspace*{1.0cm} $}; \node [style=none] (21) at (-2, 0.25) {}; \node [style=none] (22) at (-2, 1.25) {}; \node [style=label] (28) at (-3.5, 1.5) {$X$}; \node [style=none] (48) at (3, -1.5) {}; \node [style=none] (49) at (2, -0.5) {}; \node [style=none] (52) at (-3.5, 0.25) {}; \node [style=none] (53) at (-3.5, 1.25) {}; \node [style=none] (57) at (-2.75, -1.25) {}; \node [style=none] (58) at (-2.75, -0.25) {}; \node [style=label] (59) at (-2, 1.5) {$Y$}; \node [style=label] (60) at (-2.75, -1.75) {$Z$}; \node [style=none] (61) at (0, 0) {$=$}; \node [style=none] (62) at (3, -2.25) {}; \node [style=none] (63) at (3, -1.5) {}; \node [style=label] (64) at (3, -2.75) {$Z$}; \node [style=none] (72) at (3, -1.5) {}; \node [style=none] (73) at (4, -0.5) {}; \node [style=label] (77) at (2, 1.5) {$X$}; \node [style=none] (78) at (2, 0.25) {}; \node [style=none] (79) at (2, 1.25) {}; \node [style=none] (82) at (4, 0.25) {}; \node [style=none] (83) at (4, 1.25) {}; \node [style=label] (87) at (4, 1.5) {$Y$}; \node [style=map] (94) at (2, 0) {}; \node [style=map] (95) at (4, 0) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (22.center) to (21.center); \draw [in=150, out=-90] (49.center) to (48.center); \draw [in=90, out=-90] (53.center) to (52.center); \draw (58.center) to (57.center); \draw (63.center) to (62.center); \draw [in=30, out=-90] (73.center) to (72.center); \draw [in=90, out=-90] (79.center) to (78.center); \draw (83.center) to (82.center); \end{pgfonlayer} \end{tikzpicture} \end{equation*} \end{definition} It is instructive to observe the following equivalences. 
\begin{lemma} Let $\mathbf{C}$ be a cd-category with effect conditioning. Given a state $\omega$ of $X \otimes Y \otimes Z$, conditional independence $(X \mathrel{\text{$\perp\mkern-10mu\perp$}} Y | Z)_{\omega}$ is equivalent to the following: \begin{center} (1) $\begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (-4, -0.75) {$ \hspace*{1.5cm} $}; \node [style=none] (1) at (-2.75, -0.5) {}; \node [style=none] (2) at (-2.75, 0.5) {}; \node [style=label] (3) at (-5.25, 0.75) {$X$}; \node [style=none] (4) at (-5.25, -0.5) {}; \node [style=none] (5) at (-5.25, 0.5) {}; \node [style=label] (6) at (-2.75, 0.75) {$Y$}; \node [style=none] (7) at (-1, 0) {$=$}; \node [style=none] (8) at (-4, -0.5) {}; \node [style=none] (9) at (-4, 0.5) {}; \node [style=label] (10) at (-4, 0.75) {$Z$}; \node [style=whitedot] (11) at (2, -0.75) {}; \node [style=none] (12) at (2, -0.75) {}; \node [style=none] (13) at (2, 2) {}; \node [style=map] (14) at (3.25, 0.75) {}; \node [style=none] (15) at (3.25, 1) {}; \node [style=none] (16) at (3.25, 2) {}; \node [style=label] (17) at (3.25, 2.25) {$Y$}; \node [style=none] (18) at (2, -0.75) {}; \node [style=none] (19) at (3.25, 0.25) {}; \node [style=map] (20) at (2, -2) {}; \node [style=label] (21) at (0.75, 2.25) {$X$}; \node [style=none] (24) at (2, -1.75) {}; \node [style=none] (25) at (2, -0.75) {}; \node [style=label] (26) at (2, 2.25) {$Z$}; \node [style=map] (29) at (0.75, 0.75) {}; \node [style=none] (30) at (0.75, 1) {}; \node [style=none] (31) at (0.75, 2) {}; \node [style=none] (32) at (2, -0.75) {}; \node [style=none] (33) at (0.75, 0.25) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \draw [in=90, out=-90] (5.center) to (4.center); \draw (9.center) to (8.center); \draw [in=90, out=-90] (13.center) to (12.center); \draw (16.center) to (15.center); \draw [in=30, out=-90] (19.center) to (18.center); \draw (25.center) to (24.center); \draw (31.center) to (30.center); 
\draw [in=150, out=-90] (33.center) to (32.center); \end{pgfonlayer} \end{tikzpicture}$ (2) $\begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=whitedot] (16) at (2.25, -0.5) {}; \node [style=none] (21) at (-2.25, -0.5) {}; \node [style=none] (22) at (-2.25, 0.5) {}; \node [style=label] (28) at (-4.75, 0.75) {$X$}; \node [style=none] (52) at (-4.75, -0.5) {}; \node [style=none] (53) at (-4.75, 0.5) {}; \node [style=label] (59) at (-2.25, 0.75) {$Y$}; \node [style=none] (61) at (-0.75, 0) {$=$}; \node [style=none] (62) at (2.25, -0.5) {}; \node [style=none] (63) at (2, 2.5) {}; \node [style=map] (65) at (3.25, 1.25) {}; \node [style=none] (66) at (3.25, 1.5) {}; \node [style=none] (67) at (3.25, 2.5) {}; \node [style=label] (71) at (3.25, 2.75) {$Y$}; \node [style=none] (72) at (2.25, -0.5) {}; \node [style=none] (73) at (3.25, 0.75) {}; \node [style=none] (74) at (-3.5, -0.5) {}; \node [style=none] (75) at (-3.5, 0.5) {}; \node [style=label] (76) at (-3.5, 0.75) {$Z$}; \node [style=map] (77) at (1.75, -2) {$ \hspace*{0.9cm} $}; \node [style=label] (80) at (0.75, 2.75) {$X$}; \node [style=none] (81) at (1.25, -1.75) {}; \node [style=none] (82) at (0.75, 2.5) {}; \node [style=none] (84) at (2.25, -1.75) {}; \node [style=none] (85) at (2.25, -0.5) {}; \node [style=label] (86) at (2, 2.75) {$Z$}; \node [style=map] (91) at (-3.5, -0.75) {$ \hspace*{1.5cm} $}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (22.center) to (21.center); \draw [in=90, out=-90] (53.center) to (52.center); \draw [in=90, out=-90] (63.center) to (62.center); \draw (67.center) to (66.center); \draw [in=30, out=-90] (73.center) to (72.center); \draw (75.center) to (74.center); \draw [in=90, out=-90] (82.center) to (81.center); \draw (85.center) to (84.center); \end{pgfonlayer} \end{tikzpicture}$ (3) $\begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (1) at (1.75, 0) {}; \node [style=map] (20) at (-2.75, 0) {\hspace*{0.9cm}}; \node [style=label] 
(28) at (-2.75, 1.5) {$X$}; \node [style=none] (52) at (-2.75, 0.25) {}; \node [style=none] (53) at (-2.75, 1.25) {}; \node [style=none] (57) at (-3.5, -1.25) {}; \node [style=none] (58) at (-3.5, -0.25) {}; \node [style=label] (60) at (-3.5, -1.75) {$Z$}; \node [style=none] (61) at (-0.25, 0) {$=$}; \node [style=none] (62) at (2.5, -2.5) {}; \node [style=none] (63) at (2.5, -1.75) {}; \node [style=label] (64) at (2.5, -3) {$Z$}; \node [style=none] (66) at (4.25, -2) {}; \node [style=none] (67) at (4.25, -0.25) {}; \node [style=label] (68) at (1.75, 1.5) {$X$}; \node [style=none] (69) at (1.75, 0.25) {}; \node [style=none] (70) at (1.75, 1.25) {}; \node [style=label] (71) at (4.25, -2.5) {$Y$}; \node [style=none] (73) at (-2, -1.25) {}; \node [style=none] (74) at (-2, -0.25) {}; \node [style=label] (75) at (-2, -1.75) {$Y$}; \node [style=map] (77) at (3.75, 0) {\hspace*{0.7cm}}; \node [style=none] (78) at (2.5, -1.75) {}; \node [style=none] (79) at (1.75, -0.5) {}; \node [style=whitedot] (80) at (2.5, -1.75) {}; \node [style=none] (81) at (2.5, -1.75) {}; \node [style=none] (82) at (3.25, -0.5) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw [in=90, out=-90] (53.center) to (52.center); \draw (58.center) to (57.center); \draw (63.center) to (62.center); \draw (67.center) to (66.center); \draw [in=90, out=-90] (70.center) to (69.center); \draw (74.center) to (73.center); \draw [in=120, out=-90] (79.center) to (78.center); \draw [in=60, out=-90] (82.center) to (81.center); \end{pgfonlayer} \end{tikzpicture}$ $\stackrel{\begin{minipage} {\scriptsize in case of full support over $Y, Z$} \end{minipage}}{=}$ $\begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=map] (0) at (-0.75, 0) {}; \node [style=none] (1) at (-0.75, -1.25) {}; \node [style=none] (2) at (-0.75, -0.25) {}; \node [style=label] (3) at (-0.75, -1.75) {$Z$}; \node [style=none] (4) at (0.75, -1.25) {}; \node [style=label] (5) at (-0.75, 1.5) {$X$}; \node [style=none] (6) at 
(-0.75, 0.25) {}; \node [style=none] (7) at (-0.75, 1.25) {}; \node [style=label] (8) at (0.75, -1.75) {$Y$}; \node [style=upground] (9) at (0.75, 0) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \draw [in=90, out=-90] (7.center) to (6.center); \draw (4.center) to (9); \end{pgfonlayer} \end{tikzpicture}$ \end{center} \end{lemma} Similarly to the treatment of the full support case in \citation{ChoEtAl_2019_DisintegrationViaStringDiagrams}, we are able to establish string diagrammatic forms of the well-known \emph{semi-graphoid axioms} for reasoning about conditional independence, recalled and proven in Appendix~\ref{App_Proof_Thm_SG_axioms}. \begin{theorem} Let $\mathbf{C}$ be a cd-category with effect conditioning and $\omega$ some state. The conditional independence relation $( \_\_ \mathrel{\text{$\perp\mkern-10mu\perp$}} \_\_ | \_\_ )_{\omega}$ from Def. \ref{def:cond_ind}, understood as a 3-place relation for arbitrary triples of disjoint subsets (of the objects of $\omega$'s codomain) satisfies the semi-graphoid axioms. \end{theorem} \begin{example} In $\mathbf{Mat}_{\mathbb{R}^+}$ the condition $(X \mathrel{\text{$\perp\mkern-10mu\perp$}} Y | Z)_{P}$ for some $P(X,Y,Z)$ according to Def. \ref{def:cond_ind} indeed reads \begin{equation*} P(X,Y | Z) \ = \ P(X | Z) \ P(Y | Z) \ , \end{equation*} understood as $P(X,Y | Z=z) = P(X | Z=z) P(Y | Z=z)$ whenever $P(Z=z) \neq 0$ and it is set to the zero state in $\mathbf{Mat}_{\mathbb{R}^+}$ for all other $z$. Conditions $(1)$, and $(2)$ of Lem. \ref{Lem_CI_equivalences}, respectively, are: \begin{equation*} P(X,Y,Z) \ = \ P(Y | Z) \ P(X|Z) \ P(Z) \quad \text{and} \quad P(X,Y,Z) \ = \ P(Y | Z) \ P(X,Z) \ , \end{equation*} both understood as above concerning the dependence on full support. Condition $(3)$, in the full support case, reads \begin{equation*} P(Y| X, Z) \ = \ P(Y | Z) \ . 
\end{equation*} \end{example} \color{black} \section{Causal models} \subsection{Definition} Let us now turn to the main object of study of this work and give the general definition of a causal model in a cd-category. Recall that a finite \emph{directed graph} $G=(V,E)$ consists of a finite set $V=\{ X_1,\dots,X_n\}$ of \emph{vertices} and a subset $E \subseteq V \times V$ of directed \emph{edges}. An edge $(X_i,X_j) \in E$ is usually denoted $X_i \to X_j$. A finite \emph{directed acyclic graph (DAG)} furthermore has no directed cycles, i.e. no sequence of edges of the form $X_{i_1} \to X_{i_2} \to \dots \to X_{i_n} \to X_{i_1}$ of length $n \geq 1$. \textcolor{black}{For a vertex $X \in V$ write $\mathrm{Pa}(X)\subseteq V$ and $\mathrm{Ch}(X) \subseteq V$ for the sets of parents and children of $X$, respectively.} A causal model is essentially given by specifying a DAG, whose edges describe direct-cause relations between variables, along with channels that describe the mechanisms by which each variable is caused by its parents in the DAG. \begin{definition} Let $\mathbf{C}$ be a cd-category. 
A causal model $\mathbb{M}$ in $\mathbf{C}$ is given by: \begin{enumerate} \item a finite DAG $G$ with vertices $V=\{ X_1,\dots,X_n\}$; \item a specified subset $O \subseteq V$ of output vertices; \item for each vertex $X_i \in V$ an associated object of $\mathbf{C}$ also denoted $X_i$, and a channel in $\mathbf{C}$ of the form: \begin{equation} \begin{tikzpicture}[tikzfig] \begin{pgfonlayer}{nodelayer} \node [style=medium map] (0) at (1.5, 0) {$c_i$}; \node [style=none] (1) at (1.5, 0.5) {}; \node [style=none] (2) at (1.5, 1) {}; \node [style=none] (3) at (0.5, -0.5) {}; \node [style=none] (4) at (2.5, -0.5) {}; \node [style=none] (5) at (0.5, -1.25) {}; \node [style=none] (6) at (2.5, -1.25) {}; \node [style=label] (7) at (1.5, 1.5) {$X_i$}; \node [style=label] (8) at (1.5, -1.75) {$\Pa(X_i)$}; \node [style=label] (9) at (1.5, -1) {$\dots$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (2.center) to (1.center); \draw (6.center) to (4.center); \draw (5.center) to (3.center); \end{pgfonlayer} \end{tikzpicture} \end{equation} where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ in $G$. We call the channel $c_i$ the mechanism for $X_i$. \end{enumerate} The pair ${G}_\mathbb{M} = (G,O)$ is called the causal structure of $\mathbb{M}$. The indexed objects $\mathop{\mathrm{var}}\nolimits(\mathbb{M}) = (X_i)^n_{i=1}$ of $\mathbf{C}$ are called the variables of the model. The variables corresponding to output vertices are called the output variables and denoted $\mathop{\mathrm{out}}\nolimits(\mathbb{M}) \subseteq \mathop{\mathrm{var}}\nolimits(\mathbb{M})$. \end{definition} Note that this definition allows for only a subset of variables to be considered as 'outputs' of the model. This specification does not necessarily come with the interpretation that these are the only 'observed' variables; it is a piece of data that is useful for different purposes. 
Rather than specify the DAG directly, we can also simply specify a causal model via its collection of mechanisms. More precisely, we can equivalently define a causal model $\mathbb{M}$ in $\mathbf{C}$ as consisting of an indexed collection of objects $\mathop{\mathrm{var}}\nolimits(\mathbb{M}) = (X_i)^n_{i=1}$ with a subset $\mathop{\mathrm{out}}\nolimits(\mathbb{M})$, and a collection of channels $(c_i)^n_{i=1}$ as in \ref{eq:channels}, such that the directed graph $G$ with vertices $X_i$ and an edge $X_i \to X_j$ whenever $X_i$ is an input to $c_j$ is acyclic. When we take our category to describe probabilistic mappings between finite sets, we immediately obtain the more standard notion of a causal model. \begin{example} A Causal Bayesian Network (CBN) is a causal model in $\mathbf{C} = \mathbf{FStoch}$. Thus it consists of: \begin{itemize} \item a finite DAG $G=(V,E)$ with a subset $O \subseteq V$ of outputs; \item for each vertex $X_i$ an associated variable with a finite set of values also denoted $X_i$, and a mechanism $c_i$ given by a probability channel with density \begin{equation} P(X_i \mid \mathrm{Pa}(X_i)) \end{equation} \end{itemize} A CBN defines a joint distribution over all the variables with density \begin{equation} P(X_1,\dots,X_n) := \prod^n_{i=1} P(X_i \mid \mathrm{Pa}(X_i)) \end{equation} As the notation suggests, the mechanisms can often be thought of as conditional densities for $P(X_1,\dots,X_n)$, though we emphasise that they are given channels -- see Rem. \ref{Rem_MechanismsFirst}. In general any joint density $P$ is said to satisfy the causal Markov condition for $G$ when there exist probability channels such that \ref{eq:markov} holds. Often one is particularly interested in the output distribution of the CBN, given by the marginal distribution over the output variables $O$. 
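Spelled out under the conventions above, with the sum ranging over the values of all non-output variables, this marginal reads \begin{equation*} P(O) \ = \ \sum_{V \setminus O} \ \prod^n_{i=1} P(X_i \mid \mathrm{Pa}(X_i)) \ . \end{equation*}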
\end{example} \begin{remark} It is perhaps more common to define a CBN as given by a DAG $G$ and corresponding variables, along with a joint distribution $P(X_1,\dots,X_n)$ that satisfies the Markov condition \ref{eq:markov}, where one defines the mechanisms as the corresponding conditional densities \ref{eq:mechanism-density}. However, these conditional channels are not uniquely determined if the joint distribution lacks full support, and their precise choice can make a practical difference when considering, for instance, interventions (Sec. \ref{Sec:Interventions}). In contrast, we consider the specific channels $c_i$ to be the key aspect of a causal model, since they come with an interpretation as causal mechanisms. We therefore take a CBN to be defined by its mechanisms rather than the joint distribution. \end{remark} \begin{example} As a basic, more concrete example of a CBN, which will also serve as a running example throughout, consider the following, taken from Ref. \citation{CorreaEtAl_2020_CalculusForStochasticInterventions}. Similarly to the example in Sec. \ref{Sec_GentleOverview}, it concerns smoking and lung cancer, though this time with further variables that will become relevant when studying policy making later. Let $S$ be a person's choice to smoke, $L$ whether or not they develop lung cancer, $A$ their age and $B$ a set of relevant background conditions such as socio-economic status, education, etc. A plausible causal model consists of the DAG below, where the vertices corresponding to output variables are circled, along with the specification of each of the probability channels listed to the right, which give the mechanisms. 
\begin{align} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (1) at (0, -0.25) {$B$}; \node [style=none] (6) at (-1.25, 1.75) {}; \node [style=none] (7) at (-0.25, 0.25) {}; \node [style=none] (10) at (0, -2) {}; \node [style=none] (11) at (0, -0.75) {}; \node [style=none] (13) at (0.75, 3.5) {}; \node [style=none] (14) at (-0.75, 2.75) {}; \node [style=none] (15) at (1, 3.25) {}; \node [style=none] (16) at (0.25, 0.25) {}; \node [style=none] (17) at (1.5, 3.5) {}; \node [style=none] (18) at (0.5, -2.25) {}; \node [style=whitedot] (20) at (1.25, 4) {$L$}; \node [style=whitedot] (21) at (-1.5, 2.25) {$S$}; \node [style=whitedot] (22) at (0, -2.75) {$A$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw [style={arrow=.99}] (7.center) to (6.center); \draw [style={arrow=.99}] (10.center) to (11.center); \draw [style={arrow=.99}] (14.center) to (13.center); \draw [style={arrow=.99}] (16.center) to (15.center); \draw [style={arrow=.99}, bend right] (18.center) to (17.center); \end{pgfonlayer} \end{tikzpicture} \begin{minipage} $P(L | SBA)$ $P(S | B)$ $P(B | A)$ $P(A)$ \end{minipage} \end{align} As emphasised in Rem. \ref{Rem_MechanismsFirst}, the CBN is not taken to be defined by conditional distributions obtained from a joint distribution, but by causal mechanisms. This data though of course defines a marginal distribution over the output variables: $$P(SLA) = \sum_{B} \ P(L | SBA) \ P(S | B) \ P(B | A) \ P(A)$$ \end{example} \subsection{Causal models as diagrams} Let us now see how to describe CBNs and more general kinds of causal models in diagrams. In fact there is an exact correspondence between DAGs and a certain class of string diagrams which lies at the heart of the diagrammatic approach to causal models. This construction was first given by Fong \citation{Fong_2013_CausalTheories} and made still more explicit by Jacobs et al.~\citation{JacobsEtAl_2019_CausalInferenceByDiagramSurgery}. 
\begin{definition} Let $G$ be a DAG over vertices $V=\{ X_1,\dots,X_n\}$, and $O \subseteq V$ a subset of output vertices. We define a string diagram $D_{G,O}$ with no inputs and a single output for each $X_i \in O$, as follows. \begin{enumerate} \item For each vertex $X_i$ we draw a box denoted $c_i$ from the parents $Y_1,\dots,Y_m$ of $X_i$ to $X_i$ as below. We then copy the output of the box $k + 1$ times if $X_i$ is an output variable, or $k$ times otherwise, where $k$ is the number of children of $X_i$. \begin{equation*} \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=label] (0) at (0, 0) {}; \node [style=label] (1) at (0, 0) {$X_i$}; \node [style=label] (4) at (-2, -2.5) {$Y_1$}; \node [style=label] (5) at (2, -2.5) {$Y_m$}; \node [style=none] (6) at (-2, -2) {}; \node [style=none] (7) at (-0.5, -0.5) {}; \node [style=none] (8) at (0.5, -0.5) {}; \node [style=none] (9) at (2, -2) {}; \node [style=none] (10) at (-2, 2) {}; \node [style=none] (11) at (-0.5, 0.25) {}; \node [style=none] (12) at (0.5, 0.25) {}; \node [style=none] (13) at (2, 2) {}; \node [style=none] (14) at (0, 1.5) {$\dots$}; \node [style=none] (15) at (0, -1.5) {$\dots$}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw [style={arrow=.99}] (11.center) to (10.center); \draw [style={arrow=.99}] (12.center) to (13.center); \draw [style={arrow=.99}] (6.center) to (7.center); \draw [style={arrow=.99}] (9.center) to (8.center); \end{pgfonlayer} \end{tikzpicture} \qquad \mapsto \qquad \begin{tikzpicture} \begin{pgfonlayer}{nodelayer} \node [style=none] (0) at (0, 0.25) {}; \node [style=medium map] (2) at (0, -0.25) {$c_i$}; \node [style=label] (3) at (-0.75, -2.5) {$Y_1$}; \node [style=label] (4) at (0.75, -2.5) {$Y_m$}; \node [style=none] (5) at (-0.75, -2) {}; \node [style=none] (6) at (-0.75, -0.5) {}; \node [style=none] (7) at (0.75, -0.5) {}; \node [style=none] (8) at (0.75, -2) {}; \node [style=none] (13) at (0, 2.75) {$\dots$}; \node [style=none] (14) at (0, -1.25) 
{$\dots$}; \node [style=label] (16) at (0.5, 0.75) {$X_i$}; \node [style=none] (17) at (0, 1.5) {}; \node [style=whitedot] (18) at (0, 1.5) {}; \node [style=none] (19) at (-1, 2.25) {}; \node [style=none] (20) at (-1, 2.5) {}; \node [style=none] (22) at (0, 1.5) {}; \node [style=none] (23) at (1, 2.5) {}; \node [style=none] (24) at (1, 2.5) {}; \end{pgfonlayer} \begin{pgfonlayer}{edgelayer} \draw (5.center) to (6.center); \draw (8.center) to (7.center); \draw [bend right=45] (19.center) to (18); \draw (20.center) to (19.center); \draw [bend left=45] (23.center) to (22.center); \draw (0.center) to (22.center); \end{pgfonlayer} \end{tikzpicture} \end{equation*} In particular, if a variable $X_i$ has no parents then the box $c_i$ has no inputs, and if $X_i$ has no children and is not an output then $c_i$ is followed by the discarding effect $\begin{tikzpicture}[font=\tiny,scale=1.0] \node[upgroundsmall, xscale=0.8, yscale=0.7] (1) at (0,0.16) {}; \draw (0,0.03) to (0,-0.25); \end{tikzpicture}$. \item We then wire these pieces together by plugging an $X_i$ output from (the copy maps following) $c_i$ into the $X_i$ input of $c_j$, whenever there is an edge $X_i \to X_j$ in $G$. \end{enumerate} \end{definition} Since $G$ is acyclic, this produces a well-defined string diagram of the specified type. In particular, by construction each vertex $X_i$ appears as an output only if $X_i \in O$, and in this case precisely once. In practice we allow any names to be given to the boxes $c_i$ in the diagram. \begin{example} \label{ex:DAG} For the DAG $G$ on the left below with outputs $O=\{X_2, X_3\}$ circled, the diagram $D_{G,O}$ is shown to the right. \[ \tikzfig{Graph-obs2} \qquad \mapsto \qquad \tikzfig{state-GO} \] \end{example} By construction, each string diagram $D_{G,O}$ contains only boxes with a single output, and copy maps. 
More precisely, we can identify the kinds of string diagrams that arise from DAGs in this way as follows. \begin{definition} \label{Def_NetworkDiagram} A \emph{network diagram} is a string diagram $D$ built from single-output boxes, copy maps and discarding effects: \[ \tikzfig{nd-box} \qquad \tikzfig{nd-copy} \qquad \tikzfig{nd-disc} \] along with labellings on the wires, such that any wires not connected by a sequence of copy maps are given distinct labels, and each label appears as an output at most once and as an input to any given box at most once. A \emph{state network diagram} is a network diagram with no inputs. \end{definition} \st{Causal models as defined so far will only require state network diagrams, but general network diagrams with inputs will be used in Sec.~\ref{sec:openCMs} and throughout Secs.~\ref{Sec_CE_Identifiability} and \ref{Sec_Counterfactuals}}. We consider two network diagrams to be equivalent when they are equal according to the general axioms of a cd-category, up to relabellings of the wires and boxes. \st{ \begin{remark} Network diagrams form a subset of the `gs-monoidal string diagrams' considered by Fritz and Liang \cite{FritzEtAl_2022_FreeGSMonoidalCategories}, which allow boxes with multiple outputs. The latter are not required here since each variable in a causal model is the unique output of its mechanism. \end{remark} } \begin{proposition} \label{prop:DagsNDs} Up to equivalence and relabellings of diagrams, the following are equivalent: \begin{itemize} \item a DAG $G=(V,E)$ with outputs $O \subseteq V$; \item a state network diagram \[ \tikzfig{stateNDO} \] with wires $V$ and outputs $O$; \end{itemize} via $(G,O) \mapsto D_{G,O}$ and $D \mapsto (G_D, O_D)$. \end{proposition} \begin{proof} Given $(G,O)$ we define $D_{G,O}$ as in Def.~\ref{def:DGO}. 
Conversely, given a state network diagram $D$, we define the DAG $G_D$ with a vertex $X$ for each output of a non-copy box in $D$, and set $X \in O_D$ iff it is an output of the diagram. We include a directed edge $X \to Y$ whenever $X$ is an input to the unique box with output $Y$. Up to relabellings, it is straightforward to see that these mappings are inverse to each other. \end{proof} The correspondence so far relates DAGs and state network diagrams, which are purely syntactic in nature, outlining only a causal structure. To provide an actual causal model, we must interpret them in a category, identifying the wires and boxes with specific objects and channels in the category. \begin{definition} An \emph{interpretation} $\sem{-}$ of a network diagram $D$ in a cd-category $\catC$ consists of specifying in $\catC$ an object $\sem{X_i}$ for each wire $X_i$ and channel $\sem{f} \colon \sem{X_1} \otimes \dots \otimes \sem{X_k} \to \sem{X}$ for each box $f$ in $D$ with inputs $X_1,\dots, X_k$ and output $X$. \end{definition} Thus an interpretation $\sem{-}$ of $D$ involves specifying objects for all wires and channels for all boxes in $D$. \st{For brevity, we at times also call a network diagram $D$ along with an interpretation $\sem{-}$ in $\catC$ a \emph{network diagram $D$ in $\catC$}}. \begin{theorem} \label{Thm:Equivalences_CausalModel_Defs} Let $\catC$ be a cd-category. Specifying a causal model $\modelM$ in $\catC$ is equivalent to specifying a state network diagram $D$ with an interpretation $\sem{-}$ in $\catC$. \end{theorem} \begin{proof} Thanks to Proposition \ref{prop:DagsNDs} a causal structure $(G,O)$ is equivalent to a state network diagram $D = D_{G,O}$. Interpreting the network diagram $D$ is then precisely the same as specifying a mechanism \eqref{eq:channels} for each variable, i.e. specifying a causal model with this causal structure. \end{proof} Thus, in short, a causal model $\modelM$ in $\catC$ is the same as a state network diagram in $\catC$. 
This allows one to specify a full causal model by simply drawing its network diagram, which encodes the causal structure $(G,O)$ and names the mechanisms, and then interpreting the mechanisms in $\catC$. \rlb{In practice and especially later in Sections~\ref{Sec_CE_Identifiability} and \ref{Sec_Counterfactuals},} given a model $\modelM$ we typically omit writing $\sem{-}$ and so write simply $X$, $f$ for both the labels and boxes in the network diagram and their corresponding interpretation in $\catC$, rather than denoting the latter by $\sem{X}, \sem{f}$. \st{ \paragraph{The output state defined by a model.} Given any network diagram $D$ with an interpretation $\sem{-}$ in $\catC$, we can compose the interpreted channels in the diagram according to $D$ to yield a single overall channel in $\catC$ from the inputs to the outputs of the diagram, which we denote by $\sem{D}$. In particular, for any causal model $\modelM$ with network diagram $D$ and outputs $O$ we obtain a normalised state in $\catC$ over the outputs $O$: \[ \tikzfig{output-state} \] which we call the \emph{output state} of the model. } \st{Conversely, in practice one often starts from a normalised state $\omega$ over a set of objects $O$, and asks whether $\omega$ is compatible with a causal structure given by some DAG $G$, or equivalently a network diagram. We say that $\omega$ \emph{factorises according to} a network diagram $D$ if there exists an interpretation $\sem{-}$ of $D$ with output objects $O$ the same as those of $\omega$ (i.e. a choice of objects for the remaining non-output wires in the diagram, and channels for all the boxes in the diagram) such that $\sem{D} = \omega$. 
Similarly we say that $\omega$ \emph{factorises according to} a DAG $G$ over $V$ with specified outputs $O \subseteq V$ if it factorises according to $D_{G,O}$.} \begin{example} An interpretation of a network diagram in $\FStoch$ consists of specifying finite sets of values for each label $X_i$, and a probability channel $c_i \colon Y_1 \times \dots \times Y_n \to X_i$ for each box with these inputs and output in the diagram. Hence, we can specify a CBN via a network diagram along with interpretations for each of its boxes as finite probability channels, or for short, a network diagram in $\FStoch$. \st{ Given such a CBN $\modelM$, the output state $\sem{\modelM}$ over $O$ in $\catC$ describes the resulting output distribution $P(O)$ over the output variables $O$.} \st{In general a distribution $\omega = P(O)$ factorises according to a network diagram $D=D_{G,O}$ \rlb{whenever $\omega$ satisfies the causal Markov condition for $G$ (see Ex.~\ref{ex:CBN}).}} \end{example} \begin{example} \label{Ex_CBN_ND} \rl{Recall the CBN from Ex.~\ref{Ex_CBN}. Its representation as a network diagram $D$ is given by the below diagram together with interpreting $c_L$, $c_S$, $c_B$ and $c_A$ as the channels $P(L | SBA)$, $P(S | B)$, $P(B | A)$ and $P(A)$, respectively.} \begin{equation} \label{eq:CBN-ndag-ex} \tikzfig{Fig_Example_CBN_DAG_ND} \end{equation} \st{Viewing the above diagram as a single state over $S, L , A$ yields precisely the output state of the model, i.e. the output distribution $P(S,L,A)$.} \end{example} \rl{It is worth pointing out that, given} a state network diagram $D=D_{G,O}$ corresponding to DAG $G$ with vertices $V$ and outputs $O$, we can always extend it to the diagram $D_G := D_{G,V}$ on the same DAG, but in which now every wire appears as an output. 
In fact $D_{G,O}$ can be obtained as a marginal of the latter: \[ \tikzfig{DGomarginal} \] and conversely we can obtain $D_G$ by copying each non-output wire an extra time: \[ \tikzfig{copies} \] \st{ \begin{example} For a CBN, the normalised state $\sem{D_G}$ over $V$ in $\catC$ describes the probability distribution $P(V)$ over the full set of variables $V$ of the model, rather than merely the outputs $O$. For the CBN with diagram $D$ in Example \eqref{eq:CBN-ndag-ex} with variables $V=\{A, B, S, L\}$ this is given by: \[ \tikzfig{CBN-nd-full} \] \end{example} } \begin{remark} \label{rem:JKZ_SynG_Perspective} As observed in \cite{JacobsEtAl_2019_CausalInferenceByDiagramSurgery}, an interpretation of a network diagram $D$ can be equivalently defined as a cd-functor \[ \Free(D) \to \catC \] where $\Free(D)$ is the \emph{free cd-category} generated by the labels and boxes in the diagram. Thus a causal model in $\catC$ over a DAG $G$ is determined by (its outputs $O$ and) a cd-functor $\Free(D_G) \to \catC$. This is the original perspective on causal models in \cite{Fong_2013_CausalTheories}, where Fong calls the category $\Free(D_{G})$ the \emph{causal theory} of $G$. \st{This perspective is compatible with our approach, but here we place less emphasis on the view of a model as a functor.} \end{remark} %************************************************* \subsection{Functional causal models \label{Sec_FunctionalCausalModel}} %************************************************* In the Pearlian causal model framework there is a further central notion of a causal model, in addition to that of a CBN, known as a \emph{Functional Causal Model} (FCM), or perhaps more commonly a \emph{Structural Causal Model} (SCM). 
This kind of model is considered to encode a more `refined' form of causal knowledge and constitutes the basis for more detailed causal explanations and answering different kinds of causal queries, most notably counterfactual ones -- it is the model that is associated with the third and highest level of the \emph{causal hierarchy} \cite{Pearl_Causality, BareinboimEtAl_2021_PearlsHierarchy}. In an SCM each mechanism is factored in terms of a deterministic functional component, representing the underlying causal process, and an additional noise variable with no parents, which encodes our uncertainty about the underlying state of the world. Conceptually, the notion of SCMs may be seen to underlie that of CBNs in that the former formalise causal reasoning on the basis of causal relations defined as functional dependencies, and then give rise to the latter after marginalising over the `local noise variables'. Formally, though, an SCM can be seen as a special case of a CBN over a larger set of variables, and the following presentation shall make use of this fact. In Definition \ref{def:deterministic} we saw that functions generalise to deterministic channels in a cd-category. This naturally yields the following definition, where in our category-theoretic context we prefer to use the term `functional causal model', saving the term `structural causal model' for their usual instantiation in $\FStoch$. \begin{definition} \label{def:FCM} Let $\catC$ be a cd-category. 
A \emph{Functional Causal Model (FCM)} in $\catC$ is a causal model $\model{M}$ whose variables are partitioned into $V = \{X_i\}^n_{i=1}$, called the \emph{endogenous} variables, and $U = \{U_i\}^n_{i=1}$, called the \emph{exogenous} variables, with output variables $O \subseteq V$ and such that: \begin{enumerate} \item each $U_i$ has no parents; \item each $X_i$ has a mechanism of the form \begin{equation} \label{eq:SCM-2} \tikzfig{SCM-2} \end{equation} where $\Pa'(X_i) \subseteq V$ and $f_i$ is a deterministic channel. \end{enumerate} \end{definition} That is, an FCM is a causal model in which every endogenous variable $X_i$ -- the modelled, or explained ones -- has a deterministic mechanism $f_i$ with one non-output parent $U_i$, and every such $U_i$ variable -- thought of as a `local disturbance' -- has no parents and only one child. Since each $U_i$ has no parents, its mechanism will, by the definition of a causal model, simply be a state \[ \tikzfig{noise} \] The usual kinds of FCMs, which are considered in the literature, are the following. \begin{example} A \emph{Structural Causal Model (SCM)} is an FCM in $\FStoch$. Hence, it is given by finite sets $X_1,\dots,X_n$, $U_1,\dots,U_n$, describing values of the endogenous and exogenous variables, respectively, along with a distribution $\lambda_i$ over each $U_i$ and for each $X_i$ a function \[ f_i \colon \Pa'(X_i) \times U_i \to X_i \] for a subset $\Pa'(X_i) \subseteq \{X_1,\dots, X_n\}$. \end{example} Typically an FCM with variables $V \cup U$ is considered to represent the deterministic mechanisms $f_i$ that underlie a (non-functional) causal model on $V=\{X_1, \dots, X_n\}$, where all randomness comes from our ignorance about the $U_i$, captured in the `noise' distributions $\lambda_i$. 
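As a minimal illustration (a hypothetical toy example, introduced here only to fix ideas and not taken from the cited literature), consider binary sets $X_1 = X_2 = U_2 = \{0,1\}$ with $\Pa'(X_2) = \{X_1\}$, the function $f_2(x_1,u_2) = x_1 \oplus u_2$ given by addition modulo 2, and a noise distribution with $\lambda_2(U_2 = 1) = \epsilon$ for some assumed $\epsilon \in [0,1]$. Summing over the unknown value of $U_2$ then gives \begin{equation*} \sum_{u_2} \lambda_2(u_2) \ \delta_{x_2, f_2(x_1,u_2)} \ = \ \begin{cases} 1 - \epsilon & \text{if } x_2 = x_1 \\ \epsilon & \text{if } x_2 \neq x_1 \end{cases} \end{equation*} so that, from the perspective of $X_1$ and $X_2$ alone, $X_2$ is a copy of $X_1$ that is flipped with probability $\epsilon$, even though $f_2$ itself is deterministic.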
Given any FCM $\model{M}$ we can define a causal model $\model{M}|_{V}$ over just $V$ by assigning each $X_i$ the mechanism \begin{equation} \label{eq:deSCM} \tikzfig{de-SCM} \end{equation} Conversely, every causal model in $\FStoch$, i.e. every CBN, is in fact of this form for some SCM $\model{M}$, since we can always extend any probabilistic mechanism to a functional part along with an additional `noise' distribution, thanks to the following well-known fact. \begin{proposition} For every channel $c$ in $\Stoch$ there exists an object $U$ with a state $\lambda$ and a deterministic channel $f$ such that \begin{equation} \tikzfig{purif} \label{Eq_FunctionalDilation} \end{equation} \end{proposition} Crucially, the latter direction of decomposing a given mechanism $c$ in terms of suitable $f, U, \lambda$ is highly non-unique. Hence, there are a great many SCMs with variables $V \cup U$ that give rise to the \emph{same} CBN with variables $V$. \rlb{This fact essentially} is behind the separation between the second and third level of the causal hierarchy. The significance of FCMs to formalise counterfactuals will be discussed in Sec.~\ref{Sec_Counterfactuals}. \begin{example} \label{Ex_CBN_FCM_ND} An SCM with variables $V=\{B,S,L,A\}$ and $U=\{U_B, U_S,U_L,U_A\}$ and the same causal structure amongst $V$ as in Ex.~\ref{Ex_CBN_ND} is given by the below network diagram. The (interpretations of the) pairs $f_X, \lambda_X$ for $X \in V$ may be chosen such that Eq.~\eqref{eq:deSCM} holds with the left-hand side being the corresponding channel $c_X$ from our example CBN in Ex.~\ref{Ex_CBN_ND}, with these channels indicated by the grey boxes. 
\begin{equation} \tikzfig{Fig_Example_FCM_DAG_ND_boxed_up} \end{equation} \end{example} \rlb{A further benefit of viewing causal relations as ultimately defined in terms of dependency relations relative to some true, even if unknown, function (deterministic morphism) is that} it justifies the otherwise ad-hoc stipulation in the definition of a causal model that each mechanism has only a single output. While the details of this argument can be found elsewhere \rlb{-- see, e.g., Refs.~\cite{Pearl_Causality, SEP_ProbabilisticCausation, SEP_ReichenbachsPrincple, AllenEtAl_2016_QCM, Lorenz_2022_MeritsOfTheSpirit_Synthese, otsuka2022process} --} we note that this fact becomes visually manifest in the diagrammatic representation of FCMs. Given an FCM like Ex.~\ref{Ex_CBN_FCM_ND}, by regarding the pairs $f_X, \lambda_{X}$ for $X \in V$ as mechanisms $c_X$ according to Eq.~\eqref{eq:deSCM} we obtain a causal model in the sense of Def.~\ref{def:causal-model}. However, the difference between asserting the existence of mechanisms as single-output channels $c_X \colon \Pa(X) \rightarrow X$ and asserting their existence as single-output \emph{deterministic} channels $f_X \colon \Pa(X) \rightarrow X$ lies in their factorisation properties. Any function with multiple outputs, say $f \colon A \to B \times C$, factorises in terms of component functions $f_B, f_C$ with outputs $B, C$. 
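Concretely, for functions between sets the components may be taken to be $f_B = \pi_B \circ f$ and $f_C = \pi_C \circ f$, where $\pi_B \colon B \times C \to B$ and $\pi_C \colon B \times C \to C$ denote the projections (notation introduced here for illustration only), so that \begin{equation*} f(a) \ = \ \big( f_B(a), \, f_C(a) \big) \qquad \text{for all } a \in A \ . \end{equation*}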
More generally, any deterministic morphism in a cd-category factorises in this way: \begin{equation} \label{Eq_FunctionsFactorise} \tikzfig{Fig_functions_factorise_b} \end{equation} The same property holds iteratively for more than two factors.\footnote{If the input has a product structure corresponding to distinct variables, one may then combine this factorisation property with rewriting the deterministic channels $f_B$ and $f_C$ according to their actual dependency structure so that $f_B$ and $f_C$ have only those wires as input on which $B$ and $C$ actually depend through $f_B$ and $f_C$, respectively.} Crucially, generic channels $c \colon A \to B \otimes C$ in a cd-category do \emph{not} factorise in this way, just like generic probabilistic channels in $\FStoch$ do not. Thus, when viewing causal models as simply defined via probability channels, one may be tempted to allow for channels with multiple outputs, but when basing \rl{causal relations} ultimately on functional dependencies one arrives at the view that mechanisms should have single outputs only. \color{\rlbcolor} \subsection{Notions of faithfulness\label{subsec:discussion_faithfulness}} An important aspect of the causal model framework is the precise link it provides between causal assertions and conditional independence relations. A probability distribution has to respect all those conditional independences that are implied by a causal structure for it to be compatible with that causal structure. This link is thus also the basis of algorithmic solutions to \emph{causal discovery} \cite{Pearl_Causality, SpirtesEtAL_2000_BookCausationPredictionSearch, PetersEtAl_2017_ElementsOfCausalInference} -- the epistemological problem of inferring (constraints on) the causal structure given a probability distribution. 
This link is formalised in the \emph{d-separation theorem} \cite{Verma&Pearl_1990_CausalNetworks, GeigerEtAl_1990_IdentifyingIndependence} in terms of the purely graphical notion of \emph{d-separation}. Given a DAG $G$, for disjoint subsets \rlb{of vertices $Y$, $Z$ and $W$, write $(Y \bigCI Z | W)_G$ if $Y$ and $Z$ are d-separated by $W$ (Def.~\ref{Def_DSeparation} in Appendix~\ref{App_Proof_Thm_DSeparation_Theorem}). The following theorem, proven in Appendix~\ref{App_Proof_Thm_DSeparation_Theorem}, establishes that d-separation is in general sound for conditional independence in our sense.\footnote{Recently, Fritz and Klingler presented a generalised notion of d-separation in Markov categories in purely string diagrammatic terms \cite{FritzEtAl_2023_DSeparationInCatgeoricalProbabilty}. However, since this work uses a distinct notion of conditional independence (involving partial channels), we must independently verify soundness of d-separation.}} \begin{theorem} \label{Thm_DSeparation_Theorem} Let $\catC$ be a cd-category with \diagconditioning{} and $\omega$ a normalised state over $V$ which factorises according to a DAG $G$ with variables $V$. Then for any disjoint subsets $Y,Z,W \subset V$ we have \[ (Y \bigCI Z | W)_{G} \quad \Rightarrow \quad (Y \bigCI Z | W)_{\omega} \ . \] \end{theorem} Conversely, \rlb{we call} a causal model $\modelM$ in $\catC$ \emph{faithful} iff the only conditional independences that its induced state $\omega=\sem{\modelM}$ satisfies are those that are implied by the causal structure, that is: $(Y \bigCI Z | W)_{\omega} \implies (Y \bigCI Z | W)_{G_{\modelM}}$. The significance of notions of faithfulness generally lies in their role as Occam's razor type desiderata in causal discovery \cite{Pearl_Causality}.
Now since an actual causal model comes with the specification of causal mechanisms for each variable, we also obtain a second, logically distinct notion of faithfulness, which is most easily stated by first defining the following property of a channel. \begin{definition} \textnormal{(No-signalling):} \label{Def_NoSignalling} Given a channel $c : S \rightarrow X$ from inputs $S$ to $X$, say there is \emph{no signalling} from $T \subseteq S$ to $X$ iff there exists a channel $d$ such that \begin{equation} \tikzfig{nosignal} \label{eq:nosignal} \end{equation} \end{definition} The terminology refers to the fact that the above condition ensures that any input $\rho$ to the systems in $T$ does not lead to a change in $X$ as determined by $c$, that is: \begin{equation} \tikzfig{nosignal_ImplicationForStates} \label{eq:nosignal_ImplicationForStates} \end{equation} Given a causal model $\modelM$ with variables $V$ and DAG $G_{\modelM}$, let us call the mechanism $c_X \colon \Pa(X) \rightarrow X$ of $X \in V$ \emph{faithful} iff there does not exist any non-empty $T \subseteq \Pa(X)$ such that Eq.~\eqref{eq:nosignal} holds. We then call a causal model $\modelM$ \emph{mechanism faithful} if all of its mechanisms are faithful. Thus for each variable $X$ there is the potential to signal to $X$ from each of its parents. The following result helps us to connect \rlb{the discussed notions of faithfulness.} \begin{lemma} \label{lem:con-indep-mechs} Let $\modelM$ be a causal model whose induced state $\omega=\sem{\modelM}$ has full support over $\Pa(X)$, for some variable $X$. Then \begin{equation} \tikzfig{Fig_mechanism_and_conditional} \label{Eq_mechanism_and_conditional} \end{equation} \end{lemma} \begin{proof} Appendix~\ref{app:mechcon}. 
\end{proof} It follows from this result that if $\modelM$ is a causal model whose induced state has full support over all variables, \rlb{its mechanisms are uniquely determined (by conditioning) and faithfulness of $\modelM$ implies mechanism faithfulness.} In general, however, Eq.~\eqref{Eq_mechanism_and_conditional} fails and it is not hard to convince oneself that neither notion of faithfulness implies the other.\footnote{We note that Ref.~\cite{BarrettEtAl_2019_QCMs} introduces a further notion of conditional independence, therein referred to as strong relative independence. In light of the split-node models from App.~\ref{Sec:Split_node_models} strong relative independence naturally generalises from $\MatR$ in Ref.~\cite{BarrettEtAl_2019_QCMs} to causal models in cd-categories. This notion, logically stronger than $(Y \bigCI Z | W)_{\omega}$, also induces a corresponding notion of faithfulness in terms of d-separation relations. We leave exploring the relations between all three notions of faithfulness in more detail to future work.} One way to understand the conceptual significance of non-faithful causal models is in the context of how causal models relate to functional causal models. Suppose that we have a causal model with a mechanism $c_X$ ultimately deriving from an FCM via a functional dependency $f_X$ and extra `noise' state $\lambda_X$ as \rlb{in Eq.~\eqref{eq:deSCM}. Then it is perfectly conceivable} that there actually is causal influence through $f_X$ from $Y$ to $X$ for some $Y \in \Pa(X)$, albeit for some particular (fine-tuned) state $\lambda_{X}$ on $U_X$ this influence is `washed out', yielding an effective mechanism $c_X$ which is \emph{not} faithful. For this reason, we must allow for causal models which are not mechanism faithful.
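As a minimal illustration in $\FStoch$ (the variables and the function here are chosen by us for illustration and are not taken from the examples above), suppose that $X$, its single parent $Y$ and the noise variable $U_X$ are all binary, that $f_X(y,u) = y \oplus u$ is addition modulo 2, and that $\lambda_X$ is uniform. Assuming that Eq.~\eqref{eq:deSCM} amounts in $\FStoch$ to averaging $f_X$ over $\lambda_X$, the induced mechanism is \[ c_X(x \mid y) \ = \ \sum_{u \in \{0,1\}} \lambda_X(u) \, \delta_{x, \, y \oplus u} \ = \ \tfrac{1}{2} \qquad \text{for all } x, y \in \{0,1\} \ , \] which does not depend on $y$. Hence there is no signalling from $T = \{Y\}$ to $X$, with $d$ given by the uniform distribution on $X$, even though $f_X(y,u)$ changes with $y$ for every fixed $u$. For any other noise state, say $\lambda_X(0) = p \neq \tfrac{1}{2}$, one instead finds $c_X(0 \mid 0) = p \neq 1 - p = c_X(0 \mid 1)$, so that the mechanism is faithful; only the fine-tuned choice $p = \tfrac{1}{2}$ washes out the influence.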
\color{black} % \bibliographystyle{utphys} \bibliography{CauseComp.bib} % Uncomment while working on file standalone, if needed %\end{document} \input{4-interventions} \input{5-open-models} \input{6-Latent} \input{7-Identifiability} \input{8-Counterfactuals} \input{9-outlook} \section{Acknowledgements} \rlb{We would like to thank Bob Coecke, Steve Clark, Matty Hoban, Konstantinos Meichanetzidis, Ilya Shpitser, Robin Evans and Ciarán Gilligan-Lee for helpful discussions and feedback.} \addcontentsline{toc}{section}{References} %\bibliographystyle{utphys} %\bibliography{CauseComp.bib} \providecommand{\href}[2]{#2}\begingroup\raggedright\begin{thebibliography}{100} \bibitem{BeebeeEtAl_2009_OxfordHandbookOfCausation} H.~Beebee, C.~Hitchcock, and P.~Menzies, {\em The Oxford handbook of causation}. \newblock Oxford University Press, 2009. \bibitem{SEP_MetaphysicsOfCausation} J.~D. Gallow, ``{The Metaphysics of Causation},'' in {\em The {Stanford} Encyclopedia of Philosophy}, E.~N. Zalta and U.~Nodelman, eds. \newblock Metaphysics Research Lab, Stanford University, {F}all 2022~ed., 2022. \bibitem{SEP_ProbabilisticCausation} C.~Hitchcock, ``{Probabilistic Causation},'' in {\em The {Stanford} Encyclopedia of Philosophy}, E.~N. Zalta, ed. \newblock Metaphysics Research Lab, Stanford University, {S}pring 2021~ed., 2021. \bibitem{Pearl_Causality} J.~Pearl, {\em Causality}. \newblock Cambridge university press, 2009. \bibitem{SpirtesEtAL_2000_BookCausationPredictionSearch} P.~Spirtes, C.~Glymour, and R.~Scheines, {\em Causation, Prediction, and Search}. \newblock MIT press, 2nd~ed., 2000. \bibitem{CoeckeEtAl_2012_PicturingBayesianInference} B.~Coecke and R.~W. Spekkens, ``Picturing classical and quantum bayesian inference,'' {\em Synthese} {\bfseries 186} no.~3, (2012) 651--696. \bibitem{Fong_2013_CausalTheories} B.~Fong, ``Causal theories: A categorical perspective on bayesian networks,'' {\em arXiv preprint arXiv:1301.6201} (2013) . 
\bibitem{ChoEtAl_2019_DisintegrationViaStringDiagrams} K.~Cho and B.~Jacobs, ``Disintegration and bayesian inversion via string diagrams,'' {\em Mathematical Structures in Computer Science} {\bfseries 29} no.~7, (2019) 938--971. \bibitem{JacobsEtAl_2019_CausalInferenceByDiagramSurgery} B.~Jacobs, A.~Kissinger, and F.~Zanasi, ``Causal inference by string diagram surgery,'' in {\em International conference on foundations of software science and computation structures}, pp.~313--329, Springer. \newblock 2019. \bibitem{Fritz_2020_SyntheticApproachToMarkovKernels} T.~Fritz, ``A synthetic approach to markov kernels, conditional independence and theorems on sufficient statistics,'' {\em Advances in Mathematics} {\bfseries 370} (2020) 107239. \bibitem{JacobsEtAl_2021_CausalInferencesAsDiagramSurgery_DiagramsToCounterfactuals} B.~Jacobs, A.~Kissinger, and F.~Zanasi, ``Causal inference via string diagram surgery: A diagrammatic approach to interventions and counterfactuals,'' {\em Mathematical Structures in Computer Science} {\bfseries 31} no.~5, (2021) 553--574. \bibitem{FritzEtAl_2022_DSeparationInCategoricalProbability} T.~Fritz and A.~Klingler, ``The d-separation criterion in categorical probability,'' {\em arXiv preprint arXiv:2207.05740} (2022) . \bibitem{FritzEtAl_2023_DSeparationInCatgeoricalProbabilty} T.~Fritz and A.~Klingler, ``The d-separation criterion in categorical probability,'' {\em Journal of Machine Learning Research} {\bfseries 24} no.~46, (2023) 1--49. \bibitem{BareinboimEtAl_2021_PearlsHierarchy} E.~Bareinboim, J.~D. Correa, D.~Ibeling, and T.~F. Icard, ``On {P}earl's hierarchy and the foundations of causal inference,'' \newblock 2021. \bibitem{GlockerEtAl_2021_CausalityInDigitalMedicine} B.~Glocker, M.~Musolesi, J.~Richens, and C.~Uhler, ``Causality in digital medicine,'' {\em Nature Communications} {\bfseries 12} no.~1, (2021) . 
\bibitem{ReynaudEtAl_2022_DArtagnan_CounterfactualVideoGeneration} H.~Reynaud, A.~Vlontzos, M.~Dombrowski, C.~Lee, A.~Beqiri, P.~Leeson, and B.~Kainz, ``D'artagnan: Counterfactual video generation,'' {\em arXiv preprint arXiv:2206.01651} (2022) . \bibitem{VlontzosEtAl_2022_EstimatingCatgeoricalCFsVisDeepTwinNetworks} A.~Vlontzos, B.~Kainz, and C.~Lee, ``Estimating categorical counterfactuals via deep twin networks,''. \bibitem{LeeEtAl_2022_LeveragingDirectedCausalDiscovery} C.~M. Gilligan-Lee, C.~Hart, J.~Richens, and S.~Johri, ``Leveraging directed causal discovery to detect latent common causes in cause-effect pairs,'' \href{http://dx.doi.org/10.1109/TNNLS.2022.3205128}{{\em IEEE Transactions on Neural Networks and Learning Systems} (2022) 1--0}. \bibitem{SchoelkopfEtAl_2021_TowardCausalRepresentationLearning} B.~Schölkopf, F.~Locatello, S.~Bauer, N.~R. Ke, N.~Kalchbrenner, A.~Goyal, and Y.~Bengio, ``Toward causal representation learning,'' \href{http://dx.doi.org/10.1109/JPROC.2021.3058954}{{\em Proceedings of the IEEE} {\bfseries 109} no.~5, (2021) 612--634}. \bibitem{WolfeEtAl_2019_InflationTechniqueForCausalInference} E.~Wolfe, R.~W. Spekkens, and T.~Fritz, ``The inflation technique for causal inference with latent variables,'' {\em Journal of Causal Inference} {\bfseries 7} no.~2, (2019) . \bibitem{BoghiuEtAl_2022_InflationLibrary} E.-C. Boghiu, E.~Wolfe, and A.~Pozas-Kerstjens, ``Inflation: a python library for classical and quantum causal compatibility,'' {\em arXiv preprint arXiv:2211.04483} (2022) . \bibitem{mac2013categories} S.~Mac~Lane, {\em Categories for the working mathematician}, vol.~5. \newblock Springer Science \& Business Media, 2013. \bibitem{baez2011physics} J.~Baez and M.~Stay, {\em Physics, topology, logic and computation: a Rosetta Stone}. \newblock Springer, 2011. \bibitem{selinger2011survey} P.~Selinger, ``A survey of graphical languages for monoidal categories,'' {\em New structures for physics} (2011) 289--355. 
\bibitem{abramsky2004categorical} S.~Abramsky and B.~Coecke, ``A categorical semantics of quantum protocols,'' in {\em Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004.}, pp.~415--425, IEEE. \newblock 2004. \bibitem{coecke2018picturing} B.~Coecke and A.~Kissinger, ``Picturing quantum processes: A first course on quantum theory and diagrammatic reasoning,'' in {\em Diagrammatic Representation and Inference: 10th International Conference, Diagrams 2018, Edinburgh, UK, June 18-22, 2018, Proceedings 10}, pp.~28--31, Springer. \newblock 2018. \bibitem{coecke2011interacting} B.~Coecke and R.~Duncan, ``Interacting quantum observables: categorical algebra and diagrammatics,'' {\em New Journal of Physics} {\bfseries 13} no.~4, (2011) 043016. \bibitem{FongEtAl_2019_InvitationACT_SevenSketches} B.~Fong and D.~I. Spivak, {\em An invitation to applied category theory: seven sketches in compositionality}. \newblock Cambridge University Press, 2019. \bibitem{shiebler2021category} D.~Shiebler, B.~Gavranovi{\'c}, and P.~Wilson, ``Category theory in machine learning,'' {\em arXiv preprint arXiv:2106.07032} (2021) . \bibitem{ghani2018compositional} N.~Ghani, J.~Hedges, V.~Winschel, and P.~Zahn, ``Compositional game theory,'' in {\em Proceedings of the 33rd annual ACM/IEEE symposium on logic in computer science}, pp.~472--481. \newblock 2018. \bibitem{coecke2010mathematical} B.~Coecke, M.~Sadrzadeh, and S.~Clark, ``Mathematical {F}oundations for a {C}ompositional {D}istributional {M}odel of {M}eaning,'' {\em Linguistic Analysis} {\bfseries 36} (2010) 345--384. \bibitem{JacobsEtAl_2016_PredicateStateSemanticsBayesianLearning} B.~Jacobs and F.~Zanasi, ``A predicate/state transformer semantics for bayesian learning,'' {\em Electronic Notes in Theoretical Computer Science} {\bfseries 325} (2016) 185--200. 
\bibitem{JacobsEtAl_2017_FormalSemanticsOfInfluenceInBayesianReasoning} B.~Jacobs and F.~Zanasi, ``A formal semantics of influence in bayesian reasoning,''. \bibitem{JacobsEtAl_2019_LogicalEssentialsOfBayesianReasoning} B.~Jacobs and F.~Zanasi, ``The logical essentials of bayesian reasoning,'' {\em Foundations of Probabilistic Programming} (2019) 295--331. \bibitem{FritzEtAl_2022_FreeGSMonoidalCategories} T.~Fritz and W.~Liang, ``Free gs-monoidal categories and free markov categories,'' {\em arXiv preprint arXiv:2204.02284} (2022) . \bibitem{JacobsEtAl_2021_CausalInferenceByDiagramSurgery} B.~Jacobs, A.~Kissinger, and F.~Zanasi, ``Causal inference via string diagram surgery: A diagrammatic approach to interventions and counterfactuals,'' {\em Mathematical Structures in Computer Science} {\bfseries 31} no.~5, (2021) 553--574. \bibitem{CorreaEtAl_2020_CalculusForStochasticInterventions} J.~Correa and E.~Bareinboim, ``A calculus for stochastic interventions: Causal effect identification and surrogate experiments,'' in {\em Proceedings of the AAAI Conference on Artificial Intelligence}, vol.~34, pp.~10093--10100. \newblock 2020. \bibitem{RichardsonEtAl_2022_NestedMarkovForADMGs} T.~S. Richardson, R.~J. Evans, J.~M. Robins, and I.~Shpitser, ``Nested markov properties for acyclic directed mixed graphs,'' 2017. \bibitem{OpenGraph1} B.~Fong, ``Decorated cospans,'' {\em arXiv preprint arXiv:1502.00872} (2015) . \bibitem{OpenGraph2} J.~C. Baez and K.~Courser, ``Structured cospans,'' {\em arXiv preprint arXiv:1911.04630} (2019) . \bibitem{ShpitserEtAl_2008_CompleteIdentificationMethodCausalHierarchy} I.~Shpitser and J.~Pearl, ``Complete identification methods for the causal hierarchy,'' {\em Journal of Machine Learning Research} {\bfseries 9} (2008) 1941--1979. \bibitem{Pearl_2018_TheBookOfWhy} J.~Pearl and D.~Mackenzie, {\em The Book of Why}. \newblock Basic Books, New York, 2018. \bibitem{Coecke_2022_QuantumInPictures} B.~Coecke and S.~Gogioso, {\em Quantum In Pictures}. 
\newblock Quantinuum, 2022. \bibitem{Pearl_2018_TheoreticalImpedimentsToML} J.~Pearl, ``Theoretical impediments to machine learning with seven sparks from the causal revolution,'' {\em arXiv preprint arXiv:1801.04016} (2018) . \bibitem{SchoelkopfEtAl_2022_StatisticalToCausalLearning} B.~Sch{\"o}lkopf and J.~von K{\"u}gelgen, ``From statistical to causal learning,'' {\em arXiv preprint arXiv:2204.00607} (2022) . \bibitem{Schoelkopf_2019_CausalityForMachineLearning} B.~Sch{\"o}lkopf, ``Causality for machine learning,'' {\em arXiv preprint arXiv:1911.10500} (2019) . \bibitem{ParascandoloEtAl_2018_LearningIndependentCausalMechanisms} G.~Parascandolo, N.~Kilbertus, M.~Rojas-Carulla, and B.~Sch{\"o}lkopf, ``Learning independent causal mechanisms,'' in {\em International Conference on Machine Learning}, pp.~4036--4044, PMLR. \newblock 2018. \bibitem{BengioEtAl_2019_MetaTransferObjective} Y.~Bengio, T.~Deleu, N.~Rahaman, R.~Ke, S.~Lachapelle, O.~Bilaniuk, A.~Goyal, and C.~Pal, ``A meta-transfer objective for learning to disentangle causal mechanisms,'' {\em arXiv preprint arXiv:1901.10912} (2019) . \bibitem{DasguptaEtAl_2019_CausalReasoningFromMetaRL} I.~Dasgupta, J.~Wang, S.~Chiappa, J.~Mitrovic, P.~Ortega, D.~Raposo, E.~Hughes, P.~Battaglia, M.~Botvinick, and Z.~Kurth-Nelson, ``Causal reasoning from meta-reinforcement learning,'' {\em arXiv preprint arXiv:1901.08162} (2019) . \bibitem{ChalupkaEtAl_2014_VisualCausalFeatureLearning} K.~Chalupka, P.~Perona, and F.~Eberhardt, ``Visual causal feature learning,'' {\em arXiv preprint arXiv:1412.2309} (2014) . \bibitem{ChalupkaEtAl_2016_MultiLevelCauseEffectSystems} K.~Chalupka, F.~Eberhardt, and P.~Perona, ``Multi-level cause-effect systems,'' in {\em Artificial intelligence and statistics}, pp.~361--369, PMLR. \newblock 2016. 
\bibitem{ChalupkaEtAl_2016_UnsupervisedDiscoveryOfElNino} K.~Chalupka, T.~Bischoff, P.~Perona, and F.~Eberhardt, ``Unsupervised discovery of el nino using causal feature learning on microlevel climate data,'' {\em arXiv preprint arXiv:1605.09370} (2016) . \bibitem{Eberhardt_2016_GreenAndGrueCausalVariables} F.~Eberhardt, ``Green and grue causal variables,'' {\em Synthese} {\bfseries 193} no.~4, (2016) 1029--1046. \bibitem{ChalupkaEtAl_2017_CausalFeatureLearning} K.~Chalupka, F.~Eberhardt, and P.~Perona, ``Causal feature learning: an overview,'' {\em Behaviormetrika} {\bfseries 44} no.~1, (2017) 137--164. \bibitem{BengioEtAl_2017_IndependentlyControllableFeatures} E.~Bengio, V.~Thomas, J.~Pineau, D.~Precup, and Y.~Bengio, ``Independently controllable features,'' {\em arXiv preprint arXiv:1703.07718} (2017) . \bibitem{LocatelloEtAl_2019_ChallengingAssumptionsInUnSupervisedDisRep} F.~Locatello, S.~Bauer, M.~Lucic, G.~Raetsch, S.~Gelly, B.~Sch{\"o}lkopf, and O.~Bachem, ``Challenging common assumptions in the unsupervised learning of disentangled representations,'' in {\em international conference on machine learning}, pp.~4114--4124, PMLR. \newblock 2019. \bibitem{LeebEtAl_2020_StructureByArchitecture} F.~Leeb, G.~Lanzillotta, Y.~Annadani, M.~Besserve, S.~Bauer, and B.~Sch{\"o}lkopf, ``Structure by architecture: Disentangled representations without regularization,'' {\em arXiv preprint arXiv:2006.07796} (2020) . \bibitem{LocatelloEtAl_2020_WeaklySupervisedDisentanglement} F.~Locatello, B.~Poole, G.~R{\"a}tsch, B.~Sch{\"o}lkopf, O.~Bachem, and M.~Tschannen, ``Weakly-supervised disentanglement without compromises,'' in {\em International Conference on Machine Learning}, pp.~6348--6359, PMLR. \newblock 2020. \bibitem{MitrovicEtAl_2020_RepresentationLearningViaInvariantCausalMechanisms} J.~Mitrovic, B.~McWilliams, J.~Walker, L.~Buesing, and C.~Blundell, ``Representation learning via invariant causal mechanisms,'' {\em arXiv preprint arXiv:2010.07922} (2020) . 
\bibitem{KeEtAL_2020_AmortizedLearningOfNeuralCausalReps} N.~R. Ke, J.~Wang, J.~Mitrovic, M.~Szummer, D.~J. Rezende, {\em et~al.}, ``Amortized learning of neural causal representations,'' {\em arXiv preprint arXiv:2008.09301} (2020) . \bibitem{ShenEtAl_2021_DisentangledGenerativeCausalRepLearning} X.~Shen, F.~Liu, H.~Dong, Q.~Lian, Z.~Chen, and T.~Zhang, ``Disentangled generative causal representation learning,'' {\em arXiv preprint arXiv:2010.02637} (2020) . \bibitem{WangEtAl_2021_DesiderataRepLearning_CausalPrespective} Y.~Wang and M.~I. Jordan, ``Desiderata for representation learning: A causal perspective,'' {\em arXiv preprint arXiv:2109.03795} (2021) . \bibitem{TraubleEtAl_2021_DisentangledRepresentationsFromCorrelatedData} F.~Tr{\"a}uble, E.~Creager, N.~Kilbertus, F.~Locatello, A.~Dittadi, A.~Goyal, B.~Sch{\"o}lkopf, and S.~Bauer, ``On disentangled representations learned from correlated data,'' in {\em International Conference on Machine Learning}, pp.~10401--10412, PMLR. \newblock 2021. \bibitem{LippeEtAl_2022_Citris} P.~Lippe, S.~Magliacane, S.~L{\"o}we, Y.~M. Asano, T.~Cohen, and S.~Gavves, ``Citris: Causal identifiability from temporal intervened sequences,'' in {\em International Conference on Machine Learning}, pp.~13557--13603, PMLR. \newblock 2022. \bibitem{BrehmerEtAl_2022_WeaklySupervisedCRL} J.~Brehmer, P.~De~Haan, P.~Lippe, and T.~Cohen, ``Weakly supervised causal representation learning,'' {\em arXiv preprint arXiv:2203.16437} (2022) . \bibitem{ShieblerEtAl_2021_CategoryTheoryInML} D.~Shiebler, B.~Gavranovi{\'c}, and P.~Wilson, ``Category theory in machine learning,'' {\em arXiv preprint arXiv:2106.07032} (2021) . \bibitem{Website_CatsForAI} ``Categories for ai (lecture series).'' \newblock \url{https://cats.for.ai/}. \bibitem{Cohen_2022_TowardsGroundedTheoryOfCausationForAI} T.~Cohen, ``Towards a grounded theory of causation for embodied ai,'' {\em arXiv preprint arXiv:2206.13973} (2022) . 
\bibitem{RischelEtAl_2021_CompositionalAbstractionError_CategoryCausalModels} E.~F. Rischel and S.~Weichwald, ``Compositional abstraction error and a category of causal models,'' in {\em Uncertainty in Artificial Intelligence}, pp.~1013--1023, PMLR. \newblock 2021. \bibitem{JanzingEtAl_2022_PhenomenologicalCausality} D.~Janzing and S.~H.~G. Mejia, ``Phenomenological causality,'' {\em arXiv preprint arXiv:2211.09024} (2022) . \bibitem{SchmidEtAl_2020_UnscramblingOmletteOfCausationAndInference} D.~Schmid, J.~H. Selby, and R.~W. Spekkens, ``Unscrambling the omelette of causation and inference: The framework of causal-inferential theories,'' {\em arXiv preprint arXiv:2009.03297} (2020) . \bibitem{AllenEtAl_2016_QCM} J.-M.~A. Allen, J.~Barrett, D.~C. Horsman, C.~M. Lee, and R.~W. Spekkens, ``Quantum common causes and quantum causal models,'' \href{http://dx.doi.org/10.1103/PhysRevX.7.031021}{{\em Phys. Rev. X} {\bfseries 7} (Jul, 2017) 031021}. \url{https://link.aps.org/doi/10.1103/PhysRevX.7.031021}. \bibitem{CostaEtAl_2016_QuantumCausalModeling} F.~Costa and S.~Shrapnel, ``Quantum causal modelling,'' {\em New Journal of Physics} {\bfseries 18} no.~6, (2016) 063032. \bibitem{BarrettEtAl_2019_QCMs} J.~Barrett, R.~Lorenz, and O.~Oreshkov, ``Quantum causal models,'' {\em arXiv preprint arXiv:1906.10726} (2019) . \bibitem{BarrettEtAl_2021_CyclicQCMs} J.~Barrett, R.~Lorenz, and O.~Oreshkov, ``Cyclic quantum causal models,'' {\em Nature communications} {\bfseries 12} no.~1, (2021) 1--15. \bibitem{OrmrodEtAl_2022_CausalStructureWithSectorialConstraints} N.~Ormrod, A.~Vanrietvelde, and J.~Barrett, ``Causal structure in the presence of sectorial constraints, with application to the quantum switch,'' {\em arXiv preprint arXiv:2204.10273} (2022) . \bibitem{LorenzEtAl_2021_CausalAndCompositionalStructure} R.~Lorenz and J.~Barrett, ``Causal and compositional structure of unitary transformations,'' {\em Quantum} {\bfseries 5} (2021) 511. 
\bibitem{VanrietveldeEtAl_2021_RoutedQuantumCircuits} A.~Vanrietvelde, H.~Kristj{\'a}nsson, and J.~Barrett, ``Routed quantum circuits,'' {\em Quantum} {\bfseries 5} (2021) 503. \bibitem{TianEtAl_2002_GeneralIdentificationConditionForCausalEffects} J.~Tian and J.~Pearl, {\em A general identification condition for causal effects}. \newblock eScholarship, University of California, 2002. \bibitem{Pearl_2011_AlgorithmizationOfCounterfactuals} J.~Pearl, ``The algorithmization of counterfactuals,'' {\em Annals of Mathematics and Artificial Intelligence} {\bfseries 61} no.~1, (2011) 29--39. \bibitem{fritz2020synthetic} T.~Fritz, ``A synthetic approach to markov kernels, conditional independence and theorems on sufficient statistics,'' {\em Advances in Mathematics} {\bfseries 370} (2020) 107239. \bibitem{coecke2006introducing} B.~Coecke, ``Introducing categories to the practicing physicist,'' in {\em What is category theory}, pp.~45--74, Polimetrica Monza. \newblock 2006. \bibitem{SEP_ReichenbachsPrincple} C.~Hitchcock and M.~R{\'e}dei, ``Reichenbach's common cause principle,'' in {\em The Stanford Encyclopedia of Philosophy}, E.~N. Zalta, ed. \newblock Metaphysics Research Lab, Stanford University, spring 2020~ed., 2020. \bibitem{Lorenz_2022_MeritsOfTheSpirit_Synthese} R.~Lorenz, ``Quantum causal models: the merits of the spirit of reichenbach’s principle for understanding quantum causal structure,'' {\em Synthese} {\bfseries 200} no.~5, (2022) 424. \bibitem{otsuka2022process} J.~Otsuka and H.~Saigo, ``The process theory of causality: an overview,'' October, 2022. \newblock \url{http://philsci-archive.pitt.edu/21267/}. \bibitem{PetersEtAl_2017_ElementsOfCausalInference} J.~Peters, D.~Janzing, and B.~Sch{\"o}lkopf, {\em Elements of causal inference: foundations and learning algorithms}. \newblock The MIT Press, 2017. 
\bibitem{Verma&Pearl_1990_CausalNetworks} T.~Verma and J.~Pearl, \href{http://dx.doi.org/https://doi.org/10.1016/B978-0-444-88650-7.50011-1}{``Causal networks: Semantics and expressiveness,''} in {\em Uncertainty in Artificial Intelligence}, R.~D. Shachter, T.~S. Levitt, L.~N. Kanal, and J.~F. Lemmer, eds., vol.~9 of {\em Machine Intelligence and Pattern Recognition}, pp.~69--76. \newblock North-Holland, 1990. \newblock \url{http://www.sciencedirect.com/science/article/pii/B9780444886507500111}. \bibitem{GeigerEtAl_1990_IdentifyingIndependence} D.~Geiger, T.~Verma, and J.~Pearl, ``Identifying independence in bayesian networks,'' \href{http://dx.doi.org/10.1002/net.3230200504}{{\em Networks} {\bfseries 20} no.~5, 507--534}, \href{http://arxiv.org/abs/https://onlinelibrary.wiley.com/doi/pdf/10.1002/net.3230200504}{{\ttfamily https://onlinelibrary.wiley.com/doi/pdf/10.1002/net.3230200504}}. \url{https://onlinelibrary.wiley.com/doi/abs/10.1002/net.3230200504}. \bibitem{janzing2022phenomenological} D.~Janzing and S.~H.~G. Mejia, ``Phenomenological causality,'' {\em arXiv preprint arXiv:2211.09024} (2022) . \bibitem{Evans_2018_MarginsOfDiscreteBN} R.~J. Evans, ``Margins of discrete bayesian networks,'' {\em The Annals of Statistics} {\bfseries 46} no.~6A, (2018) 2623--2656. \bibitem{Evans_2016_GraphsForMarginsOfBNs} R.~J. Evans, ``Graphs for margins of bayesian networks,'' {\em Scandinavian Journal of Statistics} {\bfseries 43} no.~3, (2016) 625--648. \bibitem{Byrne_2007_RationalImagination} R.~M. Byrne, {\em The rational imagination: How people create alternatives to reality}. \newblock MIT press, 2007. \bibitem{SEP_Counterfactuals} W.~Starr, ``{Counterfactuals},'' in {\em The {Stanford} Encyclopedia of Philosophy}, E.~N. Zalta and U.~Nodelman, eds. \newblock Metaphysics Research Lab, Stanford University, {W}inter 2022~ed., 2022.
\bibitem{BalkeEtAl_1994_CounterfactualProbabilities} A.~Balke and J.~Pearl, ``Counterfactual probabilities: Computational methods, bounds and applications,'' in {\em Uncertainty Proceedings 1994}, pp.~46--54. \newblock Elsevier, 1994. \bibitem{HalpernEtAl_2005CausesAndExplanationsI} J.~Y. Halpern and J.~Pearl, ``Causes and explanations: A structural-model approach. part i: Causes,'' {\em The British journal for the philosophy of science} {\bfseries 56} no.~4, (2005) 843--887. \bibitem{HalpernEtAl_2005CausesAndExplanationsII} J.~Y. Halpern and J.~Pearl, ``Causes and explanations: A structural-model approach. part ii: Explanations,'' {\em The British journal for the philosophy of science} {\bfseries 56} no.~4, (2005) 889--911. \bibitem{AvinETAl_2005_IdentifiabilityPathSpecificEffects} C.~Avin, I.~Shpitser, and J.~Pearl, ``Identifiability of path-specific effects,''. \bibitem{Lewis_1973_Counterfactuals} D.~Lewis, {\em Counterfactuals}. \newblock Cambridge, MA, USA: Blackwell, 1973. \bibitem{Lewis_1973_Causation} D.~Lewis, ``Causation,'' \href{http://dx.doi.org/10.2307/2025310}{{\em Journal of Philosophy} {\bfseries 70} no.~17, (1973) 556--567}. \bibitem{jacobs2019mathematics} B.~Jacobs, ``The mathematics of changing one’s mind, via jeffrey’s or via pearl’s update rule,'' {\em Journal of Artificial Intelligence Research} {\bfseries 65} (2019) 783--806. \bibitem{jacobs2021learning} B.~Jacobs, ``Learning from what's right and learning from what's wrong,'' {\em arXiv preprint arXiv:2112.14045} (2021) . \bibitem{FriendEtAl_2022_IdentificationCausalInfluenceInQuantumprocesses} I.~Friend and A.~Kissinger, ``Identification of causal influences in quantum processes,''. QPL proceedings. \bibitem{CorreaEtAl_2021_NestedCounterfactuals} J.~Correa, S.~Lee, and E.~Bareinboim, ``Nested counterfactual identification from arbitrary surrogate experiments,'' in {\em Advances in Neural Information Processing Systems}, M.~Ranzato, A.~Beygelzimer, Y.~Dauphin, P.~Liang, and J.~W. 
Vaughan, eds., vol.~34, pp.~6856--6867. \newblock Curran Associates, Inc., 2021. \newblock \url{https://proceedings.neurips.cc/paper/2021/file/36bedb6eb7152f39b16328448942822b-Paper.pdf}. \bibitem{Pearl_2001_DirectAndIndirectEffects} J.~Pearl, ``Direct and indirect effects,'' in {\em Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence}, UAI'01, p.~411–420. \newblock Morgan Kaufmann Publishers Inc., 2001. \bibitem{Spirtes_1995_DirectedCyclicGraphsForFeedback} P.~Spirtes, ``Directed cyclic graphical representations of feedback models,'' in {\em Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence}, UAI'95, p.~491–498. \newblock Morgan Kaufmann Publishers Inc., 1995. \bibitem{Richardson_1997_CharacterizationMCCyclicGraphs} T.~Richardson, ``A characterization of markov equivalence for directed cyclic graphs,'' {\em International Journal of Approximate Reasoning} {\bfseries 17} no.~2-3, (1997) 107--162. \bibitem{ForreEtAl_2017_MarkovPropertiesCycles} P.~Forr{\'e} and J.~M. Mooij, ``Markov properties for graphical models with cycles and latent variables,'' {\em arXiv preprint arXiv:1710.08775} (2017) . \bibitem{ForreEtAl_2020_CausalCalculusWithCycles} P.~Forr{\'e} and J.~M. Mooij, ``Causal calculus in the presence of cycles, latent confounders and selection bias,'' in {\em Uncertainty in Artificial Intelligence}, pp.~71--80, PMLR. \newblock 2020. \bibitem{suresh2023semantics} A.~K. Suresh, M.~Frembs, and E.~G. Cavalcanti, ``A semantics for counterfactuals in quantum causal models,'' {\em arXiv preprint arXiv:2302.11783} (2023) . \bibitem{panangaden1998probabilistic} P.~Panangaden, ``Probabilistic relations,'' {\em School of Computer Science Research Reports-University of Birmingham CSR} (1998) 59--74. \bibitem{Pearl_1988_Probabilistic} J.~Pearl, {\em Probabilistic reasoning in intelligent systems: networks of plausible inference}. \newblock Morgan Kaufmann, 1988. 
\bibitem{Lauritzen_2011_DirectedMarkovProperties} S.~Lauritzen, ``Directed markov properties.'' University of oxford, lecture notes, 2011. \newblock \url{http://www.stats.ox.ac.uk/~steffen/teaching/gm11/dag.pdf}. \bibitem{ChiribellaEtAl_2009_TheoreticalFrameworkCombs} G.~Chiribella, G.~M. D'Ariano, and P.~Perinotti, ``Theoretical framework for quantum networks,'' \href{http://dx.doi.org/10.1103/PhysRevA.80.022339}{{\em Phys. Rev. A} {\bfseries 80} (Aug, 2009) 022339}. \url{https://link.aps.org/doi/10.1103/PhysRevA.80.022339}. \bibitem{oreshkov2012quantum} O.~Oreshkov, F.~Costa, and {\v{C}}.~Brukner, ``Quantum correlations with no causal order,'' {\em Nature communications} {\bfseries 3} no.~1, (2012) 1092. \bibitem{uijlen2019categorical} S.~Uijlen and A.~Kissinger, ``A categorical semantics for causal structure,'' {\em Logical Methods in Computer Science} {\bfseries 15} (2019) . \bibitem{bisio2019theoretical} A.~Bisio and P.~Perinotti, ``Theoretical framework for higher-order quantum theory,'' {\em Proceedings of the Royal Society A} {\bfseries 475} no.~2225, (2019) 20180706. \bibitem{wilson2021causality} M.~Wilson and G.~Chiribella, ``Causality in higher order process theories,'' {\em arXiv preprint arXiv:2107.14581} (2021) . \bibitem{wilson2022mathematical} M.~Wilson and G.~Chiribella, ``A mathematical framework for transformations of physical processes,'' {\em arXiv preprint arXiv:2204.04319} (2022) . \bibitem{WilsonEtAl_2022_QuantumSupermapsCharacterisedbyLocality} M.~Wilson, G.~Chiribella, and A.~Kissinger, ``Quantum supermaps are characterized by locality,'' {\em arXiv preprint arXiv:2205.09844} (2022) . \end{thebibliography}\endgroup \appendix \input{10-appendix} \end{document}