Towards Characterizing Domain Counterfactuals For Invertible Latent Causal Models

Zeyu Zhou; Ruqi Bai; Sean Kulinski; Murat Kocaoglu; David I. Inouye

TL;DR

The paper proves that domain counterfactual estimation error can be bounded by a data-fit term and an intervention-sparsity term, and develops a theoretically grounded practical algorithm that simplifies the modeling process to generative model estimation under autoregressive and shared-parameter constraints that enforce intervention sparsity.

Abstract

Answering counterfactual queries has important applications such as explainability, robustness, and fairness, but is challenging when the causal variables are unobserved and the observations are non-linear mixtures of these latent variables, such as pixels in images. One approach is to recover the latent Structural Causal Model (SCM), which may be infeasible in practice because it requires strong assumptions, e.g., linearity of the causal mechanisms or perfect atomic interventions. Meanwhile, more practical ML-based approaches that use naive domain translation models to generate counterfactual samples lack theoretical grounding and may construct invalid counterfactuals. In this work, we strive to strike a balance between practicality and theoretical guarantees by analyzing a specific type of causal query called domain counterfactuals, which hypothesizes what a sample would have looked like if it had been generated in a different domain (or environment). We show that recovering the latent SCM is unnecessary for estimating domain counterfactuals, thereby sidestepping some of the theoretical challenges. By assuming invertibility and sparsity of intervention, we prove that domain counterfactual estimation error can be bounded by a data-fit term and an intervention-sparsity term. Building upon our theoretical results, we develop a theoretically grounded practical algorithm that simplifies the modeling process to generative model estimation under autoregressive and shared-parameter constraints that enforce intervention sparsity. Finally, we show an improvement in counterfactual estimation over baseline methods through extensive simulated and image-based experiments.
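A domain counterfactual can be read as "recover the sample's exogenous noise, then regenerate it with another domain's mechanism." The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes an observation is generated as x = g(f_d(ε)) with a shared invertible mixing g and domain-specific invertible latent maps f_d, and uses hypothetical toy choices g(z) = sinh(Az) and linear f_d(ε) = F_d ε + b_d. All names (`g`, `g_inv`, `f`, `f_inv`, `domain_counterfactual`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_domains = 3, 2  # latent dimension and number of domains (illustrative)

# Hypothetical shared mixing g(z) = sinh(A z): invertible because A is
# invertible and sinh is strictly increasing with inverse arcsinh.
A = 0.3 * rng.normal(size=(m, m)) + np.eye(m)
A_inv = np.linalg.inv(A)

def g(z):
    return np.sinh(z @ A.T)

def g_inv(x):
    return np.arcsinh(x) @ A_inv.T

# Hypothetical domain-specific latent maps f_d(eps) = F_d eps + b_d with
# lower-triangular F_d and a positive diagonal (hence invertible).
F = [np.tril(rng.normal(size=(m, m)), -1) + np.diag(rng.uniform(0.5, 2.0, m))
     for _ in range(n_domains)]
b = [rng.normal(size=m) for _ in range(n_domains)]

def f(d, eps):
    return eps @ F[d].T + b[d]

def f_inv(d, z):
    return np.linalg.solve(F[d], (z - b[d]).T).T

def domain_counterfactual(x, d, d_new):
    """Map samples x observed in domain d to their counterparts in domain d_new."""
    eps = f_inv(d, g_inv(x))   # recover the exogenous noise of each sample
    return g(f(d_new, eps))    # regenerate it with the new domain's mechanism

eps = rng.normal(size=(5, m))
x0 = g(f(0, eps))                      # samples generated in domain 0
x1 = domain_counterfactual(x0, 0, 1)   # the same samples "moved" to domain 1
```

Under these assumptions, mapping a sample to its own domain returns it unchanged, and mapping to domain 1 and back recovers the original sample.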

Paper Structure (59 sections, 22 theorems, 114 equations, 28 figures, 9 tables)

Introduction
Notation
Domain Counterfactuals with Invertible Latent Domain Causal Models
ILD Model
ILD Domain Counterfactuals
Estimating ILD Domain Counterfactuals in Practice
Domain Counterfactual Error Bound
Canonical ILD Model
Proposed ILD Estimation Algorithm
Related Work
Causal Representation Learning
Counterfactual Generation
Experiments
Simulated Dataset
Image-based Counterfactual Experiments
...and 44 more sections

Key Result

Theorem 1

Two ILDs are domain counterfactually equivalent, i.e., $(g, \mathcal{F}) \simeq_C (g', \mathcal{F}')$, if and only if: … Moreover, counterfactually equivalent models share the same intervention set size, i.e., if $(g, \mathcal{F}) \simeq_C (g', \mathcal{F}')$, then $|\mathcal{I}(\mathcal{F})| = |\mathcal{I}(\mathcal{F}')|$.
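Counterfactual equivalence means two models can differ in their parameters yet produce identical domain counterfactuals. The following is a minimal numerical sketch of that idea, not a reproduction of the theorem's conditions. It assumes linear invertible g and f_d (a simplification of the general ILD setting), takes the domain counterfactual from domain d to d' to be x ↦ g(f_{d'}(f_d^{-1}(g^{-1}(x)))), and applies a hypothetical reparameterization g' = g∘h_1^{-1}, f'_d = h_1∘f_d∘h_2 with invertible matrices h_1, h_2. The helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_domains = 4, 3

def rand_invertible(k):
    # Random matrix shifted toward a multiple of the identity; invertible
    # with high probability.
    return rng.normal(size=(k, k)) + k * np.eye(k)

G = rand_invertible(m)                              # linear mixing g
F = [rand_invertible(m) for _ in range(n_domains)]  # linear f_d per domain

def counterfactual_map(G, F, d, d_new):
    """Matrix of x -> G F_{d_new} F_d^{-1} G^{-1} x (linear components only)."""
    return G @ F[d_new] @ np.linalg.inv(F[d]) @ np.linalg.inv(G)

# Hypothetical reparameterization: g' = g h1^{-1}, f'_d = h1 f_d h2.
H1, H2 = rand_invertible(m), rand_invertible(m)
G_alt = G @ np.linalg.inv(H1)
F_alt = [H1 @ Fd @ H2 for Fd in F]
```

Composing the maps shows why the counterfactuals agree: h_1 cancels between g' and f'_{d'}, and h_2 cancels between f'_{d'} and f'_d^{-1}, so every counterfactual map coincides even though `G_alt` differs from `G`.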

Figures (28)

Figure 1: Simulated experiment results (${N_d}=3$) averaged over 10 runs with different ground-truth SCMs; error bars represent the standard error. (a) ILD-Can is consistently better than ILD-Dense regardless of which nodes are intervened on in the dataset. (b) Here we vary $|\mathcal{I}|$ while holding $\mathcal{I}^*$ fixed. The performance of ILD-Can approaches that of ILD-Dense as $|\mathcal{I}|$ increases. An unexpected result is that ILD-Can performs best when $|\mathcal{I}|=1$; this results from worse data fitting, which is investigated more carefully in Appendix \ref{app-sec:simulated-results}.
Figure 2: Domain counterfactuals with 3D Shapes and CausalIdent. Expanded figures can be found in Appendix \ref{app-sec:image-addition-exp}. (a) For 3D Shapes, only the object shape should change under domain counterfactuals; the other latent factors, such as the hues of the object, floor, and background, should not change. (b) For CausalIdent, as the domain changes, the background color should change while everything else stays the same. ILD-Can clearly performs better than the baseline ILD-Dense at preserving non-domain features while changing domains on all datasets.
Figure 3: An illustration of the matrices and vectors used to construct $f_d$ for the three ILD models when $m=6$ and $|\mathcal{I}|=2$. They define $f_d(\bm{\epsilon}) = F_d \bm{\epsilon} + \bm{b}_d$, where $F_d = (I - L_d)^{-1} S_d$. Grey elements are 0, orange elements are parameters that differ across domains, and blue elements are parameters shared across domains. Fixed values other than 0 are written explicitly. Note that we do not implement ILD-Identity-Can in our experiments; it is included here only to illustrate our theory.
Figure 4: Case 0: Test counterfactual error for different $\mathcal{I}^*$. To understand how the true intervention set affects the gap between ILD-Dense and ILD-Can, we varied the size of the ground-truth intervention set. The performance gap tends to be largest when the true intervention set is sparsest, and the performance of ILD-Can approaches that of ILD-Dense as its size increases.
Figure 5: Case 0: Test counterfactual error for different dimensions. We investigate how our algorithm scales with dimension. ILD-Can is significantly better than ILD-Dense in 9 out of 12 cases, and in the remaining 3 cases their performance is close. Here the intervention set contains the last two nodes; for example, when $m=4$, $\mathcal{I}=\{3,4\}$, and when $m=10$, $\mathcal{I}=\{9,10\}$.
...and 23 more figures
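The Figure 3 parameterization $f_d(\bm{\epsilon}) = F_d \bm{\epsilon} + \bm{b}_d$ with $F_d = (I - L_d)^{-1} S_d$ can be sketched as follows. This is a hypothetical construction, not the paper's code: it assumes $L_d$ is strictly lower triangular (autoregressive), $S_d$ is diagonal, $m=6$, and the intervention set is the last two nodes, as in the Figure 3 and Figure 5 captions. Rows outside the intervention set reuse shared ("blue") parameters, and rows inside it get domain-specific ("orange") ones. The exact split of shared and domain-specific entries in ILD-Can may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n_domains = 6, 3
intervened = [4, 5]  # hypothetical: the last |I| = 2 nodes (0-indexed)

# Shared parameters: strictly lower-triangular weights L, diagonal scales s
# (the diagonal of S_d), and offsets b, reused by every domain.
L_shared = np.tril(rng.normal(size=(m, m)), -1)
s_shared = rng.uniform(0.5, 2.0, size=m)
b_shared = rng.normal(size=m)

def make_domain(gen):
    """Build (F_d, b_d) whose intervened rows are domain-specific."""
    L, s, b = L_shared.copy(), s_shared.copy(), b_shared.copy()
    for i in intervened:
        L[i, :i] = gen.normal(size=i)   # new incoming weights for node i
        s[i] = gen.uniform(0.5, 2.0)    # new noise scale for node i
        b[i] = gen.normal()             # new offset for node i
    F = np.linalg.solve(np.eye(m) - L, np.diag(s))  # F_d = (I - L_d)^{-1} S_d
    return F, b

domains = [make_domain(rng) for _ in range(n_domains)]

def f(d, eps):
    """f_d(eps) = F_d eps + b_d, applied row-wise to a batch of noise vectors."""
    F, b = domains[d]
    return eps @ F.T + b
```

Because $I - L_d$ is unit lower triangular, the first $m - |\mathcal{I}|$ rows of its inverse depend only on the shared rows when the intervened nodes come last. Under these assumptions, the non-intervened latent coordinates are therefore generated identically in every domain, which is one way shared parameters can enforce intervention sparsity.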

Theorems & Definitions (51)

Definition 1: Invertible Latent Domain Causal Model
Definition 2: Distribution Equivalence
Definition 3: Domain Counterfactual Equivalence
Theorem 1: Characterization of Counterfactual Equivalence
Definition 4: Counterfactual Pseudo-Metric for ILD Models
Theorem 2: Counterfactual Error Bound Decomposition
Definition 5: Canonical Domain Counterfactual Model
Theorem 3: Existence of Equivalent Canonical ILD
Definition 6: Structural Causal Model
Definition 7: Invertible SCM
...and 41 more
