Distributionally Robust Causal Abstractions

Yorgos Felekis, Theodoros Damoulas, Paris Giampouras

TL;DR

This work tackles robust causal reasoning across abstraction levels by introducing $(\rho,\iota)$-abstractions and the DiRoCA framework, which learns CAs that remain interventionally consistent under distributional shifts. It casts CA learning as a distributionally robust optimization problem over a 2-Wasserstein ambiguity set, with theoretical concentration guarantees in Gaussian and empirical settings that guide the choice of robustness radii. The approach specializes to linear abstractions and uses abduction to recover exogenous environments, yielding Gaussian and empirical DiRoCA implementations that outperform baselines under shifts and misspecification. Empirical results on SLC and LiLUCAS show improved generalization and resilience to environmental changes, highlighting the practical impact of principled robustness in multi-scale causal modeling.
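The abduction step can be illustrated for a linear SCM. The sketch below assumes a linear additive-noise model $x = Bx + u$ with a known, acyclic coefficient matrix $B$ (an assumption for illustration; how the paper estimates the SCMs is not reproduced here). Its reduced form is $x = (I - B)^{-1}u$, so abduction simply inverts it: $u = (I - B)x$. The helper names `reduced_form` and `abduct` and the three-variable chain are hypothetical.

```python
import numpy as np

def reduced_form(B):
    """Hypothetical helper: G = (I - B)^{-1}, so that x = G u for x = B x + u."""
    n = B.shape[0]
    return np.linalg.inv(np.eye(n) - B)

def abduct(X, B):
    """Hypothetical helper: recover exogenous samples U from endogenous samples X.

    For x = B x + u we have u = (I - B) x; with samples stored as rows
    this reads U = X (I - B)^T.
    """
    n = B.shape[0]
    return X @ (np.eye(n) - B).T

rng = np.random.default_rng(0)
# Toy (assumed) chain X1 -> X2 -> X3 with known coefficients.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, -0.5, 0.0]])
U = rng.normal(size=(1000, 3))      # exogenous environment samples
X = U @ reduced_form(B).T           # push forward: x = (I - B)^{-1} u
U_hat = abduct(X, B)                # abduction inverts the reduced form
print(np.allclose(U, U_hat))        # True

# Nominal Gaussian estimate of the exogenous environment from abducted samples.
mu_hat = U_hat.mean(axis=0)
Sigma_hat = np.cov(U_hat, rowvar=False)
```

The abducted samples then act as the nominal empirical environment around which an ambiguity set can be centered.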

Abstract

Causal Abstraction (CA) theory provides a principled framework for relating causal models that describe the same system at different levels of granularity while ensuring interventional consistency between them. Recently, several approaches for learning CAs have been proposed, but all assume fixed and well-specified exogenous distributions, making them vulnerable to environmental shifts and misspecification. In this work, we address these limitations by introducing the first class of distributionally robust CAs and their associated learning algorithms. The latter cast robust causal abstraction learning as a constrained min-max optimization problem with Wasserstein ambiguity sets. We provide theoretical results, for both empirical and Gaussian environments, leading to principled selection of the level of robustness via the radius of these sets. Furthermore, we present empirical evidence across different problems and CA learning methods, demonstrating our framework's robustness not only to environmental shifts but also to structural model and intervention mapping misspecification.
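To make the Wasserstein ambiguity sets concrete: for Gaussian environments the 2-Wasserstein distance has the closed form $W_2^2 = \|\mu_1-\mu_2\|^2 + \operatorname{tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2})^{1/2}\big)$, so membership in a ball of radius $\epsilon$ around a nominal Gaussian can be checked directly. The sketch below assumes NumPy. The function names are hypothetical, and the radius `eps` is an arbitrary input here, not the concentration-based radius derived in the paper.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_w2(mu1, S1, mu2, S2):
    """Hypothetical helper: closed-form W2 distance between N(mu1, S1) and N(mu2, S2)."""
    r2 = psd_sqrt(S2)
    cross = psd_sqrt(r2 @ S1 @ r2)
    sq = np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))  # clip tiny negative round-off

def in_w2_ball(mu, S, mu0, S0, eps):
    """Hypothetical helper: is N(mu, S) within W2 distance eps of N(mu0, S0)?"""
    return gaussian_w2(mu, S, mu0, S0) <= eps

mu0, S0 = np.zeros(2), np.eye(2)
print(gaussian_w2(np.ones(2), np.eye(2), mu0, S0))          # ≈ 1.414 (sqrt(2))
print(in_w2_ball(np.ones(2), np.eye(2), mu0, S0, eps=1.0))  # False
```

With equal covariances the trace term vanishes and only the mean shift contributes, which is why the distance above is $\sqrt{2}$.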

Paper Structure

This paper contains 33 sections, 3 theorems, 70 equations, 19 figures, 6 tables, 3 algorithms.

Key Result

Theorem 1

Let $\rho^\ell \sim {\cal N}(\mu_\ell, \Sigma_\ell)$ and $\rho^h \sim {\cal N}(\mu_h, \Sigma_h)$. Under Assumption 1, let $\widehat{\rho^\ell}$ and $\widehat{\rho^h}$ be estimated from $N_\ell$ and $N_h$ i.i.d. samples, respectively. Also, let $\boldsymbol{\rho} := \rho^\ell \otimes \rho^h$ and $\widehat{\boldsymbol{\rho}} :=

Figures (19)

  • Figure 1: Left: exact CAs assume a single environment ${\cal P}_{0}(\boldsymbol{{\cal U}})$; robust CAs operate over a constrained subset of relevant environments ${\cal P}_{m}(\boldsymbol{{\cal U}})$, with $0 \leq m<\infty$; and uniform CAs operate across all possible environments ${\cal P}_{\infty}(\boldsymbol{{\cal U}})$. Right: causal abstraction learning (CAL) methods positioned within this structure. Our framework models environmental uncertainty using a Wasserstein ball ${\mathbb B}_{\epsilon}$ centered at the empirical joint environment $\widehat{{\cal P}_{0}(\boldsymbol{{\cal U}})}$. Prior methods correspond to $\epsilon = 0$, implicitly assuming a fixed environment.
  • Figure 2: Computation of the environment–intervention error. The joint environment $\boldsymbol{\rho} = {\color{blue}\rho^\ell} \otimes {\color{orange}\rho^h}$ captures the combined uncertainty from the low- and high-level SCMs. By pushing these components forward through the reduced forms $\mathbf{g}^\ell$ and $\mathbf{g}^h$ of the respective SCMs, we evaluate two interventional pathways: (a) apply an intervention $\iota$ to $\mathcal{M}^{\ell}$, then map the resulting distribution to the high-level space via $\tau_{\#}$, giving $\tau_{\#}(\mathbb{P}_{\mathcal{M}^{\ell}_{\iota}})$; and (b) first map $\mathcal{M}^{\ell}$ to $\mathcal{M}^{h}$ via $\tau_{\#}$, then apply the corresponding intervention $\omega(\iota)$, giving $\mathbb{P}_{\mathcal{M}^{h}_{\omega(\iota)}}$. The divergence $\mathcal{D}_{\mathcal{X}^{h}}$ between the resulting interventional distributions defines $e_\tau^{\boldsymbol{\rho}, \iota}$. Aggregating $e_\tau^{\boldsymbol{\rho}, \iota}$ over an $({\cal A}, \mathcal{I})$ abstraction context recovers the $({\cal A}, \mathcal{I})$–abstraction error. If this error is zero, the diagram commutes and $\tau$ defines a $\tau$–$0$–approximate abstraction.
  • Figure 3: Robustness to outlier fraction ($\alpha$) on the SLC (top) and LiLUCAS (bottom) experiments for the Gaussian (left) and Empirical (right) settings. The evaluation is performed at a fixed Gaussian noise intensity ($\tilde{\sigma}=5.0$ for SLC and $\tilde{\sigma}=10.0$ for LiLUCAS). DiRoCA, especially with a tuned ambiguity radius, achieves consistently lower abstraction error as the proportion of outlier environments increases, while non-robust methods degrade.
  • Figure 4: Robustness to Gaussian noise intensity ($\tilde{\sigma}$) on the SLC (top) and LiLUCAS (bottom) experiments for the Gaussian (left) and Empirical (right) settings. The evaluation is performed at a fixed outlier fraction of $\alpha=1.0$ (fully noisy data). DiRoCA achieves consistently lower abstraction error as the noise intensity increases.
  • Figure 5: Construction of the joint ambiguity set ${\mathbb B}_{\epsilon, 2}(\widehat{\boldsymbol{\rho}})$ under the assumption of independently sampled environments. The process involves abduction of exogenous variables, nominal empirical distribution estimation, and selection of robustness radii.
  • ...and 14 more figures
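The two interventional pathways of Figure 2 can be made concrete in the simplest setting. The sketch below is a minimal illustration under several assumptions not stated in the caption: both SCMs are linear with Gaussian exogenous noise ($x = Bx + u$), $\tau$ is a linear map $\tau(x) = Tx$, interventions are hard do-interventions that cut a variable's parents and noise, and the divergence $\mathcal{D}_{\mathcal{X}^{h}}$ is taken to be the 2-Wasserstein distance. All function names and the toy chain models are hypothetical; this is not the paper's implementation.

```python
import numpy as np

def psd_sqrt(S):
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_w2(m1, S1, m2, S2):
    # Closed-form 2-Wasserstein distance between two Gaussians.
    r2 = psd_sqrt(S2)
    sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * psd_sqrt(r2 @ S1 @ r2))
    return float(np.sqrt(max(sq, 0.0)))

def intervene(B, mu_u, S_u, do):
    """Hypothetical helper: Gaussian law of x = B x + u under do = {index: value}.

    Intervened variables lose their parents and their exogenous noise, so the
    intervened reduced form is x = (I - B_do)^{-1} (D u + a).
    """
    n = B.shape[0]
    B_do, D, a = B.copy(), np.eye(n), np.zeros(n)
    for i, value in do.items():
        B_do[i, :] = 0.0
        D[i, i] = 0.0
        a[i] = value
    G = np.linalg.inv(np.eye(n) - B_do)
    return G @ (D @ mu_u + a), G @ D @ S_u @ D.T @ G.T

def env_intervention_error(T, low, high, iota, omega_iota):
    """Hypothetical helper: W2 between tau#(P_{M^l_iota}) and P_{M^h_omega(iota)}."""
    m_l, S_l = intervene(*low, iota)           # pathway (a): intervene, then map
    m_h, S_h = intervene(*high, omega_iota)    # pathway (b): high-level intervention
    return gaussian_w2(T @ m_l, T @ S_l @ T.T, m_h, S_h)

# Toy low-level chain X1 -> X2 -> X3 with unit-variance noise (assumed).
a, b = 0.8, -0.5
B_l = np.array([[0, 0, 0], [a, 0, 0], [0, b, 0]], dtype=float)
low = (B_l, np.zeros(3), np.eye(3))
T = np.array([[1.0, 0, 0], [0, 0, 1.0]])      # Y1 = X1, Y2 = X3
iota, omega_iota = {0: 1.0}, {0: 1.0}         # do(X1 = 1) mapped to do(Y1 = 1)

def high_model(c):
    # Toy high-level chain Y1 -> Y2 with coefficient c (assumed).
    B_h = np.array([[0, 0], [c, 0]], dtype=float)
    return (B_h, np.zeros(2), np.diag([1.0, b**2 + 1.0]))

print(env_intervention_error(T, low, high_model(a * b), iota, omega_iota))  # ≈ 0
print(env_intervention_error(T, low, high_model(0.0), iota, omega_iota))    # ≈ 0.4
```

In this toy example, the high-level coefficient $c = ab$ with noise variance $b^2 + 1$ makes the diagram commute, so the error vanishes; setting $c = 0$ leaves the covariances equal but shifts the mean of $Y_2$ by $|ab| = 0.4$, which is exactly the reported error.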

Theorems & Definitions (15)

  • Definition 2.1: Structural Causal Model
  • Definition 2.2 (Rubenstein et al., 2017)
  • Definition 3.1: $(\rho,\iota)$-Abstraction
  • Definition 3.2
  • Remark 1
  • Theorem 1: Gaussian $\boldsymbol{\rho}$-Concentration
  • Theorem 2: Empirical $\boldsymbol{\rho}$-Concentration
  • Remark 2
  • Proposition 1: Consistency of Metric-based Abstraction Errors
  • Proof
  • ...and 5 more