CausalWrap: Model-Agnostic Causal Constraint Wrappers for Tabular Synthetic Data

Amir Asiaee; Zhuohui J. Liang; Chao Yan

TL;DR

CausalWrap is a model-agnostic wrapper that injects partial causal knowledge (PCK) -- trusted edges, forbidden edges, and qualitative/monotonic constraints -- into any pretrained base generator (GAN, VAE, or diffusion model) without requiring access to its internals.

Abstract

Tabular synthetic data generators are typically trained to match observational distributions, which can yield high conventional utility (e.g., column correlations, predictive accuracy) yet poor preservation of structural relations relevant to causal analysis and out-of-distribution (OOD) reasoning. When the downstream use of synthetic data involves causal reasoning -- estimating treatment effects, evaluating policies, or testing mediation pathways -- matching the observational distribution alone is insufficient: structural fidelity and treatment-mechanism preservation become essential. We propose CausalWrap (CW), a model-agnostic wrapper that injects partial causal knowledge (PCK) -- trusted edges, forbidden edges, and qualitative/monotonic constraints -- into any pretrained base generator (GAN, VAE, or diffusion model) without requiring access to its internals. CW learns a lightweight, differentiable post-hoc correction map that is applied to samples from the base generator and optimized with causal penalty terms under an augmented-Lagrangian schedule. We provide theoretical results connecting penalty-based optimization to constraint satisfaction and relating approximate factorization to joint distributional control. We validate CW on simulated structural causal models (SCMs) with known ground-truth interventions, semi-synthetic causal benchmarks (IHDP and an ACIC-style suite), and a real-world intensive care unit (ICU) cohort (MIMIC-IV) with expert-elicited partial graphs. CW improves causal fidelity across diverse base generators -- e.g., reducing average treatment effect (ATE) error by up to 63% on ACIC and lifting ATE agreement from 0.00 to 0.38 on the ICU cohort -- while largely retaining conventional utility.
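To make the phrase "augmented-Lagrangian schedule" concrete, the following is a minimal toy sketch, not the paper's training loop. It assumes a hypothetical scalar parameter phi, a stand-in utility (phi - 2)^2, and a single stand-in equality constraint phi - 1 = 0; the function name alm_solve and all constants are illustrative assumptions.

```python
def alm_solve(rho=1.0, outer_steps=50):
    """Toy augmented-Lagrangian loop (illustrative assumption, not CW itself).

    Minimises (phi - 2)^2 subject to phi - 1 = 0.
    """
    mu = 0.0   # running estimate of the Lagrange multiplier
    phi = 0.0  # stand-in for the correction-map parameters
    for _ in range(outer_steps):
        # Inner step: exact minimiser of
        #   (phi - 2)^2 + mu * (phi - 1) + (rho / 2) * (phi - 1)^2
        # obtained by setting the derivative to zero.
        phi = (4.0 - mu + rho) / (2.0 + rho)
        # Outer step: multiplier update driven by the constraint residual.
        mu += rho * (phi - 1.0)
    return phi, mu


phi_star, mu_star = alm_solve()
print(f"phi = {phi_star:.6f}, multiplier = {mu_star:.6f}")
```

In this toy problem the multiplier update lets the iterate reach the feasible point with a fixed, finite rho, which is the usual motivation for preferring an augmented Lagrangian over a pure penalty with a growing weight.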

Paper Structure (67 sections, 2 theorems, 8 equations, 10 figures, 3 tables, 1 algorithm)

Introduction
Contributions.
Related Work
Tabular Synthetic Data Generators
Medical and EHR Synthetic Data
Causal Models and Causal Generative Modeling
Treatment-Effect-Oriented Synthetic Data and Evaluation
Constrained Deep Generative Models
Causal Discovery, Independence Testing, and Constrained Optimization
Problem Setup
Data and Base Generator
Partial Causal Knowledge
Utility and Constraint Functionals
Constraint Functionals.
(1) Residualized independence constraints.
...and 52 more sections

Key Result

Theorem 1

Under the regularity assumption (assump:regularity), let $\hat{\phi}_\lambda$ be any minimizer of the penalized population objective $\mathcal{U}(P,Q_\phi)+\lambda\Omega_{\mathcal{K}}(Q_\phi)$. As $\lambda\to\infty$, every limit point $\phi^\star$ of $\{\hat{\phi}_\lambda\}$ is feasible and solves the constrained pr
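The generic penalty-method behaviour that this statement describes can be seen in a small worked example. The sketch below is a toy under stated assumptions, not the paper's objective: the utility is taken to be (a - 1)^2 + (b - 2)^2, the constraint penalty is (a - b)^2, and penalized_minimizer is a hypothetical helper that uses the closed-form minimizer of the penalized sum.

```python
def penalized_minimizer(lam):
    """Closed-form minimiser of (a-1)^2 + (b-2)^2 + lam * (a-b)^2.

    Setting both partial derivatives to zero gives a + b = 3 and
    a - b = -1 / (1 + 2 * lam). Toy problem, not the CW objective.
    """
    d = -1.0 / (1.0 + 2.0 * lam)  # constraint residual a - b
    a = (3.0 + d) / 2.0
    b = (3.0 - d) / 2.0
    return a, b, d * d            # last entry is the penalty value


for lam in (0.0, 1.0, 10.0, 1e3, 1e6):
    a, b, violation = penalized_minimizer(lam)
    print(f"lambda={lam:>9.1f}  a={a:.6f}  b={b:.6f}  violation={violation:.2e}")
```

As lambda grows, the violation shrinks to zero and (a, b) approaches (1.5, 1.5), which is the solution of the constrained toy problem with a = b.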

Figures (10)

Figure 1: Partial causal knowledge $\mathcal{K}=(E^+,E^0,\mathcal{M})$. Green solid: trusted edges ($E^+$). Red dashed: forbidden edges ($E^0$). Gray dotted: unknown status.
Figure 2: Gradient flow through the CI penalty. Forward (black): raw samples pass through $f_\phi$; frozen edge models $\hat{m}$ produce residuals; HSIC on residual pairs gives the penalty. Backward (red): gradients flow back through $f_\phi$ only (edge models are frozen).
Figure 3: CausalWrap pipeline. A pretrained base generator (frozen) produces raw samples; a correction map $f_\phi$ produces corrected samples, which, together with real data and partial knowledge $\mathcal{K}$, feed the loss. Black: forward data flow; red: gradient (only $\phi$ is updated).
Figure 4: Tier 2: IHDP benchmark (10 replications, 5 seeds; log-scale violin plots). CW reduces mean ATE error for CTGAN and TabDDPM but can increase error for TVAE, and may increase tail variance on some replications.
Figure 5: Tier 2: ACIC-style ATE error across 10 DGP settings (2 seeds). CW improves all three bases on most settings (largest gains for TVAE), illustrating base-model sensitivity of post-hoc correction.
...and 5 more figures
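The Figure 2 caption refers to HSIC computed on residual pairs. As a hedged illustration of what such a statistic measures, here is a small pure-Python sketch of the biased HSIC estimate with Gaussian kernels, normalized by n^2. The kernel choice, the bandwidth of 1.0, and the helper names are assumptions for illustration; the excerpt does not specify the paper's kernel or estimator, and this version is not differentiable.

```python
import math
import random


def gaussian_gram(values, bandwidth):
    """Gram matrix of a Gaussian (RBF) kernel on scalar samples."""
    scale = 2.0 * bandwidth ** 2
    return [[math.exp(-((u - v) ** 2) / scale) for v in values] for u in values]


def double_center(gram):
    """Return H K H for a symmetric Gram matrix, with H = I - (1/n) 11^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(x, y, bandwidth=1.0):
    """Biased HSIC estimate trace(K H L H) / n^2 for paired scalar samples."""
    n = len(x)
    kc = double_center(gaussian_gram(x, bandwidth))
    lg = gaussian_gram(y, bandwidth)
    # trace(K H L H) equals the elementwise sum of (H K H) * L.
    return sum(kc[i][j] * lg[i][j] for i in range(n) for j in range(n)) / (n * n)


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(150)]
y_dependent = [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]
y_independent = [rng.gauss(0.0, 1.0) for _ in range(150)]
hsic_dep = hsic(x, y_dependent)
hsic_ind = hsic(x, y_independent)
print(f"dependent pair: {hsic_dep:.4f}   independent pair: {hsic_ind:.4f}")
```

The dependent pair yields a clearly larger value than the independent pair, which is why a statistic of this kind can serve as a penalty that pushes residuals toward independence.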

Theorems & Definitions (7)

Example 1: ICU partial graph
Theorem 1: Penalty convergence
Remark 1: ALM convergence rate
Proposition 1: Chain-rule total variation (TV) bound for approximate conditionals
Remark 2: Design rationale and gap
Proof of Theorem 1
Proof of Proposition 1
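The exact statement of Proposition 1 is not reproduced in this listing. As a hedged illustration of the kind of inequality its title suggests, the sketch below numerically checks the standard two-variable chain-rule bound TV(P, Q) <= TV(P_X, Q_X) + E_{x ~ P_X}[TV(P_{Y|x}, Q_{Y|x})] on a hand-picked pair of binary distributions. All numbers and helper names are illustrative assumptions, not values from the paper.

```python
def tv(p, q):
    """Total variation distance between two discrete distributions (lists)."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def joint(marginal, conditional):
    """Flatten p(x) * p(y | x) into a list over (x, y) pairs."""
    return [marginal[x] * conditional[x][y]
            for x in range(len(marginal))
            for y in range(len(conditional[x]))]


# Hypothetical "real" distribution P and "synthetic" distribution Q over X -> Y.
p_x, q_x = [0.6, 0.4], [0.5, 0.5]
p_y_given_x = [[0.9, 0.1], [0.3, 0.7]]
q_y_given_x = [[0.8, 0.2], [0.4, 0.6]]

joint_tv = tv(joint(p_x, p_y_given_x), joint(q_x, q_y_given_x))
# Marginal error plus the P-weighted average of the conditional errors.
chain_bound = tv(p_x, q_x) + sum(
    p_x[x] * tv(p_y_given_x[x], q_y_given_x[x]) for x in range(len(p_x))
)
print(f"joint TV = {joint_tv:.4f}  <=  chain-rule bound = {chain_bound:.4f}")
```

The check shows how controlling each conditional separately yields control over the joint distribution, at the cost of a bound that can be loose.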
