VAINE: Visualization and AI for Natural Experiments
Grace Guo, Maria Glenski, ZhuanYi Shaw, Emily Saldanha, Alex Endert, Svitlana Volkova, Dustin Arendt
TL;DR
The paper tackles causal inference from observational data by identifying natural experiments within real-world datasets. It introduces VAINE, a visual analytics system with three coordinated views that clusters data points by their covariates to control for confounding and estimates an average treatment effect as a cluster-weighted sum, $ATE = \frac{1}{N} \sum_{i=1}^{M} n_i b_i$. Two usage scenarios, on the Auto MPG and Ames Housing datasets, show how interactive clustering, outlier handling, and covariate inspection reveal subgroup heterogeneity and phenomena such as Simpson's paradox, supporting human-in-the-loop causal reasoning. The work contributes a domain-agnostic tool for validating causal claims and estimating the $ATE$ in observational settings, with implications for fields ranging from economics to policy and model evaluation. It also outlines future work on nonlinearity, time-ordering, and scalability.
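The cluster-weighted estimate in the TL;DR can be illustrated with a minimal sketch. It rests on assumptions this summary does not state: $M$ is the number of clusters, $n_i$ the number of points in cluster $i$, $N$ the total number of points across those clusters, and $b_i$ a per-cluster effect, taken here to be the slope of a simple linear regression of outcome on treatment. The function name `cluster_weighted_ate` and the toy data are hypothetical and are not taken from VAINE.

```python
import numpy as np


def cluster_weighted_ate(treatment, outcome, labels):
    """Return (1/N) * sum_i n_i * b_i over clusters (illustrative sketch).

    Assumption: b_i is the least-squares slope of outcome on treatment
    within cluster i, and n_i is the cluster size.
    """
    treatment = np.asarray(treatment, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    labels = np.asarray(labels)

    weighted_sum = 0.0
    n_total = 0
    for cluster in np.unique(labels):
        mask = labels == cluster
        t, y = treatment[mask], outcome[mask]
        # A slope is undefined without at least two distinct treatment
        # values; this sketch skips such clusters (a design choice).
        if t.size < 2 or t.max() == t.min():
            continue
        b_i = np.polyfit(t, y, 1)[0]  # slope of the degree-1 fit
        weighted_sum += t.size * b_i
        n_total += t.size

    if n_total == 0:
        raise ValueError("no cluster has enough treatment variation")
    return weighted_sum / n_total


# Synthetic data, made up for illustration: within each cluster the
# outcome falls as treatment rises, but cluster 1 sits higher on both axes.
treatment = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
outcome = np.array([10, 9, 8, 7, 20, 19, 18, 17], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

ate = cluster_weighted_ate(treatment, outcome, labels)
pooled_slope = np.polyfit(treatment, outcome, 1)[0]
print(f"cluster-weighted ATE: {ate:.3f}")
print(f"pooled slope:         {pooled_slope:.3f}")
```

On this toy data the pooled slope is positive while every within-cluster slope, and therefore the cluster-weighted estimate, is negative. This is the sign reversal associated with Simpson's paradox, and it shows why conditioning on covariate clusters can change the conclusion.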
Abstract
Natural experiments are observational studies where the assignment of treatment conditions to different populations occurs by chance "in the wild". Researchers from fields such as economics, healthcare, and the social sciences leverage natural experiments to conduct hypothesis testing and causal effect estimation for treatment and outcome variables whose controlled study would otherwise be costly, infeasible, or unethical. In this paper, we introduce VAINE (Visualization and AI for Natural Experiments), a visual analytics tool for identifying and understanding natural experiments from observational data. We then demonstrate, through two usage scenarios, how VAINE can be used to validate causal relationships, estimate average treatment effects, and identify statistical phenomena such as Simpson's paradox.
