VAINE: Visualization and AI for Natural Experiments

Grace Guo, Maria Glenski, ZhuanYi Shaw, Emily Saldanha, Alex Endert, Svitlana Volkova, Dustin Arendt

TL;DR

The paper tackles causal inference from observational data by leveraging natural experiments identified within real-world data. It introduces VAINE, a visual analytics system with three coordinated views that cluster by covariates to control confounding and estimate an average treatment effect via a cluster-weighted approach, $ATE = \frac{1}{N} \sum_{i=1}^{M} n_i b_i$. Through two usage scenarios on Auto MPG and Ames Housing, VAINE demonstrates how interactive clustering, outlier handling, and covariate inspection reveal subgroup heterogeneity and phenomena like Simpson’s paradox, enhancing human-in-the-loop causal reasoning. The work contributes a domain-agnostic tool that supports validation of causal claims and practical estimation of $ATE$ in observational settings, with implications for fields from economics to policy and model evaluation; it also outlines avenues to address nonlinearity, time-ordering, and scalability in future work.
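The cluster-weighted estimate above can be illustrated with a minimal sketch. It is not VAINE's implementation. It assumes that $M$ is the number of clusters, $n_i$ is the number of points in cluster $i$, $b_i$ is the slope of a simple linear regression of outcome on treatment within cluster $i$, and $N$ is the total number of points; the summary does not define these symbols. The function names and the toy data are hypothetical, and the cluster labels are taken as given rather than computed from covariates.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("treatment is constant within a cluster; slope undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def cluster_weighted_ate(treatment, outcome, labels):
    """Compute (1/N) * sum_i n_i * b_i over the clusters named in `labels`.

    Assumption: b_i is the within-cluster regression slope and n_i the
    cluster size; the labels would normally come from clustering covariates.
    """
    groups = defaultdict(list)
    for t, y, c in zip(treatment, outcome, labels):
        groups[c].append((t, y))

    total = len(treatment)  # N
    weighted_sum = 0.0
    for points in groups.values():
        ts = [t for t, _ in points]
        ys = [y for _, y in points]
        weighted_sum += len(points) * ols_slope(ts, ys)  # n_i * b_i
    return weighted_sum / total


# Hypothetical toy data: cluster "a" has slope 2, cluster "b" has slope -1.
treatment = [0, 1, 2, 3, 0, 1]
outcome = [0, 2, 4, 6, 5, 4]
labels = ["a", "a", "a", "a", "b", "b"]

ate = cluster_weighted_ate(treatment, outcome, labels)
print(ate)
```

With these toy values the estimate is $(4 \cdot 2 + 2 \cdot (-1)) / 6 = 1$, so the larger cluster dominates the weighted average while the smaller cluster's opposite-signed slope still pulls it down.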

Abstract

Natural experiments are observational studies where the assignment of treatment conditions to different populations occurs by chance "in the wild". Researchers from fields such as economics, healthcare, and the social sciences leverage natural experiments to conduct hypothesis testing and causal effect estimation for treatment and outcome variables that would otherwise be costly, infeasible, or unethical. In this paper, we introduce VAINE (Visualization and AI for Natural Experiments), a visual analytics tool for identifying and understanding natural experiments from observational data. We then demonstrate how VAINE can be used to validate causal relationships, estimate average treatment effects, and identify statistical phenomena such as Simpson's paradox through two usage scenarios.

Paper Structure

This paper contains 11 sections, 1 equation, 3 figures.

Figures (3)

  • Figure 1: After setting the number of clusters for a treatment and outcome pair, the clustering can be verified in the Covariates view.
  • Figure 2: Outliers identified in the Average Treatment Effect view can be excluded from analysis, updating the axes and $ATE$ value.
  • Figure 3: By inspecting the purple and lime clusters in greater detail, we can conclude that Lot Area has a stronger effect on Sale Price for smaller properties without a second floor.