Transparency challenges in policy evaluation with causal machine learning -- improving usability and accountability

Patrick Rehill, Nicholas Biddle

TL;DR

This paper examines transparency challenges when applying causal machine learning to public policy evaluation, distinguishing accountability (explanation and responsibility to the public) from usability (understanding data-generating processes and model workings). It centers on causal forests for heterogeneous treatment effect estimation, exploring explainable AI (SHAP, variable importance) and interpretable AI (single-tree distillation, best linear projection) as tools to illuminate complex models. Through a Queensland case study on returns to education in Australia, it shows that standard predictive-ML transparency tools have limited applicability to causal settings and that forcing interpretability via simplification can substantially increase error, underscoring a need for new causal-specific transparency instruments. The authors advocate careful, additive use of causal ML in policy, accompanied by governance-friendly transparency practices and diagnostics (refutation tests) to balance usability, accountability, and policy impact.
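The best linear projection mentioned above summarises treatment effect heterogeneity by regressing per-unit doubly robust scores on a chosen covariate. The sketch below is a minimal, hypothetical illustration in Python. It assumes the nuisance estimates (outcome predictions `mu0`, `mu1` and propensity scores `e`) have already been fitted elsewhere, uses the standard AIPW score and a single covariate, and is not the authors' code or any library's implementation. The function names and the toy data are illustrative only.

```python
from statistics import fmean


def aipw_scores(y, w, mu0, mu1, e):
    """Doubly robust (AIPW) score for each unit.

    y: outcomes; w: 0/1 treatment indicators; mu0, mu1: predicted outcomes
    under control and treatment; e: estimated propensity scores.
    """
    scores = []
    for yi, wi, m0, m1, ei in zip(y, w, mu0, mu1, e):
        # Outcome-model contrast plus an inverse-propensity-weighted residual correction.
        correction = wi * (yi - m1) / ei - (1 - wi) * (yi - m0) / (1 - ei)
        scores.append(m1 - m0 + correction)
    return scores


def best_linear_projection(scores, a):
    """OLS of the doubly robust scores on a single covariate; returns (intercept, slope)."""
    a_bar, s_bar = fmean(a), fmean(scores)
    cov = sum((ai - a_bar) * (si - s_bar) for ai, si in zip(a, scores))
    var = sum((ai - a_bar) ** 2 for ai in a)
    slope = cov / var
    return s_bar - slope * a_bar, slope


# Hypothetical noiseless data: true CATE is 1 + 2x and the nuisance estimates are exact.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
w = [1, 0, 1, 0, 1, 0]
mu0 = [3 * xi for xi in x]
mu1 = [m0 + 1 + 2 * xi for m0, xi in zip(mu0, x)]
y = [m1 if wi else m0 for wi, m0, m1 in zip(w, mu0, mu1)]
e = [0.5] * len(x)

intercept, slope = best_linear_projection(aipw_scores(y, w, mu0, mu1, e), x)
print(intercept, slope)  # 1.0 2.0
```

Because the toy nuisance estimates are exact and the outcomes are noiseless, each score equals the true CATE, so the projection recovers the intercept 1 and slope 2. With real data the scores are noisy, and the projection describes a linear summary of heterogeneity rather than the forest's full fitted function.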

Abstract

Causal machine learning tools are beginning to see use in real-world policy evaluation tasks to flexibly estimate treatment effects. One issue with these methods is that the machine learning models used are generally black boxes, i.e., there is no globally interpretable way to understand how a model makes estimates. This is a clear problem in policy evaluation applications, particularly in government, because it is difficult to understand whether such models are functioning in ways that are fair, based on the correct interpretation of evidence, and transparent enough to allow for accountability if things go wrong. However, there has been little discussion in the causal machine learning literature of transparency problems and how they might be overcome. This paper explores why transparency issues are a problem for causal machine learning in public policy evaluation applications and considers ways these problems might be addressed through explainable AI tools and by simplifying models in line with interpretable AI principles. It then applies these ideas to a case study using a causal forest model to estimate conditional average treatment effects for a hypothetical change in the school leaving age in Australia. It shows that existing tools for understanding black-box predictive models are poorly suited to causal machine learning and that simplifying the model to make it interpretable leads to an unacceptable increase in error (in this application). It concludes that new tools are needed to properly understand causal machine learning models and the algorithms that fit them.
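The cost of simplifying a model for interpretability can be made concrete with single-tree distillation: fit a small, readable tree to the forest's CATE predictions and measure how far its predictions fall from the forest's. The sketch below is a hypothetical, stripped-down version in pure Python that uses a depth-1 tree (a stump) on one covariate and reports the mean squared gap to the forest. The names `fit_stump` and `distillation_mse` and the toy predictions are illustrative assumptions; the paper's distilled trees and error measure may differ.

```python
def fit_stump(x, target):
    """Fit a depth-1 regression tree to `target` using one covariate `x`.

    Returns (threshold, mean_below_or_equal, mean_above). Assumes x has at
    least two distinct values.
    """
    pairs = sorted(zip(x, target))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate tied covariate values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def distillation_mse(x, forest_cate):
    """Mean squared gap between the stump's predictions and the forest's CATEs."""
    threshold, low, high = fit_stump(x, forest_cate)
    stump_cate = [low if xi <= threshold else high for xi in x]
    return sum((s - f) ** 2 for s, f in zip(stump_cate, forest_cate)) / len(x)


# Hypothetical forest CATE predictions that rise smoothly with one covariate.
x = [0, 1, 2, 3, 4, 5]
forest_cate = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(fit_stump(x, forest_cate))         # (2.5, 1.0, 4.0)
print(distillation_mse(x, forest_cate))  # 0.6666666666666666
```

A step-shaped effect would be distilled with no loss, whereas a smoothly varying effect like the one above cannot be captured by a single split. Deeper trees reduce the gap but become harder to read, which is the trade-off the abstract describes.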

Paper Structure

This paper contains 23 sections, 4 equations, 8 figures and 4 tables.

Figures (8)

  • Figure 1: Aggregated SHAP plot explaining the HTE estimate across the distribution
  • Figure 2: Waterfall plot explaining the HTE estimate for a random individual
  • Figure 3: Rashomon curve showing how the size (number of trees) of the heterogeneity-estimating model affects the estimates (note: the top 2.5% of values have been trimmed because a few predictions with very high error made visualisation difficult).
  • Figure 4: Effect of refutation tests on estimated treatment effects (treatment effects should be near zero; the conditional averages are averages of doubly robust scores, not of the individual estimates shown as points).
  • Figure 5: Waterfall plot of SHAP values for random respondent 2
  • ...and 3 more figures
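Figures 2 and 5 are waterfall plots: each starts from a baseline prediction and adds one Shapley value per feature until it reaches the individual's estimate. The sketch below computes exact Shapley values by enumerating feature subsets, for a hypothetical two-feature CATE function `tau` and a single reference point `baseline`. It is a simplification under the assumption of a single-baseline value function; the SHAP library instead averages over a background dataset and uses faster tree-specific algorithms, and none of this is the authors' code.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single reference point.

    Features in a coalition take the individual's values; the rest take the
    baseline's values. Cost grows as 2**n, so this only suits a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def tau(z):
    """Hypothetical CATE function with an interaction between two features."""
    return 1 + 2 * z[0] + z[0] * z[1]


x, baseline = [1, 2], [0, 0]
phi = shapley_values(tau, x, baseline)

# Print a text waterfall: start at the baseline and add one contribution per feature.
running = tau(baseline)
print(f"baseline: {running:.2f}")
for name, contribution in zip(["x0", "x1"], phi):
    running += contribution
    print(f"{name}: {contribution:+.2f} -> {running:.2f}")
# baseline: 1.00
# x0: +3.00 -> 4.00
# x1: +1.00 -> 5.00
```

The contributions always sum to the gap between the individual's estimate and the baseline (here 5 - 1 = 4), which is what lets a waterfall plot end exactly at the prediction. The interaction term is split between the two features, so how a single bar should be read depends on the chosen baseline.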