DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation

Yinjun Wu; Mayank Keoliya; Kan Chen; Neelay Velingker; Ziyang Li; Emily J Getzen; Qi Long; Mayur Naik; Ravi B Parikh; Eric Wong

TL;DR

DISCRET targets individual treatment effect (ITE) estimation that is both faithful and accurate. For each sample it automatically synthesizes a rule-based explanation, which also serves to retrieve a similar subgroup for local treatment effect estimation. It uses a tailored deep Q-learning framework to produce conjunctive-disjunctive rules and guarantees local satisfiability for each target instance, yielding near-perfect faithfulness (consistency) while maintaining accuracy on par with black-box models. The framework can also regularize strong neural predictors by aligning their outputs with DISCRET, improving performance across tabular, image, and text domains, and it shows substantial empirical gains over self-interpretable baselines. The work combines rule synthesis, causal ITE theory, and RL-based training, with practical implications for trustworthy treatment-effect decision-making across data modalities.

Abstract

Designing faithful yet accurate AI models is challenging, particularly in the field of individual treatment effect (ITE) estimation. ITE prediction models deployed in critical settings such as healthcare should ideally (i) be accurate and (ii) provide faithful explanations. However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. A key insight behind DISCRET is that explanations can double as database queries that identify subgroups of similar samples. We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space. We evaluate DISCRET on diverse tasks involving tabular, image, and text data. DISCRET outperforms the best self-interpretable models and has accuracy comparable to the best black-box models while providing faithful explanations. DISCRET is available at https://github.com/wuyinjun-1993/DISCRET-ICML2024.
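To make the "explanations as database queries" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a purely conjunctive rule, a binary treatment, and a simple difference-in-means estimate within the retrieved subgroup; the estimator DISCRET actually uses may differ. The helper names (`satisfies`, `subgroup_ite`), the field names (`t`, `y`), and the toy records are all hypothetical.

```python
import operator
from statistics import mean

# Comparison operators allowed in a literal (an assumed, minimal set).
OPS = {"<=": operator.le, ">": operator.gt, "==": operator.eq}

def satisfies(record, rule):
    """Return True if the record satisfies every (feature, op, constant) literal."""
    return all(OPS[op](record[feat], const) for feat, op, const in rule)

def subgroup_ite(database, rule):
    """Run the rule as a query, then compare mean outcomes of treated and control matches."""
    matched = [r for r in database if satisfies(r, rule)]
    treated = [r["y"] for r in matched if r["t"] == 1]
    control = [r["y"] for r in matched if r["t"] == 0]
    if not treated or not control:
        return None  # the subgroup lacks one of the two arms
    return mean(treated) - mean(control)

# Toy records, invented purely for illustration.
database = [
    {"weight": 1.2, "mom_age": 18, "t": 1, "y": 9.0},
    {"weight": 1.4, "mom_age": 19, "t": 0, "y": 5.0},
    {"weight": 1.3, "mom_age": 17, "t": 1, "y": 8.0},
    {"weight": 1.1, "mom_age": 19, "t": 0, "y": 4.0},
    {"weight": 2.9, "mom_age": 30, "t": 1, "y": 6.0},
    {"weight": 3.1, "mom_age": 28, "t": 0, "y": 6.5},
]

rule = [("weight", "<=", 1.5), ("mom_age", "<=", 19)]
estimate = subgroup_ite(database, rule)
print(estimate)
```

Because the prediction is computed only from the records the rule retrieves, two samples that receive the same rule necessarily receive the same estimate under this sketch, which is the sense of consistency the summary refers to.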

Paper Structure (48 sections, 4 theorems, 19 equations, 7 figures, 8 tables, 1 algorithm)

Introduction
Preliminaries
Individual Treatment Effect (ITE) Estimation
Syntax of Logic Rules
The DISCRET Framework
Explanation Synthesis
Overview
Desired Properties of Explanations
Rule Generation
Explanation Evaluation
RL-based Training
Regularizing Black-box Models with DISCRET
Experiments
Setup
RQ1: Faithfulness Evaluation on Explanations
...and 33 more sections

Key Result

Theorem 3.1

Suppose the input data are $\{(x_i, t_i, s_i, y_i)\}_{i=1}^N$, where $x_i \in \mathbb{R}^m$ is discrete, $t_i \in \mathbb{R}$, $s_i \in \mathbb{R}$, and $y_i \in \mathbb{R}$. Then the $\widehat{ITE}_x$ obtained from DISCRET converges to zero generalization error with probability 1 for ITE estimation (i.e.

Figures (7)

Figure 1: Motivating examples from the Uganda dataset. We use satellite images to predict how providing economic aid (the treatment) helps to develop remote regions of the country (the outcome). The task is to estimate the ITE for each of the samples $x_1$ and $x_2$. DISCRET predicts that, because both images show several indicators of rich soil and urbanization, they will have similar ITEs if given aid. Self-interpretable models such as Causal Forest [athey2019estimating] produce consistent ITE estimates (i.e., samples with the same explanation receive the same model prediction, here 3.97 and 3.97) but have poor accuracy ($\widehat{ITE}_{x_1} \ll ITE_{x_1} = 4.25$). Black-box models such as TransTEE [zhang2022exploring] are accurate but do not produce similar predictions for samples $x_1$ and $x_2$ with similar explanations when the explanations come from post-hoc explainers such as Anchor [ribeiro2018anchors]. DISCRET produces predictions that are both consistent and accurate.
Figure 2: Illustration of DISCRET on the IHDP dataset, which tracks premature infants. Given a sample $x$, DISCRET synthesizes an explanation $L_{1:k}$ by iteratively constructing each literal in the explanation. In particular, DISCRET (i) embeds the given sample and any previously generated literals ($\Theta_0$), (ii) passes the embedding to the feature selection network ($\Theta_1$) to pick a feature, and then (iii) passes the embedding and the selected feature to the constant selection network ($\Theta_2$) to obtain a thresholding constant. The operator is assigned automatically based on the feature and the sample. DISCRET executes this explanation on the database to find relevant samples, which are used (i) during training to compute a reward function for $\Theta_0$, $\Theta_1$, and $\Theta_2$, and (ii) during testing to calculate the ITE.
Figure 3: Consistency scores (higher is better) for DISCRET and a black-box model (TransTEE) combined with a post-hoc explainer. Our results confirm that DISCRET produces faithful explanations, and importantly, show that post-hoc explanations are rarely faithful, as evidenced by low consistency scores across datasets.
Figure 4: DISCRET identifies similar samples across diverse datasets: tabular (IHDP), image (Uganda), and text (EEEC). 1) In the tabular setting, given a sample $x$ describing a premature infant, DISCRET establishes a rule associating extremely underweight infants ($\mathtt{weight} \le 1.5$) born to teenage mothers ($\mathtt{mom}\: \mathtt{age} \le 19$) with a history of drug use; such groups likely benefit from childcare visits (treatment) and will have highly improved cognitive outcomes. 2) In the image setting on satellite images, for a sample $x$, DISCRET discerns a rule based on the presence of concepts like "high soil moisture" (reddish-pink pixels) and the absence of minimal soil (brown pixels), thus characterizing areas with high soil moisture. DISCRET's synthesized rule aligns with findings that government grants (treatment) are more effective in areas with higher soil moisture content (outcome) [JJD-Heterogeneity]. 3) Likewise, the text setting aims to measure the impact of gender (treatment) on mood (outcome). Given a sentence $x$ where the gendered noun ("Betsy") does not affect the semantic meaning, DISCRET's rule focuses on mood-linked words in the sentence, e.g., "hilarious".
Figure 5: Curve of DISCRET's ATE errors on the test split of IHDP.
...and 2 more figures
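
The literal-by-literal construction described in the Figure 2 caption can be sketched as follows. This is an illustrative outline under stated assumptions rather than the paper's code: the trained networks $\Theta_1$ and $\Theta_2$ are replaced by hypothetical stand-in functions (`pick_feature`, `pick_constant`), the embedding step $\Theta_0$ is omitted, and the operator is assumed to be chosen as whichever of `<=` or `>` the sample itself satisfies, which is one simple way to obtain a rule that always holds for the target sample.

```python
def assign_operator(sample, feature, constant):
    """Assumed convention: pick the comparison that the sample itself satisfies."""
    return "<=" if sample[feature] <= constant else ">"

def build_rule(sample, pick_feature, pick_constant, k):
    """Build k literals one at a time, each conditioned on the literals so far."""
    rule = []
    for _ in range(k):
        feature = pick_feature(sample, rule)
        constant = pick_constant(sample, rule, feature)
        op = assign_operator(sample, feature, constant)
        rule.append((feature, op, constant))
    return rule

def holds(sample, rule):
    """Check that every literal in the rule is true for the sample."""
    return all(
        (sample[f] <= c) if op == "<=" else (sample[f] > c)
        for f, op, c in rule
    )

# Hypothetical stand-in for the feature selection network:
# take the first feature (alphabetically) not used yet.
def pick_feature(sample, rule):
    used = {f for f, _, _ in rule}
    return next(f for f in sorted(sample) if f not in used)

# Hypothetical stand-in for the constant selection network:
# a fixed lookup table of thresholds.
THRESHOLDS = {"mom_age": 19, "weight": 1.5}

def pick_constant(sample, rule, feature):
    return THRESHOLDS[feature]

x = {"weight": 1.2, "mom_age": 25}
rule = build_rule(x, pick_feature, pick_constant, k=2)
print(rule)
```

Choosing the operator from the sample, rather than predicting it, means the target sample always falls inside the subgroup its own rule retrieves, whatever feature and constant the selectors return.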

Theorems & Definitions (7)

Theorem 3.1
Lemma C.1
Theorem C.2
proof
Theorem C.3 [jaakkola1993convergence]
proof
proof: Proof of Lemma \ref{lem: y}
