Defining Expertise: Applications to Treatment Effect Estimation

Alihan Hüyük; Qiyao Wei; Alicia Curth; Mihaela van der Schaar

TL;DR

This work reframes treatment effect estimation by treating decision-maker expertise as an actionable inductive bias. It defines two types of expertise, predictive and prognostic, according to whether actions align with treatment effects or with potential outcomes, and establishes a theoretical bound linking expertise to overlap via in-context action variability. Through synthetic simulations and benchmark comparisons, the authors show that the dominant type of expertise in a dataset significantly influences which CATE estimation method performs best, and they propose an expertise-informed pipeline that estimates the expertise present and adapts the estimator accordingly. The findings highlight the practical value of modeling expertise for model selection and estimation in domains where decisions are made by experts, such as healthcare or education.

Abstract

Decision-makers are often experts in their domain and take actions based on their domain knowledge. Doctors, for instance, may prescribe treatments by predicting the likely outcome of each available treatment. Actions of an expert thus naturally encode part of their domain knowledge, and can help make inferences within the same domain: knowing that doctors try to prescribe the best treatment for their patients, we can tell that treatments prescribed more frequently are likely to be more effective. Yet in machine learning, the fact that most decision-makers are experts is often overlooked, and "expertise" is seldom leveraged as an inductive bias. This is especially true for the literature on treatment effect estimation, where often the only assumption made about actions is that of overlap. In this paper, we argue that expertise, particularly the type of expertise the decision-makers of a domain are likely to have, can be informative in designing and selecting methods for treatment effect estimation. We formally define two types of expertise, predictive and prognostic, and demonstrate empirically that: (i) the prominent type of expertise in a domain significantly influences the performance of different methods in treatment effect estimation, and (ii) it is possible to predict the type of expertise present in a dataset, which can provide a quantitative basis for model selection.

Paper Structure (34 sections, 4 theorems, 11 equations, 8 figures, 3 tables)

Introduction
Contributions
Problem setup: Treatment effect estimation
The treatment effect estimation problem
Defining expertise
Discussion on expertise
Why two types of expertise?
Are the two types of expertise related?
Can expertise be measured?
How does expertise differ from optimality?
Are experts more desirable than optimal policies?
Implications of expertise for treatment effect estimation
Applications to treatment effect estimation
Simulation environment
Decision-making policies
...and 19 more sections

Key Result

Proposition 1

For all $\pi\in\Delta(\{0,1\})^{\mathcal{X}}$,

Figures (8)

Figure 1: The higher the expertise of a policy, the lower its in-context action variability, hence its overlap, has to be (Prop. 1). When expertise is high, leveraging it becomes critical as overlap would be low, making CATE estimation more challenging.
Figure 2: During our simulations, we increase the expertise: (i) by decreasing $\beta$ in $\pi_{\textit{soft}}$ (Best$\to$Expert), which also decreases the in-context action variability (i.e. the overlap), and (ii) by decreasing $d$ in $\pi_{\textit{mis}}$ (Worst$\to$Expert, as seen in Fig. \ref{fig:trafficlights}).
Figure 2: PEHE of various methods averaged across datasets from the Linear News environment.
Figure 3: As Best$\to$Expert (i.e. away from the "best case scenario"), treatment effect estimation gets generally harder and the performance of Baseline degrades. Similarly, as Worst$\to$Expert (i.e. away from the "worst case scenario") the performance of Baseline improves instead.
Figure 3: Performance improvements over Baseline. For predictive expertise, Action-predictive improves more and more upon Baseline by exploiting the increasing expertise (both as Best$\to$Expert and Worst$\to$Expert). In contrast, Balancing gets worse with increasing expertise, since the information it discards about the policy becomes more correlated with the treatment effects. For prognostic expertise, however, having non-predictive variables determine actions can help Balancing select against those variables when forming representations, improving performance over Baseline in most configurations, unlike Action-predictive.
...and 3 more figures
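
The figure captions above report PEHE (precision in estimation of heterogeneous effects), the usual error metric for CATE estimators. As a minimal sketch, assuming the root-mean-squared form of the metric (some papers report the squared version instead) and access to ground-truth effects as in a simulation, it could be computed as follows. The function name `pehe` and its arguments are illustrative and are not taken from the paper.

```python
import numpy as np

def pehe(tau_true, tau_pred):
    """Root-mean-squared error between true and estimated treatment effects.

    Assumption: the root form of PEHE is used; ground-truth effects are
    only available in synthetic or semi-synthetic settings.
    """
    tau_true = np.asarray(tau_true, dtype=float)
    tau_pred = np.asarray(tau_pred, dtype=float)
    if tau_true.shape != tau_pred.shape:
        raise ValueError("tau_true and tau_pred must have the same shape")
    return float(np.sqrt(np.mean((tau_pred - tau_true) ** 2)))

# Hypothetical example: errors of 3 and 4 on two units.
example_score = pehe([0.0, 0.0], [3.0, 4.0])
```

Lower values indicate that the estimated effects track the true effects more closely; a perfect estimator scores zero.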

Theorems & Definitions (8)

Definition 1: Prognostic expertise
Definition 2: Predictive expertise
Definition 3: Perfect prognostic/predictive expert
Definition 4: In-context action variability
Proposition 1: Boundedness of expertise and in-context action variability
Proposition 2: Determinism of perfect experts
Proposition 3
Proposition 4
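
To make the link between a soft policy's temperature and in-context action variability more concrete, here is a small sketch. It rests on assumptions not stated on this page: variability is measured as the average binary entropy (in bits) of the action distribution given the context, and the soft policy treats with probability sigmoid(tau(x) / beta), where tau(x) is the treatment effect. The paper's exact definitions may differ, and all names below are hypothetical.

```python
import numpy as np

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, elementwise."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def soft_policy(tau, beta):
    """Assumed soft policy: P(A=1 | x) = sigmoid(tau(x) / beta)."""
    return 1.0 / (1.0 + np.exp(-tau / beta))

def action_variability(propensity):
    """Assumed measure: mean entropy of the action given each context."""
    return float(np.mean(binary_entropy(propensity)))

# Synthetic treatment effects for 1000 hypothetical contexts.
rng = np.random.default_rng(0)
tau = rng.normal(size=1000)

# Lower beta makes the policy more deterministic in each context.
variability = {
    beta: action_variability(soft_policy(tau, beta))
    for beta in (10.0, 1.0, 0.1)
}
```

Under these assumptions, the average entropy shrinks as beta decreases: the policy follows the sign of the treatment effect more closely, so actions become more informative while each context sees less variation in treatment, which is the trade-off Figure 1 and Proposition 1 describe.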
