Progressive Generalization Risk Reduction for Data-Efficient Causal Effect Estimation

Hechuan Wen, Tong Chen, Guanhua Ye, Li Kheng Chai, Shazia Sadiq, Hongzhi Yin

TL;DR

The paper proposes the Model Agnostic Causal Active Learning (MACAL) algorithm for batch-wise label acquisition, which aims to reduce both the CEE model's uncertainty and the post-acquisition distributional imbalance at each acquisition step.

Abstract

Causal effect estimation (CEE) provides a crucial tool for predicting the unobserved counterfactual outcome for an entity. Because CEE relaxes the requirement for "perfect" counterfactual samples (e.g., patients with identical attributes who differ only in the treatment received), which are impractical to obtain, and can instead operate on observational data, it is commonly used in high-stakes domains such as medical treatment effect prediction. Nevertheless, in those domains, gathering a decently sized, fully labelled observational dataset remains challenging because of the cost, ethical constraints, expertise, and time required; medical treatment surveys are a typical example. Consequently, when the training dataset is small, a low generalization risk can hardly be achieved by any CEE algorithm. Unlike existing CEE methods that assume the constant availability of a dataset with abundant samples, in this paper we study a more realistic CEE setting in which labelled samples are scarce at the beginning and more can be acquired gradually over the course of training, under a limited budget given their expensive nature. The problem then naturally comes down to actively selecting the best possible samples to label, e.g., identifying the next subset of patients on whom to conduct the treatment survey. However, acquiring quality data to reduce the CEE risk under a limited labelling budget remains under-explored. To fill the gap, we theoretically analyse the generalization risk from the perspective of progressively shrinking its upper bound, and develop a principled label acquisition pipeline exclusively for CEE tasks. With our analysis, we propose the Model Agnostic Causal Active Learning (MACAL) algorithm for batch-wise label acquisition, which aims to reduce both the CEE model's uncertainty and the post-acquisition ...

Paper Structure

This paper contains 26 sections, 9 theorems, 12 equations, 8 figures, 2 tables.

Key Result

Proposition 1

The causal effect is identifiable if and only if the SUTVA, the unconfoundedness, and the positivity assumptions hold.
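
For reference, the three assumptions named in Proposition 1 are commonly written as below. The notation is an assumption made here for illustration, since this excerpt does not define symbols: $X$ denotes covariates, $T \in \{0, 1\}$ the treatment, $Y(t)$ the potential outcome under treatment $t$, and $Y$ the observed outcome. The paper's own formulation may differ in detail.

```latex
\begin{align*}
  \text{SUTVA (consistency, no interference):} \quad
    & Y = Y(T), \\
  \text{Unconfoundedness:} \quad
    & \bigl(Y(0),\, Y(1)\bigr) \perp T \mid X, \\
  \text{Positivity:} \quad
    & 0 < P(T = 1 \mid X = x) < 1 \quad \text{for all } x \text{ with } P(X = x) > 0.
\end{align*}
```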

Figures (8)

  • Figure 1: Toy example indicating the importance of considering the reduction of both distributional discrepancy and model variance to help quickly achieve the lowest risk.
  • Figure 2: At the $i$-th query step, the batch selection is divided into a sequence of pair selections. Left: The most uncertain points are the candidates for selection, and each pair's score is penalized by the distance between its two points; Mid: The two most uncertain and closest points are selected (not labelled yet), which in turn lowers the uncertainty of nearby unlabelled points; Right: The next pair is selected recursively until the batch is filled.
  • Figure 3: Benchmarking against the available SOTA methods on three datasets: IHDP (averaged over 100 simulations), IBM (averaged over 10 simulations), and CMNIST (averaged over 10 simulations). The first column shows the comparisons against baselines designed for active causal effect estimation. The second column shows the comparisons against baselines from general active learning. All results use the same downstream CEE model: DUE-DNN for IHDP and IBM, and DUE-CNN for CMNIST.
  • Figure 4: Visualization of the post-acquisition dataset (IHDP: First two columns, IBM: Middle two columns, and CMNIST: Last two columns) via t-SNE for treatment group $t=1$, treatment group $t=0$, and overlapping for Random, $\mu\rho$BALD, and MACAL.
  • Figure 5: The hyperparameter $\alpha$ sets the level of symmetrical regularization for label acquisition. The relative performance of MACAL at $\alpha = \alpha_{i}$ is calculated as $\epsilon_{\text{PEHE},\alpha=0} / \epsilon_{\text{PEHE},\alpha=\alpha_{i}}$; higher is better.
  • ...and 3 more figures
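
The pair-wise selection described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_pairs`, the additive score (summed uncertainty minus `alpha` times the Euclidean distance), and the choice to form each pair from one treated and one control candidate are all hypothetical. The sketch also omits the step in which a selected pair lowers the uncertainty of nearby unlabelled points.

```python
import math


def select_pairs(x_treated, u_treated, x_control, u_control, batch_size, alpha=1.0):
    """Greedily pick (treated_idx, control_idx) pairs until the batch is full.

    All names and the score form are illustrative assumptions, not the paper's.
    """

    def pair_score(i, j):
        # Reward joint uncertainty, penalize distance between the two points.
        distance = math.dist(x_treated[i], x_control[j])
        return u_treated[i] + u_control[j] - alpha * distance

    free_treated = set(range(len(x_treated)))
    free_control = set(range(len(x_control)))
    pairs = []
    # Each pair consumes two labels from the batch budget.
    while len(pairs) < batch_size // 2 and free_treated and free_control:
        candidates = [(i, j) for i in sorted(free_treated) for j in sorted(free_control)]
        best = max(candidates, key=lambda pair: pair_score(*pair))
        pairs.append(best)
        # A selected point cannot be chosen again.
        free_treated.discard(best[0])
        free_control.discard(best[1])
    return pairs
```

With `alpha=0` the sketch reduces to picking the most uncertain points regardless of where they lie; a larger `alpha` favours pairs that sit close together in covariate space, which is one plausible reading of the symmetrical regularization mentioned in the Figure 5 caption.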

Theorems & Definitions (15)

  • Definition 1
  • Definition 2: Causal Effect Estimation
  • Proposition 1: Identifiability
  • Definition 3: Active Causal Effect Estimation
  • Theorem 1
  • Remark 1
  • Lemma 2
  • Lemma 3
  • Theorem 1
  • Lemma 2
  • ...and 5 more