DAG-aware Transformer for Causal Effect Estimation

Manqing Liu, David R. Bellamy, Andrew L. Beam

TL;DR

The paper tackles estimation of the average treatment effect $\tau$ and the conditional average treatment effect $\tau(x)$ in high-dimensional settings with complex causal graphs. It introduces a DAG-aware Transformer that encodes causal structure directly into the attention mechanism via a DAG-derived mask, enabling flexible estimation under the G-formula, IPTW, AIPW, and proximal inference. Key contributions include the DAG-aware attention design, a comparison of joint and separate training strategies, an evaluation on LaLonde, ACIC, and proximal-inference tasks that reports improved accuracy, and an extension to proximal inference with NMMR-U/NMMR-V. The approach targets robust, structure-aware causal inference, with potential applications in policy, medicine, and economics.
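
For reference, the G-formula, IPTW, and AIPW estimators named above have standard textbook plug-in forms for the ATE. The sketch below is not the paper's implementation. It assumes the outcome predictions `mu0`, `mu1` and the propensity scores `e` have already been produced by some fitted model (in the paper's setting, the DAG-aware Transformer), and the function name `ate_estimators` is hypothetical.

```python
import numpy as np

def ate_estimators(y, a, mu0, mu1, e):
    """Textbook plug-in ATE estimators (illustrative; names are hypothetical).

    y   : observed outcomes
    a   : binary treatment indicators (0 or 1)
    mu0 : predicted outcome under control, E[Y | A=0, X]
    mu1 : predicted outcome under treatment, E[Y | A=1, X]
    e   : predicted propensity scores, P(A=1 | X), strictly in (0, 1)
    """
    y, a, mu0, mu1, e = (np.asarray(v, dtype=float) for v in (y, a, mu0, mu1, e))

    # G-formula (standardization): average the difference in predicted outcomes.
    g_formula = np.mean(mu1 - mu0)

    # IPTW: reweight observed outcomes by the inverse probability of treatment.
    iptw = np.mean(a * y / e - (1 - a) * y / (1 - e))

    # AIPW: outcome-model estimate plus an inverse-weighted residual correction.
    aipw = np.mean(
        mu1 - mu0
        + a * (y - mu1) / e
        - (1 - a) * (y - mu0) / (1 - e)
    )
    return {"g_formula": g_formula, "iptw": iptw, "aipw": aipw}

# Toy inputs chosen by hand so that all three estimators agree.
toy = ate_estimators(
    y=[3, 1, 4, 2],
    a=[1, 0, 1, 0],
    mu0=[1, 1, 2, 2],
    mu1=[3, 3, 4, 4],
    e=[0.5, 0.5, 0.5, 0.5],
)
```

AIPW is doubly robust: it remains consistent if either the outcome model or the propensity model is correctly specified, which is one reason a single architecture that can supply both is convenient.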

Abstract

Causal inference is a critical task across fields such as healthcare, economics, and the social sciences. While recent advances in machine learning, especially those based on deep-learning architectures, have shown potential in estimating causal effects, existing approaches often fall short in handling complex causal structures and lack adaptability across various causal scenarios. In this paper, we present a novel transformer-based method for causal inference that overcomes these challenges. The core innovation of our model lies in its integration of causal Directed Acyclic Graphs (DAGs) directly into the attention mechanism, enabling it to accurately model the underlying causal structure. This allows for flexible estimation of both average treatment effects (ATE) and conditional average treatment effects (CATE). Extensive experiments on both synthetic and real-world datasets demonstrate that our approach surpasses existing methods in estimating causal effects across a wide range of scenarios. The flexibility and robustness of our model make it a valuable tool for researchers and practitioners tackling complex causal inference problems.
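
One common way to integrate a DAG into attention is an additive mask built from the graph's adjacency matrix. The NumPy sketch below assumes that each node may attend only to itself and its direct parents. The paper's exact masking rule may differ, and the names `dag_attention_mask` and `dag_aware_attention` are hypothetical, chosen for illustration.

```python
import numpy as np

def dag_attention_mask(adj):
    """Build an additive attention mask from a DAG adjacency matrix.

    adj[i, j] = 1 means there is an edge i -> j.
    Assumption (not confirmed by the source): query node j may attend to
    key node i only if i is a parent of j, or i == j.
    """
    adj = np.asarray(adj, dtype=bool)
    # allowed[query, key]: transpose so each row lists the query's parents.
    allowed = adj.T | np.eye(adj.shape[0], dtype=bool)
    return np.where(allowed, 0.0, -np.inf)

def dag_aware_attention(q, k, v, adj):
    """Single-head scaled dot-product attention restricted by the DAG mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + dag_attention_mask(adj)
    # Numerically stable softmax; the diagonal is always allowed,
    # so every row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy DAG over nodes (X, A, Y) with edges X -> A, X -> Y, A -> Y.
adj = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
])
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
out, weights = dag_aware_attention(q, k, v, adj)
# X has no parents, so weights[0] is [1, 0, 0];
# A attends to {X, A}; Y attends to {X, A, Y}.
```

Masked positions receive a score of negative infinity and therefore exactly zero attention weight, so information flows only along edges of the assumed graph.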

Paper Structure

This paper contains 26 sections, 11 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Architecture of the DAG-aware Transformer model. The Transformer Encoder incorporates the DAG-aware attention mechanism (highlighted with dashed lines), which utilizes the causal structure represented by the DAG. The adjacency matrix derived from the causal DAG informs the DAG-aware attention computation. The model combines the output from the transformer encoder with the raw input through a weighted average, which is then processed by an MLP to produce the final output. For simplicity, layer normalization and feed-forward networks within the transformer encoder are not shown.
  • Figure 2: Mean NRMSE with Standard Error for LaLonde CPS Dataset
  • Figure 3: Mean NRMSE with Standard Error for LaLonde PSID Dataset
  • Figure 4: Mean NRMSE with Standard Error for LaLonde PSID Dataset
  • Figure 5: Median (IQR) of c-MSE for Demand Dataset
  • ...and 1 more figure