MCCE: Missingness-aware Causal Concept Explainer

Jifan Gao, Guanhua Chen

TL;DR

This work introduces the Missingness-aware Causal Concept Explainer (MCCE), a framework for estimating causal concept effects when not all concepts are observable. MCCE uses a linear predictor to model the relationships between concepts and the outputs of black-box machine learning models.

Abstract

Causal concept effect estimation is gaining increasing interest in the field of interpretable machine learning. This general approach explains the behaviors of machine learning models by estimating the causal effect of human-understandable concepts, which represent high-level knowledge more comprehensibly than raw inputs like tokens. However, existing causal concept effect explanation methods assume complete observation of all concepts involved within the dataset, which can fail in practice due to incomplete annotations or missing concept data. We theoretically demonstrate that unobserved concepts can bias the estimation of the causal effects of observed concepts. To address this limitation, we introduce the Missingness-aware Causal Concept Explainer (MCCE), a novel framework specifically designed to estimate causal concept effects when not all concepts are observable. Our framework learns to account for residual bias resulting from missing concepts and utilizes a linear predictor to model the relationships between these concepts and the outputs of black-box machine learning models. It can offer explanations on both local and global levels. We conduct validations using a real-world dataset, demonstrating that MCCE achieves promising performance compared to state-of-the-art explanation methods in causal concept effect estimation.
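
The claim that unobserved concepts can bias the estimated effects of observed concepts can be illustrated with a small simulation. The sketch below is not the paper's proof or its estimator: it assumes a toy linear setting in which a hypothetical latent variable `u` drives both an observed concept `c_ob` and an unobserved concept `c_un`, and a synthetic output `y` stands in for the black-box model. All names and coefficients are illustrative assumptions.

```python
import numpy as np

def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns the feature coefficients only."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical latent cause shared by both concepts (plays the role of U).
u = rng.normal(size=n)
c_ob = (u + rng.normal(size=n) > 0).astype(float)  # observed concept
c_un = (u + rng.normal(size=n) > 0).astype(float)  # unobserved concept, correlated with c_ob via u

# Synthetic stand-in for the black-box output: the true effect of c_ob is 1.0.
y = 1.0 * c_ob + 2.0 * c_un + 0.1 * rng.normal(size=n)

# Estimate with every concept available versus with c_un missing.
effect_full = fit_linear(np.column_stack([c_ob, c_un]), y)[0]
effect_missing = fit_linear(c_ob[:, None], y)[0]

print(f"effect of c_ob, all concepts observed: {effect_full:.3f}")
print(f"effect of c_ob, c_un missing:          {effect_missing:.3f}")
```

Because `c_ob` and `c_un` are positively correlated through `u`, the estimate that omits `c_un` absorbs part of its effect and overstates the effect of `c_ob`.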

Paper Structure

This paper contains 9 sections, 16 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The architecture of MCCE. Given an input sample, a vector representation is extracted. Pseudo-concepts are constructed under the constraint that they are orthogonal to the observed concepts, ensuring that the pseudo-concepts offer information to compensate for any lost information from unobserved concepts. Then a linear predictor is trained on the concatenation of observed concepts and the pseudo-concepts to approximate the behaviors of a black-box model. The entire pipeline can be trained end-to-end.
  • Figure 2: Causal structure graph. The impact of $U$ on $X$ is mediated not only by the observed concepts $C_{ob_1},\dots,C_{ob_k}$ but also by the unobserved concepts $C_{un_1},\dots,C_{un_j}$. In this work, we aim to account for the impact of unobserved concepts when estimating the causal effect of observed concepts, which has not been addressed in existing research. A backdoor path may exist from $C_{ob_1}$ to $\mathcal{N}(X)$ even when there is no direct path from $C_{ob_1}$ to $\mathcal{N}(X)$ and all other $C_{ob}$ are conditioned on (blocked); such a path runs through $U$ and one of the $C_{un}$ (Pearl, 2009).
  • Figure 3: An illustration of MCCE's global interpretation of a BERT model when the concepts "Ambiance", "Service", and "Noise" are observed.
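
The data flow described in the Figure 1 caption can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the caption says the pipeline is trained end-to-end under an orthogonality constraint, whereas the sketch swaps in a closed-form substitute (residualize the representation against the observed concepts, then keep the leading singular directions). The function names, dimensions, and synthetic data are all hypothetical.

```python
import numpy as np

def orthogonal_pseudo_concepts(hidden, observed, n_pseudo):
    """Build pseudo-concepts from `hidden` that are orthogonal to the centred observed concepts."""
    observed_c = observed - observed.mean(axis=0)
    hidden_c = hidden - hidden.mean(axis=0)
    # Remove the part of the representation already explained by the observed concepts.
    coef, *_ = np.linalg.lstsq(observed_c, hidden_c, rcond=None)
    residual = hidden_c - observed_c @ coef
    # Keep the leading directions of what is left as pseudo-concepts.
    left, sing, _ = np.linalg.svd(residual, full_matrices=False)
    return left[:, :n_pseudo] * sing[:n_pseudo]

def fit_linear_explainer(features, blackbox_output):
    """Least-squares linear predictor of the black-box output; returns (weights, mse)."""
    design = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(design, blackbox_output, rcond=None)
    mse = float(np.mean((design @ weights - blackbox_output) ** 2))
    return weights, mse

# Synthetic stand-ins: two observed concepts, one unobserved concept, an 8-d representation.
rng = np.random.default_rng(0)
n = 2000
c_ob = rng.integers(0, 2, size=(n, 2)).astype(float)
c_un = rng.integers(0, 2, size=(n, 1)).astype(float)
mixing = rng.normal(size=(3, 8))
hidden = np.column_stack([c_ob, c_un]) @ mixing + 0.1 * rng.normal(size=(n, 8))
blackbox_output = c_ob @ np.array([1.0, -0.5]) + 2.0 * c_un[:, 0]

pseudo = orthogonal_pseudo_concepts(hidden, c_ob, n_pseudo=1)

# Linear predictor on observed concepts alone versus on [observed, pseudo].
_, mse_observed_only = fit_linear_explainer(c_ob, blackbox_output)
weights, mse_with_pseudo = fit_linear_explainer(np.column_stack([c_ob, pseudo]), blackbox_output)

# Largest inner product between centred observed concepts and the pseudo-concepts.
max_overlap = float(np.abs((c_ob - c_ob.mean(axis=0)).T @ pseudo).max())
print(f"mse without pseudo-concepts: {mse_observed_only:.4f}")
print(f"mse with pseudo-concepts:    {mse_with_pseudo:.4f}")
print(f"max overlap with observed:   {max_overlap:.2e}")
```

In this closed-form variant the pseudo-concepts are exactly orthogonal to the centred observed concepts, so adding them improves how well the linear predictor matches the black-box output but leaves the least-squares weights on the observed concepts unchanged. The sketch therefore shows the structure of the pipeline only and does not reproduce how the trained MCCE accounts for residual bias.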