Toward a Theory of Causation for Interpreting Neural Code Models

David N. Palacio; Alejandro Velasco; Nathan Cooper; Alvaro Rodriguez; Kevin Moran; Denys Poshyvanyk


TL;DR

The paper tackles the interpretability gap in Neural Code Models (NCMs) by proposing a causal post hoc framework, $do_{code}$, that explains code predictions beyond traditional accuracy metrics. It grounds explanations in programming-language properties through a Structural Causal Model (SCM) and adopts Pearl's Ladder of Causation to define interventions and estimands such as the average treatment effect ($ATE$) and the interventional distribution $p(Y|do(T))$. The four-step pipeline (model the SCM, identify the estimand, estimate causal effects, validate the causal process) is demonstrated through seven case studies across RNNs, GRUs, GPT-2, and a BERT-like model, and is accompanied by a replication package, icodegen. Key findings show that many observed correlations are confounded, but certain interventions (e.g., token masking) reveal meaningful causal effects, enabling bias detection and more trustworthy NCM developer tools.
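For reference, the textbook forms of these two estimands in Pearl's framework are sketched below. This assumes a binary intervention $T$, an outcome $Y$ (for example, the cross-entropy of a code prediction), and a discrete set of observed confounders $Z$ that satisfies the backdoor criterion; the paper's own estimands may be specified differently.

```latex
\begin{aligned}
p(Y \mid do(T=t)) &= \sum_{z} p(Y \mid T=t, Z=z)\, p(Z=z) \\
ATE &= \mathbb{E}[Y \mid do(T=1)] - \mathbb{E}[Y \mid do(T=0)] \\
    &= \sum_{z} \big(\mathbb{E}[Y \mid T=1, Z=z] - \mathbb{E}[Y \mid T=0, Z=z]\big)\, p(Z=z)
\end{aligned}
```

The adjustment over $Z$ is what separates an interventional quantity from the purely observational $p(Y \mid T)$, which mixes the effect of $T$ with the effect of any common cause.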

Abstract

Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often reveal only a portion of their real-world performance. While the performance of NCMs generally appears promising, much remains unknown about how such models arrive at decisions. To this end, this paper introduces $do_{code}$, a post hoc interpretability method specific to NCMs that is capable of explaining model predictions. $do_{code}$ is based upon causal inference to enable programming language-oriented explanations. While the theoretical underpinnings of $do_{code}$ are extensible to exploring different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. To demonstrate the practical benefit of $do_{code}$, we illustrate the insights that our framework can provide by performing a case study on two popular deep learning architectures and ten NCMs. The results of this case study illustrate that our studied NCMs are sensitive to changes in code syntax. All our NCMs, except for the BERT-like model, statistically learn to predict tokens related to blocks of code (e.g., brackets, parentheses, semicolons) with less confounding bias compared to other programming language constructs. These insights demonstrate the potential of $do_{code}$ as a useful method to detect and facilitate the elimination of confounding bias in NCMs.


Paper Structure (43 sections, 4 equations, 15 figures, 11 tables)

Introduction
Background & Related Work
Why do we Need Causal Interpretability for Deep Learning Models applied to Software Engineering?
Pearl's Ladder of Causation
Software Engineering-Based Interventions
The Causal Interpretability Hypothesis
An Overview of the do$_{code}$ Approach
Step One: Modeling Causal Problem
SE-Based Counterfactual Interventions
Potential Outcomes / Code Predictions
Common Causes or SE-based Confounders
Step Two: Identifying Causal Estimand
Step Three: Estimating Causal Effects
Step Four: Validating Causal Process
Refuting Effect Estimate
...and 28 more sections

Figures (15)

Figure 1: Common Cause Principle in NCMs [Scholkopf2022]
Figure 2: A classification of key methods in interpretability including Causal Interpretability.
Figure 3: Ladder of Causation: do$_{code}$ is an extension of the intervention level.
Figure 4: Spurious Correlation between the Number of Subwords (a common cause) and Cross-Entropy values ($p(Y|Z)\approx 0.87$) for the ProgramRepair intervention generated from GPT-2$_{6,12}$.
Figure 5: Overview of the do$_{code}$ Approach: Each numeral represents a step in the process of generating causal interpretations. First, a Structural Causal Model (SCM) that frames the explanation hypothesis is formulated. Second, do$_{code}$ executes graph surgery on the SCM to isolate a targeted estimand of the causal effect. Third, the causal effect is assessed based on the targeted estimand. Finally, the estimated effect and the SCM are scrutinized through refutation techniques and exploratory analysis to confirm their validity.
...and 10 more figures
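Steps three and four of the pipeline in Figure 5 (estimating an effect, then refuting it) can be illustrated with a small simulation. The sketch below is not the authors' icodegen implementation. It assumes a binary intervention `t`, a single discrete confounder `z` (a hypothetical binarized subword count, echoing the common cause in Figure 4), and synthetic cross-entropy outcomes `y`. All function names and parameters are hypothetical. It estimates the ATE with the backdoor adjustment formula and then applies a placebo-treatment refutation, which is one common refutation technique.

```python
import numpy as np


def simulate(n=20_000, true_effect=0.5, seed=0):
    """Generate synthetic (hypothetical) data with one binary confounder.

    z: 1 if a snippet is long (many subwords), else 0
    t: 1 if the SE-based intervention was applied, else 0
    y: cross-entropy of the model's prediction
    """
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, size=n)
    # Confounding path z -> t: long snippets receive the intervention more often.
    t = rng.binomial(1, np.where(z == 1, 0.8, 0.2))
    # Confounding path z -> y: long snippets also have higher cross-entropy.
    y = true_effect * t + 2.0 * z + rng.normal(0.0, 1.0, size=n)
    return t, y, z, rng


def naive_difference(t, y):
    """Unadjusted difference in means; biased when z affects both t and y."""
    return float(y[t == 1].mean() - y[t == 0].mean())


def backdoor_ate(t, y, z):
    """ATE of a binary t on y via the backdoor adjustment over discrete z:
    sum_z p(z) * (E[y | t=1, z] - E[y | t=0, z])."""
    ate = 0.0
    for value in np.unique(z):
        stratum = z == value
        treated = y[stratum & (t == 1)]
        control = y[stratum & (t == 0)]
        if treated.size == 0 or control.size == 0:
            raise ValueError(f"stratum z={value} lacks treated or control samples")
        ate += stratum.mean() * (treated.mean() - control.mean())
    return float(ate)


def placebo_refutation(t, y, z, rng, n_permutations=100):
    """Re-estimate the ATE with randomly permuted (placebo) treatments.
    A placebo has no causal effect, so the average should be close to 0;
    a large value would suggest the estimation process is flawed."""
    estimates = [backdoor_ate(rng.permutation(t), y, z) for _ in range(n_permutations)]
    return float(np.mean(estimates))


if __name__ == "__main__":
    t, y, z, rng = simulate()
    print(f"naive difference: {naive_difference(t, y):.2f}")  # inflated by z
    print(f"backdoor ATE:     {backdoor_ate(t, y, z):.2f}")  # near the true 0.5
    print(f"placebo ATE:      {placebo_refutation(t, y, z, rng):.2f}")  # near 0
```

With these settings the naive difference is about 0.5 + 2.0 * (0.8 - 0.2) = 1.7 in expectation, because long snippets are overrepresented among treated samples. Stratifying on `z` removes that bias. The placebo check confirms that the estimator does not manufacture an effect where none exists.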

Theorems & Definitions (8)

Definition 1
Example 1
Definition 2
Definition 3
Example 2
Definition 4
Definition 5
Definition 6
