Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment

Congzhi Zhang; Linhai Zhang; Jialong Wu; Yulan He; Deyu Zhou

TL;DR

The paper introduces Causal Prompting, a prompting paradigm that debiases LLM outputs by leveraging front-door adjustment within a structural causal model. It decomposes the causal effect of the input prompt $X$ on the final answer $A$ into two parts via a chain-of-thought mediator $R$, estimating $P(r|do(X))$ through CoT-based clustering and $P(A|do(r))$ via NWGM-assisted in-context learning, then combining them to obtain $P(A|do(X))$. Contrastive learning aligns the encoder's representation with the LLM's chain-of-thought, improving causal effect estimation. Empirical results across seven NLP tasks and multiple backbones (open- and closed-source) show consistent improvements, particularly in math reasoning and multi-hop QA, and robustness to adversarial data. The approach enables test-time debiasing without accessing LLM logits and suggests scalable extensions to safety and alignment applications.
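The decomposition described above matches the standard front-door adjustment formula. As a sketch in this summary's notation (the paper's own equations may index or expand the terms differently), the two estimated quantities combine as:

$$
P(A \mid do(X)) = \sum_{r} P(r \mid do(X)) \, P(A \mid do(r))
$$

In the textbook front-door setting, the first factor reduces to $P(r \mid X)$ and the second to $\sum_{x} P(x) \, P(A \mid r, x)$, which is why the mediator $R$ allows the effect of $X$ on $A$ to be identified even when the confounder is unobserved.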

Abstract

Despite the notable advancements of existing prompting methods, such as In-Context Learning and Chain-of-Thought for Large Language Models (LLMs), they still face challenges related to various biases. Traditional debiasing methods primarily focus on the model training stage, including approaches based on data augmentation and reweighting, yet they struggle with the complex biases inherent in LLMs. To address such limitations, the causal relationship behind the prompting methods is uncovered using a structural causal model, and a novel causal prompting method based on front-door adjustment is proposed to effectively mitigate LLM biases. Specifically, causal intervention is achieved by designing the prompts without accessing the parameters and logits of LLMs. The chain-of-thought generated by the LLM is employed as the mediator variable, and the causal effect between input prompts and output answers is calculated through front-door adjustment to mitigate model biases. Moreover, to accurately represent the chains of thought and estimate the causal effects, contrastive learning is used to fine-tune the chain-of-thought encoder by aligning its representation space with that of the LLM. Experimental results show that the proposed causal prompting approach achieves excellent performance across seven natural language processing datasets on both open-source and closed-source LLMs.

Paper Structure (55 sections, 20 equations, 7 figures, 10 tables, 1 algorithm)

Introduction
Preliminaries
Structural Causal Model and Causal Intervention
Front-door Adjustment
Method
Estimation of $P(r|do(X))$
Estimation of $P(A|do(r))$
Estimation of $P(A|do(X))$
Representation Space Alignment
Experiments
Datasets
Baselines
Main Results
Robustness Study
More Experimental Results
...and 40 more sections

Figures (7)

Figure 1: Performance of different prompting methods on ABSA [pontiki2016semeval] and its adversarial datasets on LLaMA-7b. ReverseTarget, ReverseNonTarget, and AddDiff denote three different adversarial transformations by TextFlint [wang2021textflint]. IO denotes the zero-shot setting, where the model receives only the input question and directly outputs the answer.
Figure 2: LLMs suffer from bias in the pre-training corpus, leading them to rely on irrelevant text spans in prompts and to generate incoherent chains of thought that harm the logical reasoning capability of the model. These examples were obtained by using CoT prompting [wei2022chain] on the LLaMA3-8B model.
Figure 3: Structural causal model for the prompting method. (a) The causal relationship between prompt and answer is confounded by an unobservable variable. (b) The chain-of-thought generated by LLMs serves as a mediator variable between prompt and answer.
Figure 4: The overall framework of Causal Prompting. First, based on the input prompt $X$, which consists of the demonstration examples and the question of the test example, we query the $\mathrm{LLM}$ to generate $m$ distinct CoTs. These CoTs are then grouped into $K$ clusters by an $\mathrm{Encoder}$-based clustering algorithm, and $K$ representative CoTs are selected by finding the CoT closest to each cluster center. Second, the optimal demonstration examples are retrieved for each representative CoT through the $\mathrm{Encoder}$-based intervention algorithm, yielding the post-intervention input prompt $\mathcal{P}_{r_k}^{iter}$. Finally, we query the $\mathrm{LLM}$ $T$ times, obtaining $T$ improved CoTs and $T$ answers for each representative CoT. The final answer is obtained by weighted voting.
Figure 5: Comparison of FLOPs cost between Causal Prompting and CoT-SC method on LLaMA3.
...and 2 more figures
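
The pipeline in the Figure 4 caption (cluster the $m$ CoTs, keep the CoT closest to each cluster center, then take a weighted vote over the answers) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the caption does not name the clustering algorithm, so plain k-means is swapped in; weighting each cluster by its share of the $m$ CoTs is an assumed stand-in for the estimate of $P(r|do(X))$; and the function names `select_representatives` and `weighted_vote` are hypothetical.

```python
from collections import defaultdict

import numpy as np


def select_representatives(embeddings, k, n_iter=20, seed=0):
    """Cluster CoT embeddings and pick one representative per cluster.

    ASSUMPTION: plain k-means stands in for the unspecified clustering step.
    Returns a list of (representative_index, cluster_share) pairs.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(embeddings, dtype=float)
    # Initialise the centers from k distinct data points (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    # Final assignment against the last set of centers.
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue  # skip clusters that ended up empty
        closest = members[dists[members, j].argmin()]
        # ASSUMPTION: the cluster's share of all CoTs is used as its weight.
        reps.append((int(closest), members.size / len(x)))
    return reps


def weighted_vote(cluster_weights, cluster_answers):
    """Combine per-cluster answer frequencies using the cluster weights."""
    scores = defaultdict(float)
    for weight, answers in zip(cluster_weights, cluster_answers):
        for answer in answers:
            # Each of the T answers contributes weight / T to its label.
            scores[answer] += weight / len(answers)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-D "embeddings": three CoTs near the origin, two far away.
    toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
    print(select_representatives(toy, k=2))
    # Cluster 1 (weight 0.6) answers A, A, B; cluster 2 (weight 0.4) answers B, B, B.
    print(weighted_vote([0.6, 0.4], [["A", "A", "B"], ["B", "B", "B"]]))  # prints B
```

In this sketch the vote score for an answer is the sum over clusters of (cluster weight) × (fraction of that cluster's $T$ answers that agree), which mirrors the sum-of-products shape of the front-door combination; the actual weights used in the paper may be estimated differently.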
