Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs

Ruoyu Wang; Xiaoxuan Li; Lina Yao

TL;DR

The paper proposes Deconfounded Causal Adaptation (DCA), a novel parameter-efficient fine-tuning (PEFT) method that enhances a model's reasoning capabilities by encouraging it to extract general problem-solving skills and apply them to different questions.

Abstract

Large Language Models (LLMs) have demonstrated remarkable efficiency in tackling various tasks based on human instructions, but studies reveal that they often struggle with tasks requiring reasoning, such as math or physics. This limitation raises the question of whether LLMs truly comprehend embedded knowledge or merely learn to replicate the token distribution without a true understanding of the content. In this paper, we delve into this problem and aim to enhance the reasoning capabilities of LLMs. First, we investigate whether the model has genuine reasoning capabilities by visualizing the text generation process at the attention and representation level. Then, we formulate the reasoning process of LLMs into a causal framework, which provides a formal explanation of the problems observed in the visualization. Finally, building upon this causal framework, we propose Deconfounded Causal Adaptation (DCA), a novel parameter-efficient fine-tuning (PEFT) method that enhances the model's reasoning capabilities by encouraging it to extract general problem-solving skills and apply them to different questions. Experiments show that our method consistently outperforms the baseline across multiple benchmarks, and with only 1.2M tunable parameters, we achieve better or comparable results to other fine-tuning methods. This demonstrates the effectiveness and efficiency of our method in improving the overall accuracy and reliability of LLMs.

Paper Structure (18 sections, 9 equations, 5 figures, 1 table)

Introduction
Preliminary
LLaMA-Adapter
Causal Inference
Our Method
Investigation and Motivation
Method Specification
Implementation of Causal Intervention
Experiment
Experimental Settings
Tasks for Evaluation
Baselines and Comparison Methods
Overall Results
Effects of New Parameters
Further Discussions
...and 3 more sections

Figures (5)

Figure 1: Parameter-Efficient Fine-Tuning (PEFT) methods transform a non-prompt-following model into a prompt-following one by injecting a small number of learnable parameters into the pre-trained LLM. Our method lies in the domain of PEFT and concentrates on improving the model's problem-solving capabilities.
Figure 2: (a) The architecture of LLaMA-Adapter. A trainable lightweight adapter is inserted into each of the topmost L layers out of the N transformer layers of LLaMA. Aided by zero-init attention and gating mechanisms, the adaption prompt progressively learns new instructional cues, without disturbing the original pre-trained knowledge; (b) X $\rightarrow$ Z $\rightarrow$ Y is a chain, X $\leftarrow$ C $\rightarrow$ Y is a fork, C $\rightarrow$ Y $\leftarrow$ Z is a collider; (c) We perform intervention $do(X)$ to cut the edge $C \rightarrow X$ so that the causal effect $P(Y|do(X))$ can be estimated.
Figure 3: (a)-(b) Changing the value of the string affects the functioning of the attention mechanism as highlighted. (c) Causal graph of the reasoning process; (d) We block the backdoor path by performing an intervention on $X_{G}$.
Figure 4: The framework of our method. First, we divide the concatenated Adapter prompt in LLaMA-Adapter into two segments, $Adap1$ and $Adap2$. This affects the dimensions of $K$ and $V$ in the self-attention mechanism, as denoted on the right-hand side. The process of generating the feature $Z$ remains unchanged, and we build our causal loss $\mathcal{L}_{causal}$ by manipulating $Adap1$.
Figure 5: Effect of $H$ and $\alpha$ on LConcat. The red dotted line denotes baseline accuracy. These parameters should be chosen carefully, since values that are too large may harm performance.
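To make the fork and the intervention in Figure 2(b)-(c) concrete, the following is a minimal sketch of backdoor adjustment on a toy binary model. It is not the paper's implementation: the variables, the probability tables, and the function names are all assumptions chosen for illustration. It contrasts the observational quantity $P(Y|X)$, which is biased by the confounder $C$, with the interventional quantity $P(Y|do(X)) = \sum_c P(Y|X,c)P(c)$.

```python
# Toy model with a confounder: X <- C -> Y, plus a direct edge X -> Y.
# All probabilities are made-up illustrative numbers, not values from the paper.
P_C = {0: 0.5, 1: 0.5}                      # P(C = c)
P_X1_GIVEN_C = {0: 0.2, 1: 0.8}             # P(X = 1 | C = c)
P_Y1_GIVEN_XC = {                           # P(Y = 1 | X = x, C = c)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.6, (1, 1): 0.8,
}


def p_x_given_c(x, c):
    """P(X = x | C = c) for binary x."""
    p1 = P_X1_GIVEN_C[c]
    return p1 if x == 1 else 1.0 - p1


def observational(x):
    """P(Y = 1 | X = x): each c is weighted by P(c | X = x), so C leaks in."""
    p_x = sum(p_x_given_c(x, c) * P_C[c] for c in P_C)
    return sum(
        P_Y1_GIVEN_XC[(x, c)] * p_x_given_c(x, c) * P_C[c] / p_x
        for c in P_C
    )


def interventional(x):
    """P(Y = 1 | do(X = x)): backdoor adjustment weights each c by P(c)."""
    return sum(P_Y1_GIVEN_XC[(x, c)] * P_C[c] for c in P_C)


if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}: P(Y=1|X=x)={observational(x):.2f}, "
              f"P(Y=1|do(X=x))={interventional(x):.2f}")
```

In this toy setting the observational contrast between x=1 and x=0 is larger than the interventional one, because cutting the edge $C \rightarrow X$ removes the part of the association that flows through the confounder.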
