Post Hoc Explanations of Language Models Can Improve Language Models

Satyapriya Krishna; Jiaqi Ma; Dylan Slack; Asma Ghandeharioun; Sameer Singh; Himabindu Lakkaraju

TL;DR

AMPLIFY addresses the scalability limits of human-annotated Chain-of-Thought prompting by generating rationales automatically from post hoc explanations. It uses a small proxy model to compute attribution scores, ranks the samples the LLM misclassifies by the proxy model's Misclassification Confidence Score to pick the most informative few-shot examples, and builds few-shot prompts that pair each selected sample with its rationale. The authors report accuracy gains of about 10–25% across diverse tasks, together with ablations over proxy-model choice, sample selection, explanation method, and rationale template. The results suggest that post hoc explanations can improve LLM in-context learning without heavy human annotation.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks. Moreover, recent research has shown that incorporating human-annotated rationales (e.g., Chain-of-Thought prompting) during in-context learning can significantly enhance the performance of these models, particularly on tasks that require reasoning capabilities. However, incorporating such rationales poses challenges in terms of scalability as this requires a high degree of human involvement. In this work, we present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), which addresses the aforementioned challenges by automating the process of rationale generation. To this end, we leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions. More specifically, we construct automated natural language rationales that embed insights from post hoc explanations to provide corrective signals to LLMs. Extensive experimentation with real-world datasets demonstrates that our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks, including those where prior approaches which rely on human-annotated rationales such as Chain-of-Thought prompting fall short. Our work makes one of the first attempts at highlighting the potential of post hoc explanations as valuable tools for enhancing the effectiveness of LLMs. Furthermore, we conduct additional empirical analyses and ablation studies to demonstrate the impact of each of the components of AMPLIFY, which, in turn, leads to critical insights for refining in-context learning.
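
The rationale-construction idea in the abstract (turning attribution scores into a natural language rationale) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names `top_k_words` and `build_rationale`, the example scores, and the wording of the rationale template are hypothetical and are not taken from the text above. The attribution scores are assumed to come from some post hoc explanation method applied to the proxy model.

```python
from typing import List, Sequence


def top_k_words(tokens: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Return the k distinct tokens with the highest attribution scores."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    # Sort (token, score) pairs from most to least influential.
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    selected: List[str] = []
    for token, _ in ranked:
        if len(selected) >= k:
            break
        if token not in selected:  # skip repeated words
            selected.append(token)
    return selected


def build_rationale(tokens: Sequence[str], scores: Sequence[float],
                    label: str, k: int = 3) -> str:
    """Fill a hypothetical rationale template with the top-k attributed words."""
    words = top_k_words(tokens, scores, k)
    if not words:
        raise ValueError("need at least one word to build a rationale")
    if len(words) == 1:
        listing = words[0]
    else:
        listing = ", ".join(words[:-1]) + " and " + words[-1]
    # Assumed template wording; the exact phrasing is an illustrative choice.
    return (f"The key words: {listing} are important clues "
            f"to predict {label} as the correct answer.")
```

For example, with tokens `["the", "red", "wire", "touched", "the", "battery"]` and made-up scores `[0.01, 0.6, 0.5, 0.2, 0.02, 0.4]`, `build_rationale(tokens, scores, "Yes")` selects "red", "wire", and "battery". Scores are computed with respect to the ground-truth label, so the rationale points the LLM toward the evidence for the correct answer.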

Paper Structure (33 sections, 3 figures, 10 tables)

Introduction
Related Works
In-context Learning.
Post Hoc Explanations.
Our Framework AMPLIFY
Step (1): Proxy Model Selection.
Step (2): Few-shot Sample Selection.
Step (3): Rationale Generation.
Step (4): Prompt Design for LLMs.
Experimental Evaluation
Datasets.
Large Language Models.
Post Hoc Explanation Techniques.
Baseline Methods.
Implementation Details.
...and 18 more sections

Figures (3)

Figure 1: The AMPLIFY framework consists of four steps aimed at improving the performance of LLMs. (1) We select a proxy model, such as GPT-2 or BERT, which is significantly smaller than the LLM and for which generating post hoc explanations is computationally feasible. (2) Using the validation set, we identify samples that the LLM misclassifies, and from these we select the samples that the proxy model misclassifies with the highest confidence. (3) We then use explainability techniques to compute explanations for the selected samples with respect to their ground truth labels. (4) We construct the few-shot prompt from the selected samples and their corresponding explanations and feed it to the LLM for prediction.
Figure 2: An instance of the Causal Judgment task where standard prompts and CoT produce inaccurate responses. The CoT response fails to take into account that the red wire should not make contact with the battery, which caused the short circuit. In contrast, the response generated by AMPLIFY emphasizes this crucial detail. Note that while we inject rationales in terms of $k$ individual words, we do not restrict LLMs from generating rationales in terms of phrases or multiple words. This is why LLM-generated rationales often contain multi-word clues, such as "red wire" and "never supposed."
Figure 3: An instance of the CommonsenseQA task where standard prompts and CoT produce inaccurate responses. The CoT response fails to take into account that the context of the question relates to eyes. In contrast, the response generated by AMPLIFY emphasizes this crucial detail.
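
Steps (2) and (4) of the framework in Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The Misclassification Confidence Score is assumed here to be the margin between the proxy model's highest wrong-class probability and its true-class probability; the function names, the `Input / Rationale / Label` prompt layout, and the toy probabilities are all hypothetical.

```python
from typing import List, Sequence, Tuple


def misclassification_confidence(probs: Sequence[float], true_label: int) -> float:
    """Assumed MCS: highest wrong-class probability minus true-class probability."""
    best_wrong = max(p for i, p in enumerate(probs) if i != true_label)
    return best_wrong - probs[true_label]


def select_few_shot(proxy_probs: Sequence[Sequence[float]],
                    labels: Sequence[int],
                    llm_wrong: Sequence[bool],
                    s: int) -> List[int]:
    """Among validation samples the LLM got wrong, return the indices of the
    s samples that the proxy model misclassifies most confidently."""
    scored = [
        (misclassification_confidence(p, y), i)
        for i, (p, y, wrong) in enumerate(zip(proxy_probs, labels, llm_wrong))
        if wrong  # keep only samples misclassified by the LLM
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [i for _, i in scored[:s]]


def build_prompt(examples: Sequence[Tuple[str, str, str]], test_input: str) -> str:
    """Assemble a few-shot prompt from (input, rationale, label) triples.
    The layout is a hypothetical format chosen for illustration."""
    blocks = [f"Input: {x}\nRationale: {r}\nLabel: {y}" for x, r, y in examples]
    # Leave the rationale open so the LLM generates its own before answering.
    blocks.append(f"Input: {test_input}\nRationale:")
    return "\n\n".join(blocks)
```

With proxy probabilities `[[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]`, labels `[1, 1, 0, 0]`, and LLM errors on the first three samples, `select_few_shot(..., s=2)` picks samples 0 and 2: the proxy is confidently wrong on both, while sample 1 (which the proxy classifies correctly) ranks last and sample 3 is excluded because the LLM already gets it right.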
