Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations

Milan Bhan; Jean-Noel Vittaut; Nicolas Chesneau; Marie-Jeanne Lesot

TL;DR

The paper tackles improving small language models by enabling them to generate and use their own rationales without human annotations or auxiliary proxy models. It introduces Self-AMPLIFY, a 3-step pipeline that selects informative samples, derives rationales via post hoc explanations applied to the SLM itself, and composes augmented ICL prompts for improved reasoning. Across five reasoning-heavy datasets and multiple 7B-scale SLMs, Self-AMPLIFY, especially with Ph-CoT rationales, consistently outperforms standard prompting and proxy-based baselines, demonstrating the viability of fully automated self-improvement. The work underscores the potential of post hoc explanations as self-improvement signals for autoregressive SLMs and points to future directions in rationale faithfulness, efficiency, and broader validation across models and tasks.

Abstract

Incorporating natural language rationales in the prompt and In-Context Learning (ICL) have led to significant improvements in the performance of Large Language Models (LLMs). However, generating high-quality rationales requires human annotation or the use of auxiliary proxy models. In this work, we propose Self-AMPLIFY to automatically generate rationales from post hoc explanation methods applied to Small Language Models (SLMs) to improve their own performance. Self-AMPLIFY is a 3-step method that targets samples, generates rationales and builds a final prompt to leverage ICL. Self-AMPLIFY performance is evaluated on four SLMs and five datasets requiring strong reasoning abilities. Self-AMPLIFY achieves good results against competitors, leading to strong accuracy improvements. Self-AMPLIFY is the first method to apply post hoc explanation methods to autoregressive language models to generate rationales to improve their own performance in a fully automated manner.
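The three steps named in the abstract (target samples, generate rationales, build the final ICL prompt) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Example`, `select_samples`, `build_prompt` and `self_amplify_prompt`, the `predict` and `explain` callables, and the prompt layout are all hypothetical, and explaining the gold label is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    question: str
    label: str


def select_samples(examples: List[Example],
                   predict: Callable[[str], str],
                   n: int,
                   strategy: str = "success") -> List[Example]:
    """Step 1: keep up to n samples the model answers correctly
    ("success") or incorrectly ("error")."""
    if strategy not in ("success", "error"):
        raise ValueError("strategy must be 'success' or 'error'")
    want_correct = strategy == "success"
    picked = [ex for ex in examples
              if (predict(ex.question) == ex.label) == want_correct]
    return picked[:n]


def build_prompt(triplets: List[Tuple[str, str, str]],
                 new_question: str) -> str:
    """Step 3: concatenate (x, r, y) triplets, then append the new question.
    The 'Question / Rationale / Answer' layout is a hypothetical template."""
    blocks = [f"Question: {x}\nRationale: {r}\nAnswer: {y}"
              for x, r, y in triplets]
    blocks.append(f"Question: {new_question}\nRationale:")
    return "\n\n".join(blocks)


def self_amplify_prompt(examples: List[Example],
                        predict: Callable[[str], str],
                        explain: Callable[[str, str], str],
                        new_question: str,
                        n: int = 2,
                        strategy: str = "success") -> str:
    selected = select_samples(examples, predict, n, strategy)
    # Step 2: the explainer is applied to the same model that will read the
    # prompt; here it is an opaque callable (assumption: it explains the label).
    triplets = [(ex.question, explain(ex.question, ex.label), ex.label)
                for ex in selected]
    return build_prompt(triplets, new_question)
```

In this sketch `predict` and `explain` would both wrap the same SLM, which is what distinguishes the self-explaining setting from one that relies on an auxiliary proxy model.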

Paper Structure (47 sections, 11 figures, 6 tables)

This paper contains 47 sections, 11 figures, 6 tables.

Introduction
Background and Related Work
Post Hoc Explanations Background
Attribution method.
Post hoc free text self-rationales.
Related Work
Human-annotated rationales.
Automatically generated rationales.
Proposed approach: Self-AMPLIFY
Self-AMPLIFY overview
(i) $n$-shot Sample Selection.
(ii) Rationale Generation.
(iii) Prompt Design for SLMs.
$n$-shot Sample Selection
Rationale Generation
...and 32 more sections

Figures (11)

Figure 1: Example of four responses to a question from the Snarks dataset, generated from different ICL prompting strategies. Traditional input-output (IO) prompting, Auto-CoT (Zhang et al., 2022) and AMPLIFY (Krishna et al., 2023) fail to answer properly, whereas Self-AMPLIFY generates important tokens as a rationale before correctly answering.
Figure 2: Self-AMPLIFY overview. Self-AMPLIFY is a 3-step approach generating rationales to self-improve an SLM in an ICL setting. (1) Promising samples are targeted following two selection strategies: success or error. (2) Rationales are generated based on a post hoc explanation method: KernelShap, DeepLift, Ph-CoT or Self_topk. (3) The final ICL prompt is built based on the previously generated rationales.
Figure 3: Self-AMPLIFY rationale generation step with a post hoc attribution method. Here, DeepLift or KernelShap is applied to the SLM to explain the answer D. The 4 most important tokens are targeted and the final rationale $r$ is constructed based on these keywords. The ($x$, $r$, $y$) triplet is finally added to the ICL prompt.
Figure 4: Self-AMPLIFY (blue) and competitors (red) accuracy (%) with Gemma-2 (left) and Gemma-7B (right). Self-AMPLIFY is run on 2 versions: DeepLift and Ph-CoT. With $p$ as the $p$-value of the paired $t$-test, *$p<10$%, **$p<5$%, ***$p<1$%. IO stands for the reference baseline.
Figure 5: Accuracy (%) of classical IO prompting and Self-AMPLIFY for different $topk$ post hoc explainers and different topk values. Evaluation is made with Mistral and Zephyr on Commonsense QA and Causal Judgment datasets.
...and 6 more figures
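The rationale construction described in the Figure 3 caption (keep the most important tokens, then build a rationale $r$ from these keywords) can be sketched as follows. This is an illustrative sketch only: the function name `topk_rationale`, the choice to rank by highest raw score, and the rationale wording are assumptions. The attribution scores are taken as given here; in practice they would come from an attribution method such as DeepLift or KernelShap applied to the SLM.

```python
from typing import Sequence


def topk_rationale(tokens: Sequence[str],
                   scores: Sequence[float],
                   answer: str,
                   k: int = 4) -> str:
    """Turn per-token attribution scores into a keyword-based rationale.

    Assumptions: one score per token, higher score means more important,
    and the sentence template below is hypothetical wording.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    # Indices of the k highest-scoring tokens.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore reading order so the keywords appear as in the input text.
    keywords = [tokens[i] for i in sorted(top)]
    quoted = ", ".join(f'"{t}"' for t in keywords)
    return (f"The {len(keywords)} keywords {quoted} are important "
            f"to predict that the answer is {answer}.")


tokens = ["the", "movie", "was", "so", "great", "not"]
scores = [0.0, 0.3, 0.1, 0.5, 0.9, 0.7]
print(topk_rationale(tokens, scores, answer="B"))
```

The resulting string plays the role of $r$ in the ($x$, $r$, $y$) triplet that is added to the ICL prompt.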
