Self-Taught Self-Correction for Small Language Models

Viktor Moskvoretskii, Chris Biemann, Irina Nikishina

TL;DR

This work tackles the challenge of intrinsic self-correction in small language models by introducing Self-Taught Self-Correction (STaSC), a unified iterative fine-tuning framework that relies solely on self-generated data. STaSC generalizes and extends prior self-correction approaches by explicitly controlling initial-answer exploration, correction sampling, filtering criteria, and fine-tuning strategy, enabling corrections to reinforce improved reasoning without external evaluators. The authors validate STaSC on a Natural Questions QA task using open-source models (Qwen-2.5-1.5B and Phi3-mini), showing improvements in both initial answers and corrected outputs, and providing detailed analyses of how design choices influence learning dynamics, such as filtering selectivity and evolving versus fixed initialization/fine-tuning. They further offer open-source code and lightweight models to facilitate future research and practical deployment, underscoring the practical potential of intrinsic self-correction for smaller models.
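The design axes named in the TL;DR (initial-answer exploration, correction sampling, filtering criteria, and the data used for fine-tuning) can be illustrated with a minimal Python sketch of one data-collection pass. This is not the authors' implementation. The `generate`/`correct` callables stand in for sampling from a small language model, the substring-match `reward` is an assumed correctness check, and the names `collect_corrections`, `n_init`, `n_corr`, and `strict` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical stand-ins for sampling from an SLM: the generator answers a
# question; the corrector rewrites a (question, initial answer) pair.
GenerateFn = Callable[[str, random.Random], str]
CorrectFn = Callable[[str, str, random.Random], str]


@dataclass
class CorrectionExample:
    question: str
    initial: str     # initial answer y0
    correction: str  # corrected answer y1 (the fine-tuning target)


def reward(answer: str, gold: str) -> int:
    """Assumed correctness check: 1 if the gold string occurs in the answer."""
    return int(gold.lower() in answer.lower())


def collect_corrections(
    questions: Sequence[str],
    golds: Sequence[str],
    generate: GenerateFn,
    correct: CorrectFn,
    n_init: int = 2,      # initial-answer exploration: samples per question
    n_corr: int = 2,      # correction sampling: corrections per initial answer
    strict: bool = True,  # filtering selectivity (hypothetical flag)
    seed: int = 0,
) -> List[CorrectionExample]:
    rng = random.Random(seed)
    kept: List[CorrectionExample] = []
    for q, gold in zip(questions, golds):
        for _ in range(n_init):
            y0 = generate(q, rng)
            r0 = reward(y0, gold)
            for _ in range(n_corr):
                y1 = correct(q, y0, rng)
                r1 = reward(y1, gold)
                if strict:
                    keep = r1 > r0               # only wrong -> right corrections
                else:
                    keep = r1 >= r0 and r1 == 1  # also keep right -> right
                if keep:
                    kept.append(CorrectionExample(q, y0, y1))
    return kept


if __name__ == "__main__":
    def gen(q, rng):
        # Toy generator that is sometimes right and sometimes wrong.
        return rng.choice(["It is Paris.", "It is Lyon."])

    def fix(q, y0, rng):
        # Toy corrector that always proposes the right answer.
        return "Correction: Paris."

    data = collect_corrections(["Capital of France?"], ["Paris"], gen, fix)
    print(len(data), "correction examples kept")
```

In this sketch, `strict=True` keeps only corrections that turn a wrong initial answer into a right one, while `strict=False` also keeps correct answers that stay correct; this is one assumed way to vary filtering selectivity. The kept (question, initial answer, correction) triples would then serve as supervised targets for the next fine-tuning round.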

Abstract

Although large language models (LLMs) have achieved remarkable performance across various tasks, they remain prone to errors. A key challenge is enabling them to self-correct. While prior research has relied on external tools or large proprietary models, this work explores self-correction in small language models (SLMs) through iterative fine-tuning using solely self-generated data. We introduce the Self-Taught Self-Correction (STaSC) algorithm, which incorporates multiple algorithmic design choices. Experimental results on a question-answering task demonstrate that STaSC effectively learns self-correction, leading to significant performance improvements. Our analysis further provides insights into the mechanisms of self-correction and the impact of different design choices on learning dynamics and overall performance. To support future research, we release our user-friendly codebase and lightweight models.

Paper Structure

This paper contains 36 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Illustration of the self-improvement method STaR (left) zelikman2022starbootstrappingreasoningreasoning, the self-correction method SC (center) welleck2022generating, and our method, STaSC (right). STaSC offers flexible control over initial-answer exploration, correction filtering, and iterative fine-tuning. It is inspired by STaR and encompasses SC as a special case. Both SC and STaSC allow multiple initial answers and corrections. The dotted line in the STaSC panel denotes two possible setups: fine-tuning the model and generating from it at the next iteration (Evolving Fine-Tuning), or keeping the Generator frozen and fine-tuning only the Corrector (Fixed Fine-Tuning).
  • Figure 2: Correction In-accuracy for STaSC versions with Evolving Initialization for Phi3-mini and Qwen-2.5-1.5B.
  • Figure 3: Correction In-accuracy for STaSC versions with Fixed Initialization for Phi3-mini and Qwen-2.5-1.5B.
  • Figure 4: Correction and Initial Answer In-accuracy for the best STaSC versions for Phi3-mini and Qwen-2.5-1.5B.
  • Figure 5: Example of the STaSC pipeline.
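The captions above distinguish Evolving from Fixed setups (Figure 1 for fine-tuning, Figures 2 and 3 for initialization). The following Python sketch illustrates that distinction on the outer loop only, under a simplified reading that is our assumption rather than the paper's definition: the hypothetical flag `evolving_init` chooses whether initial answers come from the latest model or the frozen base model, and `evolving_ft` chooses whether each round fine-tunes the latest model or restarts from the base model. Models are plain strings and `fine_tune` is a toy callable, so the code traces model lineage rather than training anything; `run_stasc` is likewise a hypothetical name.

```python
from typing import Callable, List, Tuple

Model = str  # hypothetical: a model is represented only by a lineage label


def run_stasc(
    base: Model,
    n_iters: int,
    evolving_init: bool,
    evolving_ft: bool,
    collect: Callable[[Model, Model], list],
    fine_tune: Callable[[Model, list], Model],
) -> Tuple[List[Tuple[Model, Model]], Model]:
    """Trace which model answers first and which model is fine-tuned each round."""
    current = base
    trace: List[Tuple[Model, Model]] = []
    for _ in range(n_iters):
        # Initial answers: latest model (evolving) or frozen base model (fixed).
        generator = current if evolving_init else base
        # Corrections are assumed to come from the latest model in every setup.
        data = collect(generator, current)
        # Fine-tuning start point: latest weights (evolving) or base weights (fixed).
        start = current if evolving_ft else base
        trace.append((generator, start))
        current = fine_tune(start, data)
    return trace, current


if __name__ == "__main__":
    def toy_collect(generator: Model, corrector: Model) -> list:
        return [(generator, corrector)]  # placeholder for collected examples

    def toy_fine_tune(model: Model, data: list) -> Model:
        return model + "+ft"  # record one more round of fine-tuning

    for ev_init in (False, True):
        for ev_ft in (False, True):
            trace, final = run_stasc("M0", 3, ev_init, ev_ft, toy_collect, toy_fine_tune)
            print(f"evolving_init={ev_init}, evolving_ft={ev_ft}: {trace} -> {final}")
```

The printed traces make the difference visible: with both flags off, every round answers with and fine-tunes from `M0`; with both on, each round builds on the model produced by the previous one.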