Self-Taught Self-Correction for Small Language Models

Viktor Moskvoretskii; Chris Biemann; Irina Nikishina

TL;DR

This work addresses intrinsic self-correction in small language models by introducing Self-Taught Self-Correction (STaSC), a unified iterative fine-tuning framework that relies solely on self-generated data. STaSC generalizes and extends prior self-correction approaches by explicitly controlling four design choices: initial-answer exploration, correction sampling, filtering criteria, and fine-tuning strategy. Fine-tuning on the corrections that pass the filter reinforces improved reasoning without external evaluators. The authors validate STaSC on a Natural Questions QA task using open-source models (Qwen-2.5-1.5B and Phi3-mini) and report improvements in both initial answers and corrected outputs. They also analyze how design choices influence learning dynamics, including filtering selectivity and fixed versus evolving initialization and fine-tuning. Finally, they release open-source code and lightweight models to facilitate future research and practical deployment, underscoring the practical potential of intrinsic self-correction for smaller models.

Abstract

Although large language models (LLMs) have achieved remarkable performance across various tasks, they remain prone to errors. A key challenge is enabling them to self-correct. While prior research has relied on external tools or large proprietary models, this work explores self-correction in small language models (SLMs) through iterative fine-tuning using solely self-generated data. We introduce the Self-Taught Self-Correction (STaSC) algorithm, which incorporates multiple algorithmic design choices. Experimental results on a question-answering task demonstrate that STaSC effectively learns self-correction, leading to significant performance improvements. Our analysis further provides insights into the mechanisms of self-correction and the impact of different design choices on learning dynamics and overall performance. To support future research, we release our user-friendly codebase and lightweight models.
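
The iterative loop named in the TL;DR (initial-answer exploration, correction sampling, filtering, fine-tuning) can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`Example`, `reward`, `collect_corrections`, `stasc`, the `fine_tune` callable, the correction prompt wording) is hypothetical, a "model" is reduced to a plain function from prompt to answer, the reward is a toy substring match, and the assumption that corrections are always sampled from the current model is ours.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: a "model" is any function mapping a prompt to an answer string.
Model = Callable[[str], str]
# One kept training triple: (question, initial answer, corrected answer).
Triple = Tuple[str, str, str]


@dataclass
class Example:
    question: str
    gold: str


def reward(answer: str, gold: str) -> float:
    """Toy reward (an assumption): 1.0 if the gold string occurs in the answer."""
    return 1.0 if gold.lower() in answer.lower() else 0.0


def collect_corrections(init_model: Model, corr_model: Model,
                        data: List[Example], n_init: int, n_corr: int,
                        strict: bool = True) -> List[Triple]:
    """Sample initial answers and corrections, keep those that pass the filter."""
    kept: List[Triple] = []
    for ex in data:
        for _ in range(n_init):              # initial-answer exploration
            initial = init_model(ex.question)
            r_init = reward(initial, ex.gold)
            for _ in range(n_corr):          # correction sampling
                prompt = f"{ex.question}\nInitial answer: {initial}\nRevise it."
                corrected = corr_model(prompt)
                r_corr = reward(corrected, ex.gold)
                # Filtering: strict keeps only improvements,
                # non-strict also keeps corrections that do not get worse.
                passed = r_corr > r_init if strict else r_corr >= r_init
                if passed:
                    kept.append((ex.question, initial, corrected))
    return kept


def stasc(base_model: Model, data: List[Example],
          fine_tune: Callable[[Model, List[Triple]], Model],
          n_iters: int, n_init: int = 1, n_corr: int = 1,
          evolving_init: bool = True, evolving_ft: bool = True,
          strict: bool = True) -> Model:
    """Iterate: sample, filter, fine-tune on the model's own kept corrections."""
    current = base_model
    for _ in range(n_iters):
        # Fixed vs. evolving initialization: who produces the initial answers.
        init_model = current if evolving_init else base_model
        kept = collect_corrections(init_model, current, data,
                                   n_init, n_corr, strict)
        # Fixed vs. evolving fine-tuning: which weights training starts from.
        start = current if evolving_ft else base_model
        current = fine_tune(start, kept)
    return current
```

In a real setup, `fine_tune` would wrap a supervised fine-tuning step and the models would wrap sampling from a language model; the two boolean flags only illustrate where the fixed-versus-evolving choices enter the loop.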
