
Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation

Yong Zhang, Bingyuan Zhang, Zhitao Li, Ming Li, Ning Cheng, Minchuan Chen, Tao Wei, Jun Ma, Shaojun Wang, Jing Xiao

TL;DR

The paper addresses the weaker reasoning ability of small models relative to large language models and shows that latent reasoning paths can emerge during sampling even without chain-of-thought prompts. It introduces Self-Enhanced Reasoning Training (SERT), a two-stage approach that first generates and filters these latent reasoning paths and then self-trains the model on them so that it produces coherent reasoning more reliably; reasoning distillation from a high-capacity teacher can optionally follow. Experiments with GPT-3.5 as the teacher and GPT-2 variants as students on StrategyQA and CommonsenseQA show that SERT activates latent reasoning, improves reasoning-path quality, and raises task accuracy, with particularly strong gains for smaller models when combined with distillation. The findings point to a practical way to improve reasoning in compact models, enabling stronger zero-shot reasoning and more effective learning from larger teachers in resource-constrained settings.
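The filtering step can be illustrated with a minimal sketch. It assumes one simple heuristic per failure mode named in the Figure 2 caption (Repetitive Option Generation, Direct Response, Imitative Generation). The `Sample` type, the thresholds, and the prompt markers are hypothetical choices for illustration, not the authors' filtering criteria. Samples that pass every check become (question, reasoning) pairs for self-training.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical prompt markers; a real filter would match the actual prompt template.
PROMPT_MARKERS = ("Question:", "Q:", "Answer Choices:", "Options:")


@dataclass
class Sample:
    question: str
    options: List[str]
    text: str  # continuation sampled from the student without a chain-of-thought prompt


def is_repetitive_options(s: Sample, max_options: int = 2) -> bool:
    """Repetitive Option Generation: the output mostly re-lists the answer options."""
    lowered = s.text.lower()
    return sum(opt.lower() in lowered for opt in s.options) > max_options


def is_direct_response(s: Sample, min_words: int = 8) -> bool:
    """Direct Response: the output is too short to contain any reasoning."""
    return len(s.text.split()) < min_words


def is_imitative(s: Sample) -> bool:
    """Imitative Generation: the output copies the input format, e.g. starts a new question."""
    return any(marker in s.text for marker in PROMPT_MARKERS)


def keep(s: Sample) -> bool:
    """A sample survives only if no heuristic flags it."""
    return not (is_repetitive_options(s) or is_direct_response(s) or is_imitative(s))


def build_self_training_set(samples: List[Sample]) -> List[Tuple[str, str]]:
    """(question, reasoning) pairs from the samples that pass every filter."""
    return [(s.question, s.text) for s in samples if keep(s)]


opts = ["bank", "library", "park", "school", "river"]
q = "Where would you keep money?"
samples = [
    Sample(q, opts, "People keep money somewhere safe, and a bank stores money "
                    "for its customers, so the answer is bank."),
    Sample(q, opts, "bank"),  # Direct Response
    Sample(q, opts, "Question: Where would you read a book? Answer Choices: library, park"),  # Imitative
]
print(build_self_training_set(samples))
```

Only the first sample survives here. The self-training step itself (fine-tuning the student on these pairs) is not shown; surface heuristics like these are cheap but coarse, and the paper's own criteria may differ.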

Abstract

The rapid advancement of large language models (LLMs) has significantly enhanced their reasoning abilities, enabling increasingly complex tasks. However, these capabilities often diminish in smaller, more computationally efficient models like GPT-2. Recent research shows that reasoning distillation can help small models acquire reasoning capabilities, but most existing methods focus primarily on improving teacher-generated reasoning paths. Our observations reveal that small models can generate high-quality reasoning paths during sampling, even without chain-of-thought prompting, though these paths are often latent due to their low probability under standard decoding strategies. To address this, we propose Self-Enhanced Reasoning Training (SERT), which activates and leverages latent reasoning capabilities in small models through self-training on filtered, self-generated reasoning paths under zero-shot conditions. Experiments using OpenAI's GPT-3.5 as the teacher model and GPT-2 models as the student models demonstrate that SERT enhances the reasoning abilities of small models, improving their performance in reasoning distillation.
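Why such paths stay latent can be illustrated with the top-k alternative token sampling described in the Figure 1 caption: instead of committing to the single most likely token, decoding branches on each of the k most likely tokens and samples several continuations from each branch. This sketch assumes the branching happens at the first decoding step and replaces GPT-2 with a hypothetical toy next-token function; `toy_model`, `topk_alternative_sampling`, and the EOS handling are illustrative assumptions, not the paper's code.

```python
import random
from typing import Callable, Dict, List

EOS = "<eos>"
# A "model" is any function mapping a token prefix to a next-token distribution.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def sample_continuation(model: NextTokenFn, prefix: List[str],
                        rng: random.Random, max_len: int = 20) -> List[str]:
    """Sample new tokens after `prefix` until EOS or `max_len` tokens."""
    out: List[str] = []
    for _ in range(max_len):
        dist = model(prefix + out)
        tokens, weights = zip(*dist.items())
        tok = rng.choices(tokens, weights=weights, k=1)[0]
        if tok == EOS:
            break
        out.append(tok)
    return out


def topk_alternative_sampling(model: NextTokenFn, prompt: List[str], k: int = 5,
                              n_per_token: int = 5, seed: int = 0) -> Dict[str, List[List[str]]]:
    """Force each of the k most likely first tokens, then sample n continuations per branch."""
    rng = random.Random(seed)
    ranked = sorted(model(prompt).items(), key=lambda kv: kv[1], reverse=True)
    first_tokens = [tok for tok, _ in ranked if tok != EOS][:k]
    return {
        tok: [[tok] + sample_continuation(model, prompt + [tok], rng) for _ in range(n_per_token)]
        for tok in first_tokens
    }


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for GPT-2: the most likely first token answers directly,
    while a less likely first token starts a short reasoning path."""
    if len(prefix) == 1:  # only the question token so far
        return {"A": 0.6, "Because": 0.3, "The": 0.1}
    follow = {"Because": "banks", "banks": "store", "store": "money", "The": "answer"}
    nxt = follow.get(prefix[-1])
    return {nxt: 1.0} if nxt else {EOS: 1.0}


branches = topk_alternative_sampling(toy_model, ["Q"], k=2, n_per_token=3)
for tok, outs in branches.items():
    print(tok, [" ".join(o) for o in outs])
```

Greedy decoding on `toy_model` would always emit the direct answer `A`; only the forced `Because` branch exposes the lower-probability reasoning path. With k = 5 and 5 samples per branch, a real model would yield the 25 outputs per question that the caption pools as Token All.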

Paper Structure

This paper contains 18 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Comparison of reasoning-path quality under top-k alternative token sampling versus raw sampling on 100 CommonsenseQA questions. For top-k alternative tokens, 5 outputs were generated for each of the top 5 tokens (labeled Token 0 to Token 4), and the resulting 25 outputs were pooled as Token All. For comparison, 25 outputs per question were generated with raw top-k sampling (labeled Raw), with additional results for raw sampling from the SERT-activated model. The y-axis shows the proportion of questions with at least one output in each score range (x-axis), with scores grouped into 2-point intervals from 0 to 10. Higher proportions in the higher score ranges indicate better reasoning quality. The generation setup and evaluation criteria are detailed in the experimental setup.
  • Figure 2: Overview of our proposed method, illustrated with CommonsenseQA generation examples from GPT-2 large. One symbol marks undesirable outcomes: Repetitive Option Generation (repeating the answer options), Direct Response (answering without reasoning), and Imitative Generation (mimicking the input style). A second symbol highlights Latent Reasoning, i.e., coherent reasoning paths that are present during sampling but rarely expressed. Our goal is to activate and enhance these capabilities. Key elements are shown in bold, and truncated text is marked "Continued."
  • Figure 3: Performance comparison of GPT models of different sizes across various evaluation criteria on the CommonsenseQA test set.
  • Figure 4: Impact of filtering on the reasoning quality of generated CommonsenseQA data.
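The y-axis statistic of Figure 1 (the proportion of questions with at least one output in each 2-point score range) can be computed as in the minimal sketch below. It assumes each output already carries a quality score in [0, 10] from the evaluation criteria, and that bins are half-open except the last, which includes 10; the caption does not specify bin edges, so that choice is an assumption, as are the helper names.

```python
from typing import List, Sequence


def bin_index(score: float, width: float = 2.0, top: float = 10.0) -> int:
    """Map a score in [0, top] to a bin; the last bin is closed so `top` itself is included."""
    if not 0 <= score <= top:
        raise ValueError(f"score {score} outside [0, {top}]")
    n_bins = int(top // width)
    return min(int(score // width), n_bins - 1)


def coverage_by_bin(scores_per_question: Sequence[Sequence[float]],
                    width: float = 2.0, top: float = 10.0) -> List[float]:
    """Fraction of questions with at least one output scored in each bin."""
    n_bins = int(top // width)
    n_questions = len(scores_per_question)
    if n_questions == 0:
        return [0.0] * n_bins
    hits = [0] * n_bins
    for scores in scores_per_question:
        # A set, so several outputs in the same bin count the question only once.
        for b in {bin_index(s, width, top) for s in scores}:
            hits[b] += 1
    return [h / n_questions for h in hits]


# Three questions, each with its own list of per-output scores.
print(coverage_by_bin([[1, 9, 9.5], [3, 4.5], [10]]))
```

Because one question can have outputs in several ranges, the proportions need not sum to 1 across bins. For Token All, the 25 per-token outputs of a question would be pooled into one score list before calling `coverage_by_bin`.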