Self-Enhanced Reasoning Training: Activating Latent Reasoning in Small Models for Enhanced Reasoning Distillation
Yong Zhang, Bingyuan Zhang, Zhitao Li, Ming Li, Ning Cheng, Minchuan Chen, Tao Wei, Jun Ma, Shaojun Wang, Jing Xiao
TL;DR
The paper addresses the weaker reasoning ability of small models relative to large language models and shows that latent reasoning paths can emerge when sampling from small models, even without chain-of-thought prompts. It introduces Self-Enhanced Reasoning Training (SERT), a two-stage approach that first generates and filters these latent reasoning paths and then self-trains the model on them so that it produces more coherent reasoning; reasoning distillation from a high-capacity teacher can optionally follow. Empirical results with GPT-3.5 as the teacher and GPT-2 variants as students on StrategyQA and CommonsenseQA show that SERT activates latent reasoning, improves reasoning-path quality, and boosts task accuracy, with particularly strong benefits for smaller models when combined with distillation. The findings point to a practical way to strengthen reasoning in compact models, enabling better zero-shot reasoning and more effective learning from larger teachers in resource-constrained settings.
Abstract
The rapid advancement of large language models (LLMs) has significantly enhanced their reasoning abilities, enabling them to tackle increasingly complex tasks. However, these capabilities often diminish in smaller, more computationally efficient models like GPT-2. Recent research shows that reasoning distillation can help small models acquire reasoning capabilities, but most existing methods focus on improving teacher-generated reasoning paths. Our observations reveal that small models can generate high-quality reasoning paths during sampling, even without chain-of-thought prompting, though these paths are often latent because of their low probability under standard decoding strategies. To address this, we propose Self-Enhanced Reasoning Training (SERT), which activates and leverages latent reasoning capabilities in small models through self-training on filtered, self-generated reasoning paths under zero-shot conditions. Experiments using OpenAI's GPT-3.5 as the teacher model and GPT-2 models as the student models demonstrate that SERT enhances the reasoning abilities of small models and improves their performance in reasoning distillation.
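The generate-and-filter step that precedes self-training can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the filtering criteria, so the rules used here (the final answer must match the gold label, and at least two reasoning lines must precede it) are assumptions, as are the names `Example`, `extract_answer`, `has_reasoning`, `build_self_training_set`, and `stub_sampler`. The sampler is a stub standing in for stochastic decoding from a small model such as GPT-2, where a greedy-style direct answer is rejected and a lower-probability path with explicit reasoning is kept.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Example:
    question: str
    answer: str  # gold label, e.g. "yes"/"no" for StrategyQA-style questions


def extract_answer(path: str) -> str:
    # Hypothetical parser: the last non-empty line is taken as the final answer.
    lines = [ln.strip() for ln in path.splitlines() if ln.strip()]
    return lines[-1].lower() if lines else ""


def has_reasoning(path: str, min_steps: int = 2) -> bool:
    # Hypothetical check: require at least `min_steps` lines before the answer line.
    lines = [ln for ln in path.splitlines() if ln.strip()]
    return len(lines) - 1 >= min_steps


def build_self_training_set(
    examples: Sequence[Example],
    sample: Callable[[str, int], List[str]],
    num_samples: int = 8,
) -> List[Tuple[str, str]]:
    """Collect (question, reasoning path) pairs for self-training the same model."""
    dataset: List[Tuple[str, str]] = []
    for ex in examples:
        # Zero-shot: the bare question is sampled, with no chain-of-thought prompt.
        for path in sample(ex.question, num_samples):
            if has_reasoning(path) and extract_answer(path) == ex.answer.lower():
                dataset.append((ex.question, path))
                break  # hypothetical choice: keep one accepted path per question
    return dataset


# Stub standing in for stochastic decoding (e.g. top-k sampling) from a small model.
def stub_sampler(question: str, n: int) -> List[str]:
    candidates = [
        "no",  # direct answer with no reasoning, as greedy decoding might return
        "Sharks live in the ocean.\nThe ocean surface touches the air.\nyes",
        "Sharks breathe with gills.\nGills extract oxygen from water.\nno",
    ]
    return candidates[:n]


data = build_self_training_set([Example("Can sharks breathe air?", "no")], stub_sampler)
print(len(data))  # → 1 (only the third path has reasoning and the gold answer)
```

In a full pipeline the accepted pairs in `data` would be used to fine-tune the student before any distillation from the teacher; that training step is omitted from this sketch.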
