TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection
Shengmin Piao, Sanghyun Park
TL;DR
TinyThinker addresses the risk of superficial imitation when distilling reasoning from large language models by introducing a coarse-to-fine knowledge internalization framework. It couples a three-stage reasoning process (recall, analyze, summarize) with a two-phase training regimen: reasoning acquisition, followed by self-reflection guided by iterative Direct Preference Optimization (DPO). Empirical results on CommonsenseQA, OpenBookQA, and StrategyQA show consistent gains, especially on OpenBookQA and StrategyQA, with ablations confirming the value of each component and the framework's scalability to larger student models. The approach offers a flexible, knowledge-centric path to endow smaller models with robust reasoning capabilities, with potential extensions to other knowledge-intensive tasks and future improvements in data quality and generation efficiency.
Abstract
Large Language Models exhibit impressive reasoning capabilities across diverse tasks, motivating efforts to distill these capabilities into smaller models through generated reasoning data. However, training directly on such synthesized reasoning data may lead to superficial imitation of the reasoning process rather than a genuine integration of reasoning capabilities with the underlying knowledge. To address this, we propose TinyThinker, a framework that introduces two novel approaches. First, we introduce a three-stage process that incrementally guides the student model through the reasoning process, progressively refining knowledge from coarse to fine granularity. Second, we develop a two-phase training framework comprising an initial reasoning acquisition phase followed by a self-reflection phase that uses self-generated data. Experiments on commonsense reasoning benchmarks demonstrate that TinyThinker achieves superior performance compared to baselines. Ablation studies further validate the effectiveness of each component in our framework. We expect that TinyThinker can be extended to other knowledge-intensive reasoning tasks, offering an alternative strategy for developing effective reasoning capabilities in smaller language models. Code is available at https://github.com/shengminp/TinyThinker
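
The self-reflection phase is described as being guided by iterative Direct Preference Optimization (DPO). As a rough illustration of what one preference update optimizes, the sketch below computes the standard DPO loss for a single (chosen, rejected) pair from sequence log-probabilities. It is not taken from the TinyThinker code: the function names, the `beta` default, and the assumption that log-probabilities are already summed over tokens are all illustrative choices.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; the paper's value is not given here
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the total log-probability of a full response under
    either the trainable policy or the frozen reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin by which the policy prefers the chosen response.
    logits = beta * (chosen_log_ratio - rejected_log_ratio)

    # -log(sigmoid(logits)) equals softplus(-logits).
    return _softplus(-logits)
```

When the policy equals the reference, the loss is log 2; it falls below that value as the policy shifts probability toward the chosen response relative to the reference. In an iterative setup, one would presumably regenerate preference pairs from the updated student between rounds, though the exact procedure is specified in the paper body rather than here.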
