Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
Xiangyu Wen, Junhua Huang, Zeju Li, Min Li, Jianyuan Zhong, Zhijian Xu, Mingxuan Yuan, Yongxiang Huang, Qiang Xu
TL;DR
This paper addresses the limitations of distilling reasoning by mimicking text and introduces Reasoning Scaffolding, a framework that distills the underlying algorithmic flow of thought into discrete semantic signals. It presents a three-part approach: extracting a structured logic scaffold from teacher traces (logic representation distillation), training SLMs with a dual objective (step-proposer and signal predictor), and applying signal-guided reasoning during inference with adaptive gating and pruning. Empirical results across StrategyQA, CommonsenseQA, TruthfulQA, GSM8K, and MATH-500 demonstrate significant accuracy and robustness gains over chain-of-thought and long-thinking baselines, including substantial improvements for small models. The work advances practical, faithful knowledge transfer by enabling smaller models to reason more like LLMs, with broad implications for efficiency, interpretability, and safe deployment in real-world tasks.
Abstract
The prevailing approach to distilling reasoning from Large Language Models (LLMs), behavioral cloning from textual rationales, is fundamentally limited. It teaches Small Language Models (SLMs) to mimic surface-level patterns rather than the underlying algorithmic structure of thought, resulting in a critical lack of logical robustness. We argue that instead of cloning text, distillation should transfer this algorithmic structure directly. We introduce Reasoning Scaffolding, a framework that reframes reasoning as a structured generation process. Our method first abstracts the teacher's thought process into a sequence of discrete, interpretable semantic signals (e.g., Contrast, Addition) that act as a scaffold. The student model is then trained via a multi-task objective to both (1) predict the next semantic signal, anticipating the reasoning flow, and (2) generate the corresponding step, conditioned on that signal. This multi-task scheme acts as a powerful regularizer, compelling the student to internalize the computational patterns of coherent reasoning. On a suite of challenging reasoning benchmarks, our method significantly outperforms state-of-the-art distillation in both accuracy and logical consistency, providing a path towards creating smaller models that are genuine reasoners, not just fluent mimics.
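The multi-task objective described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the two losses are combined as a weighted sum, and the names `multi_task_loss` and `signal_weight`, the default weight of 0.5, and the third signal label "Conclusion" are all hypothetical; the abstract names only Contrast and Addition.

```python
import math

# Hypothetical signal vocabulary: only "Contrast" and "Addition" appear in the abstract.
SIGNALS = ["Addition", "Contrast", "Conclusion"]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of target_index under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]


def multi_task_loss(signal_logits, gold_signal, step_token_logits, gold_tokens,
                    signal_weight=0.5):
    """Weighted sum of the two training terms (the weighting scheme is an assumption).

    signal_logits:     scores over SIGNALS for the next reasoning step
    gold_signal:       the teacher's signal label for that step
    step_token_logits: one list of vocabulary scores per token of the step
    gold_tokens:       the teacher's token ids for the step
    """
    # (1) Signal predictor: anticipate the next semantic signal.
    signal_loss = cross_entropy(signal_logits, SIGNALS.index(gold_signal))
    # (2) Step proposer: mean token-level loss for the step text. In a real model
    #     these logits would be produced with the gold signal in the context.
    step_loss = sum(
        cross_entropy(logits, token)
        for logits, token in zip(step_token_logits, gold_tokens)
    ) / len(gold_tokens)
    return step_loss + signal_weight * signal_loss


# Toy usage: a uniform signal guess and a two-token step over a two-word vocabulary.
loss = multi_task_loss([0.0, 0.0, 0.0], "Contrast", [[0.0, 0.0], [0.0, 0.0]], [0, 1])
```

In this sketch, a student that predicts the teacher's signal more confidently lowers the first term without changing the second, which is one way to read the abstract's claim that signal prediction acts as a regularizer on step generation.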
