Distilling Mathematical Reasoning Capabilities into Small Language Models
Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang
TL;DR
This work aims to democratize mathematical reasoning by distilling the capabilities of Large Language Models (LLMs) into sub-billion-parameter Small Language Models (SLMs). It introduces Equation-of-Thought Distillation (EoTD), which encodes reasoning as equations solved by an external solver, and Ensemble Thoughts Distillation (ETD), which combines Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT) rationales into a diverse, multi-form reasoning dataset for fine-tuning. Empirical results on GSM8K, ASDiv, SVAMP, and MultiArith show that EoTD significantly improves SLM reasoning, while ETD delivers state-of-the-art performance across model sizes, with larger SLMs benefiting more from diverse thought forms. The approach offers a pathway to deploying capable mathematical reasoning tools on resource-constrained hardware, with potential extensions beyond mathematics to broader reasoning tasks.
Abstract
This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion-parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process in equation-based representations and uses them to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to further enhance the reasoning performance of SLMs. ETD builds a reasoning dataset containing multiple thought processes, namely Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and uses it for fine-tuning. Our experimental results demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
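To make the two ideas concrete, the following Python sketch illustrates what an equation-based rationale and a mixed-thought dataset might look like. It is an illustration under stated assumptions, not the authors' implementation: the word problem, the equation syntax, the small linear solver (standing in for the external solver), the function names, and the `[CoT]`/`[PoT]`/`[EoT]` input tags are all hypothetical. The solver assumes a square system of linear equations with integer literals.

```python
from fractions import Fraction


def affine_coeffs(equation, names):
    """Read the linear equation 'lhs = rhs' as sum(a_i * x_i) = b.

    Assumption: the equation is linear in the variables listed in `names`.
    """
    lhs, rhs = equation.split("=")
    expr = f"({lhs}) - ({rhs})"

    def residual(point):
        # Evaluate lhs - rhs at a point; builtins are disabled for the eval.
        return eval(expr, {"__builtins__": {}}, dict(zip(names, point)))

    zero = [Fraction(0)] * len(names)
    const = residual(zero)
    coeffs = []
    for i in range(len(names)):
        unit = list(zero)
        unit[i] = Fraction(1)
        coeffs.append(residual(unit) - const)
    return coeffs, -const


def solve_linear(rows, rhs):
    """Gauss-Jordan elimination for a square, non-singular system."""
    n = len(rows)
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def solve_eot(equations, names, target):
    """Solve an equation-form rationale and return the target variable."""
    rows, rhs = zip(*(affine_coeffs(eq, names) for eq in equations))
    return dict(zip(names, solve_linear(rows, rhs)))[target]


def build_etd_examples(question, rationales):
    """Emit one fine-tuning example per thought form (tags are hypothetical)."""
    return [
        {"input": f"[{kind}] {question}", "target": text}
        for kind, text in rationales.items()
    ]


# Hypothetical word problem with an equation-form rationale.
question = ("Tom has 3 times as many apples as Jerry. Together they have 24. "
            "How many apples does Jerry have?")
eot_equations = ["t = 3 * j", "t + j = 24"]
answer = solve_eot(eot_equations, ["t", "j"], target="j")

examples = build_etd_examples(question, {
    "CoT": "Jerry has j apples and Tom has 3j, so 4j = 24 and j = 6.",
    "PoT": "total = 24\njerry = total / (1 + 3)\nprint(jerry)",
    "EoT": "\n".join(eot_equations),
})
```

In this sketch the model only has to emit the equations; the arithmetic is delegated to the solver, which is the motivation for an equation-based thought form. The three tagged examples for a single question show one possible way to mix thought forms in a single fine-tuning set.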
