Effective Learning for Small Reasoning Models: An Empirical Study on 0.5B Reasoning LLMs

Xialie Zhuang, Peixian Ma, Zhikai Jia, Zane Cao, Shiwei Liu

TL;DR

The paper investigates how to boost mathematical reasoning in Small Reasoning Language Models (SRLMs) around 0.5B parameters by evaluating supervised fine-tuning (SFT), knowledge distillation (KD), reinforcement learning (RL), and their hybrids. Using GSM8K and other math benchmarks, it demonstrates that SRLMs lag behind larger models but can achieve meaningful gains through carefully designed training pipelines, with RL offering robust improvements and hybrids providing selective advantages depending on model and task. The work provides practical guidance on when to combine SFT, KD, and RL, highlights potential instability in some hybrids, and suggests future directions in advanced training methods, distillation strategies, and efficiency. Overall, the findings help make advanced reasoning capabilities more accessible for resource-constrained deployments by clarifying effective SRLM enhancement strategies and pipelines.

Abstract

The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5-billion-parameter models poses challenges in handling complex tasks such as mathematical reasoning. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations, to enhance the performance of 0.5B SRLMs. We analyze effective methodologies to bridge the performance gap between SRLMs and larger models and present insights into optimal training pipelines tailored for these smaller architectures. Through extensive experimental validation and analysis, our work aims to provide actionable recommendations for maximizing the reasoning capabilities of 0.5B models.

Paper Structure

This paper contains 29 sections, 3 equations, 1 figure, 3 tables.