Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Sen Xu, Yi Zhou, Wei Wang, Jixin Min, Zhibin Yin, Yingwei Dai, Shixi Liu, Lianyu Pang, Yirong Chen, Junlin Zhang
TL;DR
This work introduces VibeThinker-1.5B, a compact 1.5B-parameter model trained with the Spectrum-to-Signal Principle (SSP) to achieve strong reasoning at minimal cost. Training is split into two phases. The diversity-focused Spectrum Phase (Two-Stage Diversity-Exploring Distillation) yields a rich spectrum of solutions. The signal-focused MGPO RL Phase (MaxEnt-Guided Policy Optimization) then amplifies the correct ones. Together they enable the model to outperform far larger counterparts on math benchmarks such as AIME24/25 and HMMT25, and on coding tasks in LiveCodeBench. The model reaches these results for under $8K in post-training cost (about 3,900 GPU-hours), suggesting that small models can approach large-model reasoning with substantial cost and energy savings. These findings prompt a reevaluation of scaling laws for reasoning. They also highlight how efficient, diversity-driven training paradigms could broaden participation in AI research.
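As a quick sanity check on the cost figures, the following divides the abstract's $7,800 total by the TL;DR's roughly 3,900 GPU-hours to get an implied average rate. It is a back-of-the-envelope sketch. It assumes the whole budget went to GPU time, which the text does not state explicitly.

```python
# Figures quoted in the TL;DR and abstract.
total_cost_usd = 7_800  # total post-training cost stated in the abstract
gpu_hours = 3_900       # approximate GPU-hours stated in the TL;DR

# Implied average price per GPU-hour.
# Assumption: all of the cost is GPU time.
cost_per_gpu_hour = total_cost_usd / gpu_hours
print(f"${cost_per_gpu_hour:.2f} per GPU-hour")
```

Under that assumption, the budget corresponds to about $2 per GPU-hour.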
Abstract
Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). Our approach runs counter to the dominant strategy of scaling model parameters to enhance capabilities, as seen in models like DeepSeek R1 (671B) and Kimi K2 (>1T). The SSP framework first employs a Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions. It then applies MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only $7,800, VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models such as Magistral Medium and Claude Opus 4. It also performs on par with open-source models such as GPT OSS-20B Medium. Remarkably, it surpasses the 400x larger DeepSeek R1 on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7). This is a substantial improvement over its base model, which scores 6.7, 4.3, and 0.6, respectively. On LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium (50.3) and its base model (0.0). These findings demonstrate that small models can achieve reasoning capabilities comparable to large models. They drastically reduce training and inference costs, thereby democratizing advanced AI research.
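The abstract names MaxEnt-Guided Policy Optimization but does not spell out its mechanics. What follows is a minimal sketch under an explicit assumption about what "MaxEnt-guided" means. The assumption is that problems are up-weighted when their empirical pass rate p is near 0.5, the maximum-entropy point of a Bernoulli outcome. The weight is exp(-λ·KL(Bernoulli(p) ‖ Bernoulli(0.5))), and it scales GRPO-style group-normalized advantages. The functions `bernoulli_kl_to_half`, `maxent_weight`, and `weighted_group_advantages` and the parameter `lam` are hypothetical illustrations, not the authors' implementation.

```python
import math


def bernoulli_kl_to_half(p: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(0.5)) = ln 2 - H(p), using 0 * ln 0 := 0."""
    kl = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            kl += q * math.log(q / 0.5)
    return kl


def maxent_weight(p: float, lam: float = 1.0) -> float:
    """Hypothetical MaxEnt-style weight: 1.0 at p = 0.5, decaying to 2**-lam at p = 0 or 1."""
    return math.exp(-lam * bernoulli_kl_to_half(p))


def weighted_group_advantages(rewards: list[float], lam: float = 1.0) -> list[float]:
    """Group-normalized advantages for one problem, scaled by the MaxEnt weight.

    `rewards` are binary (0/1) correctness scores for one problem's rollouts.
    """
    n = len(rewards)
    p = sum(rewards) / n  # empirical pass rate (also the group mean reward)
    std = math.sqrt(sum((r - p) ** 2 for r in rewards) / n)
    if std == 0.0:  # all rollouts right or all wrong: no learning signal
        return [0.0] * n
    w = maxent_weight(p, lam)
    return [w * (r - p) / std for r in rewards]


print(weighted_group_advantages([1, 0, 1, 0]))  # p = 0.5, so weight is 1.0
print(maxent_weight(0.75))                      # below 1.0: less uncertain problem
```

Under this assumed weighting, a problem the model solves about half the time keeps its full advantage. The weight falls toward 2^-λ as p approaches 0 or 1, so updates concentrate on problems whose outcome is most uncertain.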
