Tina: Tiny Reasoning Models via LoRA

Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger

TL;DR

Tina investigates how to instill robust, multi-step reasoning in language models at minimal cost by applying LoRA-based RL updates to a compact 1.5B-parameter base model. The approach trains only low-rank adapters within a GRPO-style RL framework and achieves reasoning performance that is competitive with, and sometimes superior to, the baselines on six benchmarks (AIME24, AIME25, AMC23, MATH500, GPQA, and Minerva), at an estimated post-training and evaluation cost of about $9. The authors open-source their code, training logs, and checkpoints, report that smaller, high-quality datasets can outperform larger ones, and hypothesize that rapid adaptation to the reasoning format rewarded by RL is the key mechanism behind the efficiency gains. The work suggests that efficient RL reasoning can be made accessible to a much broader community, while acknowledging limitations in base-model scale, domain generalization, and room for further tuning. Overall, Tina shows that tiny, cost-efficient RL pipelines can approach or exceed the performance of full-parameter RL reasoning models built on the same base model, with meaningful implications for accessible AI research and reproducibility.
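The TL;DR names a GRPO-style RL framework without spelling out the update. As a minimal sketch, assuming the standard group-relative formulation (each prompt gets a group of sampled completions, and each completion's reward is normalized against its own group's mean and standard deviation), the advantage computation might look like this. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are illustrative assumptions, not details taken from Tina's code.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each completion's reward against its own group (same prompt).

    A_i = (r_i - mean(r)) / (std(r) + eps)

    Assumption: population standard deviation; eps avoids division by zero
    when every completion in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("a group must contain at least one completion")
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for a group of G = 4 completions of one prompt
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because advantages are relative within a group, completions that beat their siblings get pushed up and the rest get pushed down, and a group where every completion scores the same contributes no learning signal.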

Abstract

How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high cost-efficiency. Notably, Tina demonstrates that substantial reasoning performance can be developed using only minimal resources, by applying parameter-efficient updates during reinforcement learning (RL), using low-rank adaptation (LoRA), to an already tiny 1.5B parameter base model. This minimalist approach produces models that achieve reasoning performance which is competitive with, and sometimes surpasses, SOTA RL reasoning models built upon the same base model. Crucially, this is achieved at a tiny fraction of the computational post-training cost employed by existing SOTA models. In fact, the best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 in post-training and evaluation cost (i.e., an estimated 260x cost reduction). Our work reveals the surprising effectiveness of efficient RL reasoning via LoRA. We validate this across multiple open-source reasoning datasets and various ablation settings starting with a single, fixed set of hyperparameters. Furthermore, we hypothesize that this effectiveness and efficiency stem from LoRA rapidly adapting the model to the structural format of reasoning rewarded by RL, while largely preserving the base model's underlying knowledge. In service of accessibility and open research, we fully open-source all code, training logs, and model weights & checkpoints.
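To make "parameter-efficient updates ... using low-rank adaptation (LoRA)" concrete, here is a minimal pure-Python sketch of a LoRA-wrapped linear layer: the base weight W stays frozen and only the low-rank factors A and B (rank r, scaled by alpha / r) would be trained. The names `LoRALinear`, `matvec`, and `lora_param_count`, the initialization scale, and the example sizes are hypothetical; this is an illustration of the technique, not Tina's released code.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


class LoRALinear:
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, W, r, alpha, seed=0):
        self.W = W  # frozen base weight, shape (d_out, d_in)
        self.d_out, self.d_in = len(W), len(W[0])
        self.r = r
        self.scale = alpha / r
        rng = random.Random(seed)
        # A gets a small random init; B starts at zero, so the adapter
        # initially leaves the base model's outputs unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(self.d_out)]

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return lora_param_count(self.d_in, self.d_out, self.r)


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], r=1, alpha=2)
print(layer.forward([3.0, 4.0]))  # [3.0, 4.0]: B = 0, so output equals W x
```

For a hypothetical 1536 x 1536 projection at rank 16, `lora_param_count(1536, 1536, 16)` gives 49,152 trainable parameters versus 2,359,296 in the frozen weight, which is the kind of saving that keeps LoRA-based RL cheap.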

Paper Structure

This paper contains 25 sections, 4 equations, 12 figures, 20 tables.

Figures (12)

  • Figure 1: Overall comparison between Tina and baseline models. The Tina model in the figure corresponds to the best checkpoint in Table [tab:tina_openrs2_eval]. Reasoning performance denotes the average score across AIME24/25, AMC23, MATH500, GPQA, and Minerva, as described in Section [sec:training]. The calculation of each comparative metric is detailed in Appendix [sec:cost_breakdown].
  • Figure 2: Release timeline of open-source models that aim to replicate the performance of advanced reasoning models such as o1(-preview) [openai2024openaio1card] and R1 [deepseekai2025deepseekr1incentivizingreasoningcapability], which we refer to as open-source reasoning replicas.
  • Figure 3: Less is more LoRA-based RL. Approximate training FLOPs versus reasoning performance for Tina and baseline models. The calculation is detailed in Appendix [sec:cost_breakdown].
  • Figure 4: Phase transition in LoRA-based RL. The raw data is from the Weights & Biases training logs and smoothed via an exponential moving average (EMA) with factor 0.1. The "training turning point" in the legend marks the step where format-like metrics (e.g., format reward, completion length) start to destabilize. Refer to Appendix [app:full_tina_phase_transit] for the full set of plots.
  • Figure 5: Phase transition in Tina-DeepScaleR-1.5B-Preview and Tina-STILL-3-1.5B-preview. The raw data is from the Weights & Biases training logs and smoothed via an exponential moving average (EMA) with factor 0.1.
  • ...and 7 more figures
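Figures 4 and 5 smooth the raw Weights & Biases curves with an exponential moving average using factor 0.1. A minimal sketch, assuming the factor is the weight given to each new observation (s_t = 0.1 * x_t + 0.9 * s_{t-1}, seeded with the first value); if the paper's factor instead weights the running average, replace alpha with 1 - alpha. The function name `ema_smooth` is hypothetical.

```python
def ema_smooth(values, alpha=0.1):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    Assumption: alpha weights the newest observation, and the series is
    seeded with its first raw value (no bias correction).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    s = None
    for x in values:
        s = x if s is None else alpha * x + (1.0 - alpha) * s
        smoothed.append(s)
    return smoothed


# Hypothetical per-step format-reward values from a training log
print(ema_smooth([0.2, 0.9, 0.4, 0.8, 0.85]))
```

With a small alpha, single-step spikes are damped heavily, so a sustained drift in a format-like metric (such as the destabilization at the "training turning point") stands out from step-to-step noise.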