Test-time Adaptation of Tiny Recursive Models
Ronan Killian McGovern
TL;DR
The paper tackles the problem of adapting abstract-reasoning models to ARC AGI II tasks under strict compute limits. It shows that starting from a pre-trained Tiny Recursive Model (TRM) and applying full fine-tuning reaches a score of up to 6.67% on semi-private evaluation tasks, outperforming embedding-only fine-tuning and LoRA in many runs. Pre-training from scratch remains infeasible within the competition budget, whereas a pre-trained TRM can be adapted efficiently, although results show substantial run-to-run variability. The work also investigates data diversity, augmentation strategies, and latent-program-space perspectives (SLPS), highlighting the trade-offs and open questions in designing pre-training for robust post-training generalization.
Abstract
Prior to the close of the 2025 ARC Prize competition, the leading open-source approach, known as TRM (Tiny Recursive Models), involved training a 7M-parameter recursive neural network on augmented variants of ARC tasks. That approach scored approximately 7.8% on the public ARC AGI II evaluation set, but required far more compute than is allowed during the competition. This paper shows that, by starting from a tiny recursive model that has been pre-trained on public ARC tasks, one can efficiently fine-tune on competition tasks within the allowed compute limits. Specifically, a model was pre-trained on 1,280 public tasks for 700k+ optimizer steps over 48 hours on 4x H100 SXM GPUs to obtain a ~10% score on the public evaluation set. That model was then post-trained in just 12,500 gradient steps during the competition to reach a score of 6.67% on semi-private evaluation tasks. Notably, this post-training performance is achieved by full fine-tuning of the tiny model, not by LoRA fine-tuning or by fine-tuning task embeddings alone.
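To put the figures above in proportion, the following is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted in the abstract and treats "700k+" as exactly 700,000, so the steps-per-second figure is a lower bound and the step fraction an upper bound. The function names are illustrative, and the comparison assumes that a pre-training step and a post-training step are roughly comparable, which the abstract does not state (batch size and hardware may differ).

```python
# Figures quoted in the abstract. "700k+" is treated as a lower bound.
PRETRAIN_STEPS = 700_000   # optimizer steps during pre-training
PRETRAIN_HOURS = 48        # wall-clock hours of pre-training
PRETRAIN_GPUS = 4          # 4x H100 SXM
POSTTRAIN_STEPS = 12_500   # gradient steps during competition post-training


def pretrain_gpu_hours(hours=PRETRAIN_HOURS, gpus=PRETRAIN_GPUS):
    """Total GPU-hours spent on pre-training."""
    return hours * gpus


def pretrain_steps_per_second(steps=PRETRAIN_STEPS, hours=PRETRAIN_HOURS):
    """Average pre-training throughput in optimizer steps per second."""
    return steps / (hours * 3600)


def posttrain_step_fraction(post=POSTTRAIN_STEPS, pre=PRETRAIN_STEPS):
    """Post-training steps as a fraction of pre-training steps."""
    return post / pre


if __name__ == "__main__":
    print(f"Pre-training GPU-hours:        {pretrain_gpu_hours()}")
    print(f"Pre-training steps per second: {pretrain_steps_per_second():.2f}")
    print(f"Post-training step fraction:   {posttrain_step_fraction():.2%}")
```

Under these assumptions, pre-training costs 192 GPU-hours, while post-training uses fewer than 2% as many steps as pre-training.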
