ARC-AGI Without Pretraining

Isaac Liao, Albert Gu

TL;DR

CompressARC demonstrates that extremely data-efficient intelligence is possible by performing MDL-driven inference with a 76K-parameter network that has no pretraining. It reframes ARC-AGI puzzle solving as a seed-based program compression problem and uses a differentiable search to minimize program length. Under limited compute, it solves 20% of evaluation puzzles and 34.75% of training puzzles, pointing to a viable route to AGI other than pretraining. The paper also discusses limitations and future improvements.

Abstract

Conventional wisdom in the age of LLMs dictates that solving IQ-test-like visual puzzles from the ARC-AGI-1 benchmark requires capabilities derived from massive pretraining. To counter this, we introduce CompressARC, a 76K-parameter model without any pretraining that solves 20% of evaluation puzzles by minimizing the description length (MDL) of the target puzzle purely at inference time. MDL endows CompressARC with extreme generalization abilities typically unheard of in deep learning. To our knowledge, CompressARC is the only deep learning method for ARC-AGI where training happens on only a single sample: the target inference puzzle itself, with the final solution information removed. Moreover, CompressARC does not train on the pre-provided ARC-AGI "training set". Under these extremely data-limited conditions, we would not ordinarily expect any puzzles to be solvable at all. Yet CompressARC still solves a diverse distribution of creative ARC-AGI puzzles, suggesting that MDL is a feasible alternative to conventional pretraining as a way to produce intelligence.
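
For orientation, the textbook two-part form of the minimum description length principle can be written as below. This is a generic sketch, not necessarily the exact objective used by CompressARC; the symbols $H$ (a candidate hypothesis or program), $D$ (the data, here a puzzle), and $L(\cdot)$ (code length in bits) are notation assumed for illustration.

$$
\hat{H} = \arg\min_{H} \big[\, L(H) + L(D \mid H) \,\big]
$$

Here $L(H)$ is the cost of describing the hypothesis and $L(D \mid H)$ is the cost of describing the data once the hypothesis is known. A hypothesis that captures the puzzle's rule makes $L(D \mid H)$ small without making $L(H)$ large, which is the sense in which a shorter description is preferred.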

Paper Structure

This paper contains 53 sections, 12 equations, 18 figures, 6 tables, 3 algorithms.

Figures (18)

  • Figure 1: Three example ARC-AGI-1 puzzles.
  • Figure 2: CompressARC approximates a specific compression algorithm that converts the ARC-AGI puzzle dataset into the shortest program that prints it out exactly, along with any solutions. These printed solutions are assumed to be good predictors of the actual solutions, according to Occam's razor.
  • Figure 3: Core structure of CompressARC's neural network, which operates on multitensor data. Individual operations (colored) read and write to a residual backbone through learned projections (grey) in the $\text{channel}$ dimension. The network is equivariant to permutations of indices along the other, non-$\text{channel}$ dimensions as a result. Some layers like cummax break certain geometric symmetries, giving the architecture specific geometric abilities listed in Appendix \ref{sec:abilities}. Normalization, softmax, shift, and directional layers are not shown.
  • Figure 4: CompressARC's puzzle solve accuracy rises as inference-time learning progresses. Accuracy is shown for several numbers of allowed solution guesses (pass@n). The official benchmark is reported with 2 allowed guesses, which is why we report 20% on the evaluation set.
  • Figure 5: Color the Boxes, puzzle 272f95fa.
  • ...and 13 more figures
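
The pass@n measure mentioned in the Figure 4 caption can be illustrated with a minimal sketch. It assumes each puzzle has a single target grid and a ranked list of guessed grids; the names `Grid`, `solved_at_n`, and `pass_at_n` are hypothetical and do not come from the paper or its code.

```python
from typing import Sequence, Tuple

# A grid is an immutable 2D array of color indices (assumed representation).
Grid = Tuple[Tuple[int, ...], ...]


def solved_at_n(guesses: Sequence[Grid], target: Grid, n: int) -> bool:
    """Return True if any of the first n ranked guesses equals the target exactly."""
    return any(guess == target for guess in guesses[:n])


def pass_at_n(all_guesses: Sequence[Sequence[Grid]],
              targets: Sequence[Grid],
              n: int) -> float:
    """Fraction of puzzles solved when n guesses are allowed per puzzle."""
    solved = sum(solved_at_n(g, t, n) for g, t in zip(all_guesses, targets))
    return solved / len(targets)


# Two toy puzzles, each with the same two ranked guesses.
grid_a: Grid = ((1, 0), (0, 1))
grid_b: Grid = ((2, 2), (2, 2))
ranked_guesses = [[grid_a, grid_b], [grid_a, grid_b]]
targets = [grid_a, grid_b]

print(pass_at_n(ranked_guesses, targets, 1))  # only the first puzzle is solved
print(pass_at_n(ranked_guesses, targets, 2))  # both puzzles are solved
```

With n = 2, as in the official benchmark setting described in the caption, a puzzle counts as solved if either of the top two guesses matches the target grid exactly.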