
ARC-AGI-2 Technical Report

Wallyson Lemes de Oliveira, Mekhron Bobokhonov, Matteo Caorsi, Aldo Podestà, Gabriele Beltramo, Luca Crosato, Matteo Bonotto, Federica Cecchetto, Hadrien Espic, Dan Titus Salajan, Stefan Taga, Luca Pana, Joe Carthy

TL;DR

This work presents a transformer-based system that advances ARC performance by combining neural inference with structure-aware priors and online task adaptation. The system achieves a significant improvement over transformer baselines, surpasses prior neural ARC solvers, and narrows the gap to human-level generalization.

Abstract

The Abstraction and Reasoning Corpus (ARC) is designed to assess generalization beyond pattern matching, requiring models to infer symbolic rules from very few examples. In this work, we present a transformer-based system that advances ARC performance by combining neural inference with structure-aware priors and online task adaptation. Our approach is built on four key ideas. First, we reformulate ARC reasoning as a sequence modeling problem using a compact task encoding with only 125 tokens, enabling efficient long-context processing with a modified LongT5 architecture. Second, we introduce a principled augmentation framework based on group symmetries, grid traversals, and automata perturbations, enforcing invariance to representation changes. Third, we apply test-time training (TTT) with lightweight LoRA adaptation, allowing the model to specialize to each unseen task by learning its transformation logic from the demonstrations. Fourth, we design a symmetry-aware decoding and scoring pipeline that aggregates likelihoods across augmented views of a task, effectively performing "multi-perspective reasoning" over candidate solutions. We demonstrate that these components work synergistically: augmentations expand the hypothesis space, TTT sharpens local reasoning, and symmetry-based scoring improves solution consistency. Our final system achieves a significant improvement over transformer baselines, surpasses prior neural ARC solvers, and narrows the gap to human-level generalization.
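Two of the ideas above fit together naturally: augmentations drawn from the symmetries of the square (the dihedral group D4, i.e. four rotations and four reflections) and a scoring step that aggregates likelihoods across augmented task views. The following is a minimal NumPy sketch under stated assumptions, not the report's implementation. It assumes a task is a list of (input, output) demonstration pairs plus a test input, that `log_prob` is a hypothetical stand-in for the model's log-likelihood of a candidate output given a task view, and that aggregation is a plain sum over the eight views. `d4_inverse` maps a grid decoded in an augmented frame back to the original frame, which is one way candidates from different views could be pooled before scoring. The actual augmentation set (which also includes grid traversals and automata perturbations) and the aggregation rule in the report may differ.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple

Grid = np.ndarray
Pair = Tuple[Grid, Grid]
# Hypothetical scorer signature: (demo pairs, test input, candidate output) -> log-probability.
LogProbFn = Callable[[Sequence[Pair], Grid, Grid], float]

# The 8 elements of D4 as (quarter_turns, transpose_after) pairs.
D4 = [(k, flip) for flip in (False, True) for k in range(4)]


def d4_transform(grid: Grid, k: int, flip: bool) -> Grid:
    """Rotate by k quarter-turns, then optionally transpose (a reflection)."""
    out = np.rot90(grid, k)
    return out.T if flip else out


def d4_inverse(grid: Grid, k: int, flip: bool) -> Grid:
    """Undo d4_transform: the transpose is self-inverse, then rotate back."""
    out = grid.T if flip else grid
    return np.rot90(out, -k)


def augment_task(demos: Sequence[Pair], test_input: Grid, k: int, flip: bool):
    """Apply the same group element to every grid, producing one task view."""
    demos_g = [(d4_transform(i, k, flip), d4_transform(o, k, flip)) for i, o in demos]
    return demos_g, d4_transform(test_input, k, flip)


def symmetry_score(candidates: Sequence[Grid], demos: Sequence[Pair],
                   test_input: Grid, log_prob: LogProbFn) -> List[float]:
    """Sum each candidate's log-probability over all 8 augmented task views."""
    scores = []
    for cand in candidates:
        total = 0.0
        for k, flip in D4:
            demos_g, test_g = augment_task(demos, test_input, k, flip)
            # The candidate is mapped into the same frame as the task view.
            total += log_prob(demos_g, test_g, d4_transform(cand, k, flip))
        scores.append(total)
    return scores


def toy_log_prob(demos_g, test_g, cand_g) -> float:
    """Hypothetical stand-in for a model: favours candidates equal to the test input."""
    if cand_g.shape != test_g.shape:
        return -100.0
    return -float(np.sum(cand_g != test_g))


grid = np.arange(6).reshape(2, 3)
scores = symmetry_score([grid, grid + 1], [(grid, grid)], grid, toy_log_prob)
print(scores)  # [0.0, -48.0]
```

The toy scorer encodes an identity rule, which is equivariant under D4, so the correct candidate scores 0 in every view while the wrong one loses 6 points per view. With a real model, a candidate that is only plausible from one orientation would be penalized by the other seven views, which is the intuition behind symmetry-aware scoring.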

Paper Structure (142 sections, 43 equations, 20 figures, 7 tables, 4 algorithms)

Figures (20)

  • Figure 1: A task where the implicit goal is to count unique objects and select the object that appears the most times (the actual task has more demonstration pairs than these three).
  • Figure 2: Overview of the ARC-AGI pipeline. The system consists of setup, training, and inference stages. Inference integrates Test-Time Training (TTT), decoding, and scoring components. In the ARC-AGI Kaggle environment, the user input corresponds to loading the test set from files.
  • Figure 3: An example of a task created by our team: its logic resembles, but is not identical to, that of any task in the ARC-AGI public dataset.
  • Figure 4: An example of a tokenization artifact.
  • Figure 5: Examples of computer-vision-like augmentations applied to the first input grid of task 025d127b: (a) original grid; (b) 2x upscale, directions = both; (c) added frame; (d) added metagrid, s = 1, directions = both.
  • ...and 15 more figures
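Figure 5 names three computer-vision-like augmentations: upscaling, adding a frame, and adding a metagrid. The sketch below is our reading of that caption in NumPy, not the report's implementation. The meaning of `directions` ("vertical" acting on rows, "horizontal" on columns, "both" on each), the interpretation of a metagrid as separator lines of thickness `s` inserted between neighbouring cells, and the `color` and `width` parameters are all assumptions. A small toy grid stands in for task 025d127b, whose grids are not reproduced here.

```python
import numpy as np

Grid = np.ndarray
# Hypothetical convention: "vertical" acts along axis 0 (rows), "horizontal" along axis 1 (columns).
_AXES = {"vertical": (0,), "horizontal": (1,), "both": (0, 1)}


def upscale(grid: Grid, factor: int = 2, directions: str = "both") -> Grid:
    """Repeat every cell `factor` times along the selected axes."""
    out = grid
    for axis in _AXES[directions]:
        out = np.repeat(out, factor, axis=axis)
    return out


def add_frame(grid: Grid, color: int = 0, width: int = 1) -> Grid:
    """Surround the grid with a constant-colour border of the given width."""
    return np.pad(grid, width, mode="constant", constant_values=color)


def add_metagrid(grid: Grid, s: int = 1, color: int = 0, directions: str = "both") -> Grid:
    """Insert separator lines of thickness s between neighbouring cells (assumed semantics)."""
    out = grid
    for axis in _AXES[directions]:
        n = out.shape[axis]
        shape = list(out.shape)
        shape[axis] = n + (n - 1) * s
        new = np.full(shape, color, dtype=out.dtype)
        idx = np.arange(n) * (1 + s)  # positions of the original rows/columns
        if axis == 0:
            new[idx, :] = out
        else:
            new[:, idx] = out
        out = new
    return out


g = np.array([[1, 2], [3, 4]])
print(upscale(g))
print(add_frame(g, color=5))
print(add_metagrid(g, s=1))
```

Each function returns a new array and leaves its input unchanged, so augmentations can be chained, for example `add_frame(upscale(g), color=5)`.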