The role of positional encodings in the ARC benchmark

Guilherme H. Bandeira Costa, Miguel Freire, Arlindo L. Oliveira

TL;DR

The paper investigates how positional encodings shape Transformer-based reasoning in the ARC benchmark, showing that standard encodings can hinder spatial reasoning on grid tasks. Through a mix of vanilla and custom transformer experiments, it demonstrates that 2D positional encoding robustly improves data-efficient ARC performance, while RoPE can gain advantages when data is plentiful. The findings suggest that tailoring positional encodings to spatial structure enables more efficient learning with smaller models, offering practical benefits for data-scarce reasoning tasks. Limitations include treating ARC examples independently, motivating future work on group-based reasoning and expanded ARC task sets.

Abstract

The Abstraction and Reasoning Corpus challenges AI systems to perform abstract reasoning with minimal training data, a task intuitive for humans but demanding for machine learning models. Using CodeT5+ as a case study, we demonstrate how limitations in positional encoding hinder reasoning and impact performance. This work further examines the role of positional encoding across transformer architectures, highlighting its critical influence on models of varying sizes and configurations. Comparing several strategies, we find that while 2D positional encoding and Rotary Position Embedding offer competitive performance, 2D encoding excels in data-constrained scenarios, emphasizing its effectiveness for ARC tasks.

Paper Structure

This paper contains 12 sections, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Example of an ARC task. Images that show both an input and an output are demonstration examples; the image that shows only an input is the test example. In this particular task, the rule is that the output grid reproduces the most common object in the input grid, where objects are distinguished by color.
  • Figure 2: The selected task is derived from the original ARC dataset where the objective is to connect rows where the pixels at both ends share the same color. Importantly, if all examples are rotated by 90 degrees, the goal of the task remains unchanged, except that the connections would then occur across the columns instead of the rows.
  • Figure 3: Average correct matches across 3 distinct batch sizes when trained and tested for horizontal (pink) versus vertical (green) examples using the default CodeT5+ model.
  • Figure 4: Visual illustration of token creation for vertical (top) and horizontal (bottom) examples. Tokens are generated linearly, with the "goal tokens" (representing the red color) appearing closer together in horizontal arrangements and farther apart in vertical ones.
  • Figure 5: Average correct matches across 3 distinct batch sizes when trained and tested for vertical examples with the original PE (green) versus an altered PE which prioritizes relevant tokens (yellow) using the default CodeT5+ model.
  • ...and 3 more figures
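
The token-distance effect described in the Figure 4 caption can be illustrated with a small sketch. It assumes a grid flattened row by row with exactly one token per cell and no row-separator tokens, which is a simplification and not necessarily the tokenization used in the paper. The helper names `flat_index` and `token_distance` and the 10x10 grid size are hypothetical, chosen only for illustration.

```python
def flat_index(row, col, width):
    """Position of a grid cell in a row-major flattened sequence.

    Assumption: one token per cell and no separator tokens between rows.
    """
    return row * width + col


def token_distance(cell_a, cell_b, width):
    """Number of sequence positions separating two (row, col) cells."""
    return abs(flat_index(*cell_a, width) - flat_index(*cell_b, width))


width = 10  # hypothetical 10x10 grid

# Two end pixels of the same row (horizontal arrangement).
horizontal = token_distance((3, 0), (3, 9), width)

# Two end pixels of the same column (vertical arrangement).
vertical = token_distance((0, 3), (9, 3), width)

print(horizontal, vertical)  # prints "9 90"
```

Under these assumptions, the two end pixels are 9 positions apart in the horizontal case and 90 positions apart in the vertical case, even though they are equally far apart on the grid. A 1D positional encoding sees only the sequence distance, which is the asymmetry the caption points to.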