The role of positional encodings in the ARC benchmark
Guilherme H. Bandeira Costa, Miguel Freire, Arlindo L. Oliveira
TL;DR
The paper investigates how positional encodings shape Transformer-based reasoning on the ARC benchmark, showing that standard encodings can hinder spatial reasoning on grid tasks. Through experiments with both vanilla and custom Transformers, it demonstrates that 2D positional encoding robustly improves data-efficient ARC performance, while Rotary Position Embedding (RoPE) can gain an advantage when data is plentiful. The findings suggest that tailoring positional encodings to spatial structure enables more efficient learning with smaller models, offering practical benefits for data-scarce reasoning tasks. One limitation is that ARC examples are treated independently, which motivates future work on group-based reasoning and expanded ARC task sets.
Abstract
The Abstraction and Reasoning Corpus (ARC) challenges AI systems to perform abstract reasoning with minimal training data, a task that is intuitive for humans but demanding for machine learning models. Using CodeT5+ as a case study, we demonstrate how limitations in positional encoding hinder reasoning and degrade performance. This work further examines the role of positional encoding across Transformer architectures, highlighting its critical influence on models of varying sizes and configurations. Comparing several strategies, we find that while 2D positional encoding and Rotary Position Embedding (RoPE) offer competitive performance, 2D encoding excels in data-constrained scenarios, underscoring its effectiveness for ARC tasks.
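
To make the idea of a 2D positional encoding concrete, the following is a minimal sketch, not the authors' implementation. It assumes a common sinusoidal variant in which half of each vector encodes the row index and the other half the column index; the function names (`sinusoidal_1d`, `positional_encoding_2d`, `flatten_index`) and the base constant 10000 are illustrative assumptions, since this section does not specify the exact scheme used in the paper.

```python
import math


def sinusoidal_1d(position, dim):
    """Sinusoidal encoding of one integer position into `dim` values (dim must be even)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    enc = []
    for i in range(0, dim, 2):
        # Assumed base of 10000, as in the usual sinusoidal formulation.
        freq = 1.0 / (10000 ** (i / dim))
        enc.append(math.sin(position * freq))
        enc.append(math.cos(position * freq))
    return enc


def positional_encoding_2d(height, width, dim):
    """Return a height x width grid of dim-length vectors.

    The first half of each vector depends only on the row index and the
    second half only on the column index (an assumed layout).
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return [
        [sinusoidal_1d(r, half) + sinusoidal_1d(c, half) for c in range(width)]
        for r in range(height)
    ]


def flatten_index(row, col, width):
    """Row-major index a 1D encoding would see for the same grid cell."""
    return row * width + col


# Two vertically adjacent cells in a 3x3 grid.
pe = positional_encoding_2d(3, 3, 8)
same_column = pe[0][2][4:] == pe[1][2][4:]  # column halves match
idx_a = flatten_index(0, 2, 3)
idx_b = flatten_index(1, 2, 3)
```

In this sketch, cells (0, 2) and (1, 2) share an identical column half, so their vertical adjacency is directly visible in the encoding. Under a flattened 1D scheme the same cells sit at indices 2 and 5, and the model would have to learn the grid width to recover that relationship.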
