Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning

Philipp Mondorf, Shijia Zhou, Monica Riedler, Barbara Plank

TL;DR

The results show that a small transformer-based encoder-decoder model, trained via meta-learning for compositionality, can systematically generalize to previously unseen transformation compositions, suggesting a promising direction toward more robust and generalizable models.

Abstract

Systematic generalization refers to the capacity to understand and generate novel combinations from known components. Despite recent progress by large language models (LLMs) across various domains, these models often fail to extend their knowledge to novel compositional scenarios, revealing notable limitations in systematic generalization. There has been an ongoing debate about whether neural networks possess the capacity for systematic generalization, with recent studies suggesting that meta-learning approaches designed for compositionality can significantly enhance this ability. However, these insights have largely been confined to linguistic problems, leaving their applicability to other tasks an open question. In this study, we extend meta-learning for compositionality to the domain of abstract spatial reasoning. To this end, we introduce Compositional-ARC, a dataset designed to evaluate the capacity of models to systematically generalize from known geometric transformations (e.g., translation, rotation) of abstract two-dimensional objects to novel combinations of these transformations (e.g., translation+rotation). Our results show that a small transformer-based encoder-decoder model, trained via meta-learning for compositionality, can systematically generalize to previously unseen transformation compositions. Notably, despite having only 5.7M parameters, this model significantly outperforms state-of-the-art LLMs, including o3-mini, GPT-4o, and Gemini 2.0 Flash, which fail to exhibit similar systematic behavior, and performs on par with the winning model of the ARC prize 2024, an 8B-parameter LLM trained via test-time training. Our findings highlight the effectiveness of meta-learning in promoting systematicity beyond linguistic tasks, suggesting a promising direction toward more robust and generalizable models.

Paper Structure

This paper contains 53 sections, 28 equations, 15 figures, 8 tables.

Figures (15)

  • Figure 1: A conceptual overview of the data in Compositional-ARC. Primitive transformations refer to basic geometric transformations (e.g., translation, reflection, extension) based on an object's (a) shape, (b) color, or (c) proximity to a neighboring object. Pairs of these indicators, such as (d) shape+color, (e) shape+neighbor, or (f) color+neighbor, can be combined to form level-1 transformation compositions. Finally, all three indicators can be combined to form level-2 transformation compositions, based on the object's (g) shape+color+neighbor.
  • Figure 2: An example of the few-shot instruction learning task adapted from Lake and Baroni (2023). Study instructions illustrate the mapping of pseudolanguage expressions to abstract symbols.
  • Figure 3: An episode from Compositional-ARC. Given a set of study examples with primitive transformations and level-1 transformation compositions, models must predict the output grid for an unseen level-2 transformation composition. Visual grammar: shape $\rightarrow$ clockwise rotation, color $\rightarrow$ translation to right, neighbor $\rightarrow$ leftward extension. Model predictions are presented to the right.
  • Figure 4: Error distribution by error category across models. Bars show the fraction of prediction errors assigned to each error category.
  • Figure 5: An example of the few-shot learning task. Models are provided with three study examples that demonstrate the transformation that needs to be inferred for the final input grid. Model predictions are displayed to the right.
  • ...and 10 more figures
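
The composition levels described in the Figure 1 caption can be enumerated mechanically. The following is a minimal sketch, assuming that a composition level is simply the number of combined indicators minus one; the names `INDICATORS` and `composition_levels` are hypothetical and are not taken from the paper, and labeling primitives as level 0 is a convention chosen here for illustration.

```python
from itertools import combinations

# Indicator names follow the Figure 1 caption; the tuple itself is illustrative.
INDICATORS = ("shape", "color", "neighbor")


def composition_levels(indicators=INDICATORS):
    """Group indicator combinations by level, where level = combination size - 1.

    Level 0 holds the primitive transformations, level 1 the pairs,
    and level 2 the single combination of all three indicators.
    """
    levels = {}
    for size in range(1, len(indicators) + 1):
        levels[size - 1] = ["+".join(combo) for combo in combinations(indicators, size)]
    return levels


levels = composition_levels()
for level, combos in levels.items():
    print(level, combos)
```

Under this assumption there are three primitives, three level-1 compositions, and one level-2 composition, matching panels (a) through (g) of the caption.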

Theorems & Definitions (7)

  • Definition 1: Systematic generalization
  • Definition 2: Grid & Object
  • Definition 3: Translation
  • Definition 4: Rotation
  • Definition 5: Reflection
  • Definition 6: Extension
  • Definition 7: Color Change
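
To make the idea of composing the transformations listed above concrete, here is a rough sketch of two of them (translation and clockwise rotation) applied in sequence. It does not reproduce the paper's formal definitions, which are not part of this outline. It assumes an object is a set of `(row, col)` cells and that rotation is performed by 90 degrees within the object's bounding box, anchored at the box's top-left corner; the function names `translate`, `rotate_cw`, and `compose` are hypothetical.

```python
from functools import reduce


def translate(cells, dr, dc):
    """Shift every cell of the object by dr rows and dc columns."""
    return {(r + dr, c + dc) for r, c in cells}


def rotate_cw(cells):
    """Rotate the object 90 degrees clockwise inside its bounding box.

    Assumption: the bounding box's top-left corner stays fixed.
    """
    r0 = min(r for r, _ in cells)
    c0 = min(c for _, c in cells)
    height = max(r for r, _ in cells) - r0 + 1
    # A cell at local (r, c) moves to local (c, height - 1 - r).
    return {(r0 + (c - c0), c0 + (height - 1 - (r - r0))) for r, c in cells}


def compose(*steps):
    """Return a function that applies the given steps from left to right."""
    return lambda cells: reduce(lambda acc, step: step(acc), steps, cells)


# An L-shaped object: a vertical bar of two cells with a foot to the right.
l_shape = {(0, 0), (1, 0), (1, 1)}

# A composition of two primitives: rotate first, then move two columns right.
rotate_then_shift = compose(rotate_cw, lambda cells: translate(cells, 0, 2))
result = rotate_then_shift(l_shape)
print(sorted(result))
```

Representing objects as cell sets keeps each primitive a pure function, so a composition is just function chaining; the order of steps is a design choice of this sketch and may differ from the dataset's convention.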