Enhancing Analogical Reasoning in the Abstraction and Reasoning Corpus via Model-Based RL

Jihwan Lee, Woochang Sim, Sejin Kim, Sundong Kim

TL;DR

This work addresses the challenge of analogical reasoning in ARC by evaluating model-based RL against model-free baselines. Using DreamerV3, a latent-world-model agent, and PPO on restricted ARC tasks, the study demonstrates that internal models enable more data-efficient learning and better generalization to similar tasks, including effective adaptation from pre-trained tasks. Notably, DreamerV3 shows strong performance on $3 \times 3$ diagonal flips and benefits from high-quality pre-training, though learning dynamics include interim drops that may reflect conceptual consolidation. The findings suggest that internal world models enhance analogical reasoning in structured environments and point to promising directions in meta-learning and transfer learning for robust generalization to untrained ARC tasks.

Abstract

This paper demonstrates that model-based reinforcement learning (model-based RL) is a suitable approach for the task of analogical reasoning. We hypothesize that model-based RL can solve analogical reasoning tasks more efficiently through the creation of internal models. To test this, we compared DreamerV3, a model-based RL method, with Proximal Policy Optimization, a model-free RL method, on the Abstraction and Reasoning Corpus (ARC) tasks. Our results indicate that model-based RL not only outperforms model-free RL in learning and generalizing from single tasks but also shows significant advantages in reasoning across similar tasks.

Paper Structure (20 sections, 3 figures)

This paper contains 20 sections, 3 figures.

Figures (3)

  • Figure 1: Each ARC task includes several demos and a test input. The objective is to identify the grid corresponding to the test input by applying a common transformation rule found across all demos. The term "CCW" in the third task means counterclockwise.
  • Figure 2: Performance of agents trained with different RL algorithms on four single ARC tasks. The top two panels show that the model-based RL agent learned better on analogical reasoning tasks, while the bottom two show that the model-free RL agent can do better on simple tasks. The model-based learning curves also share a notable pattern: in every case there is an interval midway through training where accuracy drops to 0. We argue that this interval is where model-based RL learns concepts for analogical reasoning.
  • Figure 3: Comparison of DreamerV3 and PPO on two single ARC tasks, each fine-tuned from a model pre-trained on a similar task. PPO gained no benefit from fine-tuning, which is expected given the low performance of its pre-trained model. In contrast, DreamerV3 reached very high performance when adapting from a pre-trained model that had performed well on the $3 \times 3$ Diagonal Flip task. When starting from a poorly performing pre-trained model, however, DreamerV3 showed lower initial learning efficiency than when no pre-trained model was used. Lastly, at the end of the experiment in which fine-tuning succeeded, performance dropped suddenly; this is presumed to be the same phenomenon as the interval observed in the previous experiments.
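
To make the task format in Figure 1 concrete, the following is a minimal sketch of one ARC-style task: a few demo pairs, a test input, and a check that a candidate rule explains every demo. It is illustrative only and is not the paper's environment or agent. The grids, the helper names (`diagonal_flip`, `rotate_ccw`, `horizontal_flip`, `rule_fits_demos`), and the reading of "diagonal flip" as a reflection across the main diagonal are all assumptions made for this example.

```python
# Illustrative sketch only: hypothetical grids and helper names,
# not the environment or code used in the paper.

def diagonal_flip(grid):
    # Assumption: "diagonal flip" means reflecting across the main diagonal.
    return [list(row) for row in zip(*grid)]

def rotate_ccw(grid):
    # Rotate 90 degrees counterclockwise: transpose, then reverse row order.
    return [list(row) for row in zip(*grid)][::-1]

def horizontal_flip(grid):
    # Mirror each row left-to-right.
    return [row[::-1] for row in grid]

def rule_fits_demos(rule, demos):
    # A rule is a valid explanation only if it maps every demo input
    # to its demo output.
    return all(rule(x) == y for x, y in demos)

# Two hand-made 3x3 demo pairs (input grid, output grid).
demos = [
    ([[1, 0, 0], [2, 0, 0], [3, 0, 0]], [[1, 2, 3], [0, 0, 0], [0, 0, 0]]),
    ([[0, 6, 0], [0, 0, 0], [0, 0, 0]], [[0, 0, 0], [6, 0, 0], [0, 0, 0]]),
]
test_input = [[0, 5, 0], [0, 0, 7], [4, 0, 0]]

candidates = {
    "diagonal_flip": diagonal_flip,
    "rotate_ccw": rotate_ccw,
    "horizontal_flip": horizontal_flip,
}

# Keep only the rules that are consistent with all demos.
consistent = [name for name, rule in candidates.items()
              if rule_fits_demos(rule, demos)]

# Apply the surviving rule to the test input to produce the answer grid.
prediction = candidates[consistent[0]](test_input)
print(consistent, prediction)
```

Here the rule is found by enumerating a handful of hand-written candidates. The RL agents in the paper do not work this way: they must arrive at the output grid through interaction with the environment, which is what makes the common transformation rule hard to learn.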