Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks

Niccolò Grillo, Andrea Toccaceli, Joël Mathys, Benjamin Estermann, Stefania Fresca, Roger Wattenhofer

TL;DR

This work tackles extrapolative reasoning by modeling logic puzzles as graphs and solving them with graph neural networks (GNNs) in a multi-agent reinforcement learning framework. It introduces a graph-based evaluation setup for the PUZZLES library and systematically compares GNNs against Transformer baselines, recurrent against stateless variants, and different reward schemes. The findings show that the explicit relational inductive bias of GNNs improves both in-distribution performance and out-of-distribution generalization to larger puzzle sizes, while the effect of reward design and recurrence on extrapolation varies with puzzle difficulty. Overall, the study provides a principled, graph-centric approach to scalable, generalizable reasoning and offers insights into designing learning-based systems that go beyond interpolation.

Abstract

Despite incredible progress, many neural architectures fail to properly generalize beyond their training distribution. As such, learning to reason in a correct and generalizable way is one of the current fundamental challenges in machine learning. In this respect, logic puzzles provide a great testbed, as we can fully understand and control the learning environment. Thus, they allow us to evaluate performance on previously unseen, larger and more difficult puzzles that follow the same underlying rules. Since traditional approaches often struggle to represent such scalable logical structures, we propose to model these puzzles using a graph-based approach. Then, we investigate the key factors enabling the proposed models to learn generalizable solutions in a reinforcement learning setting. Our study focuses on the impact of the inductive bias of the architecture, different reward systems and the role of recurrent modeling in enabling sequential reasoning. Through extensive experiments, we demonstrate how these elements contribute to successful extrapolation on increasingly complex puzzles. These insights and frameworks offer a systematic way to design learning-based systems capable of generalizable reasoning beyond interpolation.

Paper Structure

This paper contains 30 sections, 8 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: We focus on logic puzzles of varying sizes in order to systematically evaluate the ability of neural architectures to extrapolate beyond the seen training data. By modelling the problem instances through a unifying graph framework, we can naturally encompass and evaluate on instances where generalization capabilities are required.
  • Figure 2: Example puzzles from the PUZZLES library, inspired by Simon Tatham's collection.
  • Figure 3: We provide a new graph interface in order to ease the testing for size generalization on six puzzles. From top left to bottom right: Light Up, Loopy, Mosaic, Net, Tents and Unruly.
  • Figure 4: Illustration of the modeling of the puzzle Loopy. In this case, each decision-node (black or blue circles) corresponds to an edge of the original game grid. Each face of the game grid is represented as a meta-node (red circle), which is connected to its four adjacent decision-nodes. The nodes and edges of the graph have features, which determine the current state. The next state is determined by the collective actions of all decision-nodes. The local graph representation remains the same across all puzzle sizes.
  • Figure 5: Illustration of testing the ability to generalize beyond the training distribution for the puzzle Net. While models only see small puzzle instances during training, the rules and logic that govern the puzzle remain the same. Therefore, during evaluation, the model is tested on puzzles that are up to 16x larger.
  • ...and 13 more figures
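
The graph construction described in the Figure 4 caption for Loopy can be illustrated with a minimal sketch. It assumes a rectangular grid of square faces and covers only what the caption states: one decision-node per grid edge and one meta-node per face, linked to its four adjacent decision-nodes. The function name `build_loopy_graph`, the `("h", r, c)` / `("v", r, c)` keys, and the integer ids are hypothetical and not taken from the paper; node and edge features, and any further connections the authors may use, are omitted.

```python
from typing import Dict, List, Tuple

EdgeKey = Tuple[str, int, int]   # ("h" or "v", row, col): hypothetical naming
FaceKey = Tuple[int, int]        # (row, col) of a grid face


def build_loopy_graph(rows: int, cols: int):
    """Build decision-nodes, meta-nodes and their links for a rows x cols grid.

    Illustrative only: mirrors the Figure 4 caption, not the authors' code.
    """
    decision_ids: Dict[EdgeKey, int] = {}

    # Horizontal grid edges: (rows + 1) horizontal lines, each with `cols` edges.
    for r in range(rows + 1):
        for c in range(cols):
            decision_ids[("h", r, c)] = len(decision_ids)

    # Vertical grid edges: `rows` bands, each with (cols + 1) edges.
    for r in range(rows):
        for c in range(cols + 1):
            decision_ids[("v", r, c)] = len(decision_ids)

    meta_ids: Dict[FaceKey, int] = {}
    links: List[Tuple[int, int]] = []  # (meta-node id, decision-node id)

    for r in range(rows):
        for c in range(cols):
            m = len(meta_ids)
            meta_ids[(r, c)] = m
            # Each face touches its top, bottom, left and right grid edge.
            for key in (("h", r, c), ("h", r + 1, c),
                        ("v", r, c), ("v", r, c + 1)):
                links.append((m, decision_ids[key]))

    return decision_ids, meta_ids, links


if __name__ == "__main__":
    for size in (2, 5, 10):
        d, m, links = build_loopy_graph(size, size)
        print(size, len(d), len(m), len(links))
```

Every meta-node has exactly four links regardless of grid size, which is the property the caption points to when it says the local graph representation remains the same across all puzzle sizes. Only the number of nodes grows with the puzzle.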