Large Language Model (LLM) as a System of Multiple Expert Agents: An Approach to solve the Abstraction and Reasoning Corpus (ARC) Challenge

John Chong Min Tan, Mehul Motani

TL;DR

This work reframes the Abstraction and Reasoning Corpus (ARC) challenge as a problem solvable by a system of multiple expert LLM agents grounded in several fixed abstraction spaces. By coupling GPT-4 with a DSL-like set of primitive functions, a JSON-based chain-of-thought format, and iterative environmental feedback, the approach synthesizes input-output mappings into executable programs. Empirically, the method solves 50 of 111 training tasks (about 45%), with breakdowns showing utility from Object and Pixel views and benefits from iterative feedback, while highlighting limitations in coverage and priors. The study demonstrates the viability of combining multi-view abstractions, grounded action spaces, and adaptive prompting to tackle ARC-like reasoning, and points to future gains from expanding views, memory, and automatic primitive-function discovery.
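The iterative environmental feedback mentioned above can be pictured as running each candidate program on the demonstration pairs and reporting mismatches back to the model. The following is a minimal sketch under stated assumptions, not the authors' implementation: the primitives `flip_horizontal` and `flip_vertical`, the helper `evaluate`, and the feedback message format are all hypothetical names chosen for illustration, and the candidates are hard-coded here where the paper uses GPT-4 to propose them.

```python
def flip_horizontal(grid):
    """Hypothetical primitive: mirror each row left-to-right."""
    return [row[::-1] for row in grid]


def flip_vertical(grid):
    """Hypothetical primitive: reverse the order of the rows."""
    return [row[:] for row in grid[::-1]]


def evaluate(program, pairs):
    """Run a candidate program on every demonstration pair.

    Returns (solved, feedback), where feedback is a list of plain-text
    messages that could be appended to the next prompt.
    """
    feedback = []
    for i, (grid_in, expected) in enumerate(pairs):
        try:
            actual = program(grid_in)
        except Exception as exc:  # a crashing candidate also yields feedback
            feedback.append(f"pair {i}: raised {type(exc).__name__}: {exc}")
            continue
        if actual != expected:
            feedback.append(f"pair {i}: expected {expected}, got {actual}")
    return (not feedback), feedback


# One made-up demonstration pair whose rule is a left-right mirror.
pairs = [([[1, 0], [2, 0]], [[0, 1], [0, 2]])]

wrong_ok, wrong_feedback = evaluate(flip_vertical, pairs)
right_ok, right_feedback = evaluate(flip_horizontal, pairs)
print(wrong_ok, wrong_feedback)
print(right_ok, right_feedback)
```

A candidate is kept only if it reproduces every demonstration output; otherwise its feedback messages describe what went wrong and can be handed back for another attempt.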

Abstract

We attempt to solve the Abstraction and Reasoning Corpus (ARC) Challenge using Large Language Models (LLMs) as a system of multiple expert agents. Because LLMs can be prompted to perform a variety of novel tasks through zero-shot, few-shot, and context-grounded prompting, we explore the feasibility of using them to solve the ARC Challenge. We first convert the input image into multiple suitable text-based abstraction spaces. We then use the associative power of LLMs to derive the input-output relationship and map it to actions in the form of a working program, similar to Voyager and Ghost in the Minecraft. In addition, we use iterative environmental feedback to guide the LLMs toward a solution. Our proposed approach solves 50 of 111 training set problems (45%) with just three abstraction spaces (grid, object, and pixel), and we believe that more abstraction spaces and learnable actions will allow it to solve more.
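To make the three abstraction spaces concrete, here is a rough sketch of how one grid might be rendered as grid, pixel, and object views. It rests on assumptions: the function names, the output formats, the choice of 0 as background, and the use of 4-connected same-colour components for objects are all illustrative and are not taken from the paper.

```python
from collections import deque


def grid_view(grid):
    """Grid view: one line of space-separated colour codes per row."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def pixel_view(grid, background=0):
    """Pixel view: (row, col, colour) for every non-background cell."""
    return [
        (r, c, value)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value != background
    ]


def object_view(grid, background=0):
    """Object view: 4-connected components of equal colour (an assumption)."""
    height, width = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(height):
        for c in range(width):
            if grid[r][c] == background or (r, c) in seen:
                continue
            colour = grid[r][c]
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = []
            while queue:  # breadth-first flood fill over same-colour cells
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and (ny, nx) not in seen and grid[ny][nx] == colour:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append({"colour": colour, "cells": sorted(cells)})
    return objects


# A made-up 3x3 grid, not one of the ARC tasks.
example = [[1, 1, 0], [0, 0, 2], [0, 2, 2]]
print(grid_view(example))
print(pixel_view(example))
print(object_view(example))
```

Each view exposes a different regularity: the grid view preserves layout, the pixel view suits rules about individual cells, and the object view suits rules about whole shapes.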

Paper Structure

This paper contains 29 sections, 14 figures, and 4 tables.

Figures (14)

  • Figure 1: A sample ARC task. The challenge is to infer the abstract rule(s) governing the demonstration transformations and apply it to the test input. Example from: https://aiguide.substack.com/p/why-the-abstraction-and-reasoning
  • Figure 2: 88% of ARC tasks can be solved by the Builder from just the description alone given by the Describer, without input-output examples. Can GPT-4 function as both the Describer and the Builder? Image reproduced from Fig. 4 of Acquaviva et al. (2021).
  • Figure 3: Process Flowchart of LLMs as a System to solve the ARC Challenge.
  • Figure 4: The overall Mass Sampling and Filtering process with various expert agents.
  • Figure 5: A sample grid for an ARC Challenge task, taken from Task Demonstration 1 of ARC Training Set d037b0a7.
  • ...and 9 more figures