Large Language Model (LLM) as a System of Multiple Expert Agents: An Approach to solve the Abstraction and Reasoning Corpus (ARC) Challenge
John Chong Min Tan, Mehul Motani
TL;DR
This work reframes the Abstraction and Reasoning Corpus (ARC) challenge as a problem solvable by a system of multiple expert LLM agents grounded in several fixed abstraction spaces. By coupling GPT-4 with a DSL-like set of primitive functions, a JSON-based chain-of-thought format, and iterative environmental feedback, the approach synthesizes input-output mappings into executable programs. Empirically, the method solves 50 of 111 training tasks (about 45%), with breakdowns showing utility from Object and Pixel views and benefits from iterative feedback, while highlighting limitations in coverage and priors. The study demonstrates the viability of combining multi-view abstractions, grounded action spaces, and adaptive prompting to tackle ARC-like reasoning, and points to future gains from expanding views, memory, and automatic primitive-function discovery.
Abstract
We attempt to solve the Abstraction and Reasoning Corpus (ARC) Challenge using Large Language Models (LLMs) as a system of multiple expert agents. LLMs can be prompted to perform novel tasks through zero-shot, few-shot, and context-grounded prompting, and we explore whether this flexibility is sufficient to solve the ARC Challenge. We first convert the input image into multiple suitable text-based abstraction spaces. We then use the associative power of LLMs to derive the input-output relationship and map it to actions in the form of a working program, similar to Voyager and Ghost in the Minecraft. In addition, we use iterative environmental feedback to guide the LLM towards a solution. Our proposed approach solves 50 of 111 training set problems (45%) with just three abstraction spaces (grid, object and pixel), and we believe that more abstraction spaces and learnable actions will allow it to solve more.
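To make the idea of multiple text-based abstraction spaces concrete, the following is a minimal sketch of how one ARC grid might be rendered as grid, pixel and object views. The function names, the choice of 0 as the background colour, 4-connectivity, and the dictionary fields are all illustrative assumptions and are not taken from the paper; the encoding actually used may differ.

```python
from collections import deque

# Illustrative sketch only: names, fields and conventions below are assumptions.

def grid_view(grid):
    """Grid view: the raw 2D array of colour indices, copied row by row."""
    return [list(row) for row in grid]

def pixel_view(grid, background=0):
    """Pixel view: map each non-background coordinate (row, col) to its colour."""
    return {
        (r, c): v
        for r, row in enumerate(grid)
        for c, v in enumerate(row)
        if v != background
    }

def object_view(grid, background=0):
    """Object view: 4-connected components of same-coloured, non-background cells."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == background or (r, c) in seen:
                continue
            colour = grid[r][c]
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = []
            # Breadth-first flood fill over neighbours of the same colour.
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] == colour):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append({
                "colour": colour,
                "top_left": (min(y for y, _ in cells), min(x for _, x in cells)),
                "size": len(cells),
                "cells": sorted(cells),
            })
    return objects

example = [
    [0, 1, 1],
    [0, 0, 0],
    [2, 0, 0],
]
views = {
    "grid": grid_view(example),
    "pixel": pixel_view(example),
    "object": object_view(example),
}
```

Each view could then be serialised to text and placed in a separate prompt, so that a task that is awkward to describe at the pixel level may still be easy to describe in terms of objects, and vice versa.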
