Combining Induction and Transduction for Abstract Reasoning

Wen-Ding Li, Keya Hu, Carter Larsen, Yuqing Wu, Simon Alford, Caleb Woo, Spencer M. Dunn, Hao Tang, Michelangelo Naim, Dat Nguyen, Wei-Long Zheng, Zenna Tavares, Yewen Pu, Kevin Ellis

TL;DR

The paper investigates whether induction or transduction better supports few-shot abstract reasoning on ARC, revealing that these two paradigms are highly complementary. By building a large synthetic data pipeline from seed Python programs and remixing them with LLMs, the authors train both induction (latent function synthesis) and transduction (direct prediction) models and ensemble them to achieve near-human ARC performance. They show induction excels at symbolic, compositional tasks while transduction handles perceptual, pattern-based tasks, and that ensemble methods yield strong results, including improvements via test-time training and reranking. The findings suggest robust, sample-efficient generalization emerges from integrating neural program search with neural prediction, with implications extending beyond ARC toward hybrid representations and domain libraries rather than restricted languages.

Abstract

When learning an input-output mapping from very few examples, is it better to first infer a latent function that explains the examples, or is it better to directly predict new test outputs, e.g. using a neural network? We study this question on ARC by training neural models for induction (inferring latent functions) and transduction (directly predicting the test output for a given test input). We train on synthetically generated variations of Python programs that solve ARC training tasks. We find inductive and transductive models solve different kinds of test problems, despite having the same training problems and sharing the same neural architecture: Inductive program synthesis excels at precise computations, and at composing multiple concepts, while transduction succeeds on fuzzier perceptual concepts. Ensembling them approaches human-level performance on ARC.
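The distinction between the two paradigms can be made concrete with a toy sketch. Everything below is a hypothetical illustration, not the authors' implementation: the candidate list stands in for programs sampled from a trained induction model, the nearest-neighbour lookup stands in for a neural transduction model, and the rule of falling back to transduction when no candidate program fits the training examples is an assumed ensembling strategy.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, output grid)


def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def flip_vertical(g: Grid) -> Grid:
    return g[::-1]


def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]


# Hypothetical stand-in for programs sampled from an induction model.
CANDIDATES: List[Callable[[Grid], Grid]] = [flip_horizontal, flip_vertical, transpose]


def induce(train: List[Example]) -> Optional[Callable[[Grid], Grid]]:
    """Induction: return a latent function f that explains every training pair."""
    for f in CANDIDATES:
        if all(f(x) == y for x, y in train):
            return f
    return None


def similarity(a: Grid, b: Grid) -> int:
    """Count matching cells over the overlapping region of two grids."""
    return sum(1 for ra, rb in zip(a, b) for ca, cb in zip(ra, rb) if ca == cb)


def transduce(train: List[Example], test_input: Grid) -> Grid:
    """Transduction: predict the output directly, with no explicit function.

    Toy stand-in for a neural network: copy the output of the most
    similar training input.
    """
    best_input, best_output = max(train, key=lambda ex: similarity(ex[0], test_input))
    return best_output


def ensemble(train: List[Example], test_input: Grid) -> Tuple[Grid, str]:
    """Assumed ensembling rule: trust induction when a program fits, else transduce."""
    f = induce(train)
    if f is not None:
        return f(test_input), "induction"
    return transduce(train, test_input), "transduction"


# A task explained by one of the candidate programs.
mirror_task = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
mirror_prediction, mirror_source = ensemble(mirror_task, [[5, 6], [7, 8]])

# A task that no candidate program explains, so the ensemble falls back.
fill_task = [([[1, 0], [0, 0]], [[9, 9], [9, 9]])]
fill_prediction, fill_source = ensemble(fill_task, [[1, 0], [0, 1]])
```

The appeal of checking induction first is that a candidate program can be verified against the training examples before it is trusted, whereas a transductive prediction has no such built-in check.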

Paper Structure

This paper contains 56 sections, 5 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Few-shot learning tasks from the Abstraction and Reasoning Corpus (ARC). Each task typically has 2-5 input-output examples. Here we show just one input-output example per task.
  • Figure 2: Induction generates an intermediate function $f$ to explain training input-outputs. Transduction directly predicts the test output, for example using a neural network.
  • Figure 3: Synthetic data generation pipeline, starting with human-written programs (seeds).
  • Figure 4: Example synthetic ARC problems generated by our pipeline. Concepts are generated in a comment near the top of the Python script as part of the natural language description of the seed.
  • Figure 5: (A) Induction and transduction solve different problems, where "solve" means predicting the right output within 2 tries. Venn diagram for models trained on 100k synthetic problems generated using gpt4o-mini. (B) Training many models with different random seeds, and then measuring the correlation between the tasks solved by different models. The tasks a model solves correlate strongly with those solved by other models of the same class, but not with those solved by models of the other class. (C) Statistical significance test evaluating the null hypothesis that correlation is independent of whether a model is inductive or transductive.
  • ...and 9 more figures
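
The measurement described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, Pearson correlation over binary solved/unsolved vectors is an assumed choice of correlation measure, and the solved vectors are made-up toy data, not results from the paper.

```python
from math import sqrt
from typing import List, Sequence


def solved_within_two_tries(attempts: Sequence[object], target: object) -> bool:
    """A task counts as solved if either of the first two predictions is correct."""
    return target in attempts[:2]


def pearson(xs: List[int], ys: List[int]) -> float:
    """Pearson correlation between two equal-length solved/unsolved vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector; report 0
    return cov / sqrt(var_x * var_y)


# Made-up toy data: 1 means the model solved that task, 0 means it did not.
induction_seed_a = [1, 1, 0, 0]
induction_seed_b = [1, 1, 1, 0]
transduction_seed_a = [0, 0, 1, 1]

same_class = pearson(induction_seed_a, induction_seed_b)
cross_class = pearson(induction_seed_a, transduction_seed_a)
```

In this toy setup `same_class` comes out higher than `cross_class`, which is the qualitative pattern the caption reports for models of the same class versus models of different classes.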