IDEA: Enhancing the Rule Learning Ability of Large Language Model Agent through Induction, Deduction, and Abduction

Kaiyu He, Mian Zhang, Shuo Yan, Peilin Wu, Zhiyu Zoey Chen

TL;DR

This paper defines RULEARN, the first interactive benchmark to evaluate LLM agents’ holistic rule-learning in dynamic environments, where rules are hidden and must be inferred by forming hypotheses and testing them through experiments. It introduces the IDEA framework, which tightly couples abductive hypothesis generation, deductive planning, and inductive refinement to enable human-like rule learning. Evaluated on five LLMs and compared against human participants, IDEA improves rule-learning performance and exploration efficiency, though humans still outperform current models, highlighting gaps in hypothesis refinement and adaptive planning. The work provides a challenging resource for advancing LLM agents toward more human-like rule learning in real-world, interactive settings and includes code and data to support open research.

Abstract

While large language models (LLMs) have been thoroughly evaluated for deductive and inductive reasoning, their proficiency in holistic rule learning in interactive environments remains less explored. We introduce RULEARN, a novel benchmark to assess the rule-learning abilities of LLM agents in interactive settings. In RULEARN, agents strategically interact with simulated environments to gather observations, discern patterns, and solve complex problems. To enhance the rule-learning capabilities of LLM agents, we propose IDEA, a novel reasoning framework that integrates the processes of Induction, Deduction, and Abduction. The IDEA agent generates initial hypotheses from limited observations through abduction, devises plans to validate these hypotheses or leverages them to solve problems via deduction, and refines previous hypotheses through induction, dynamically establishing and applying rules in a way that mimics human rule-learning behaviors. Our evaluation of the IDEA framework, which involves five representative LLMs, demonstrates significant improvements over the baseline. Furthermore, our study with human participants reveals notable discrepancies in rule-learning behaviors between humans and LLMs. We believe our benchmark will serve as a valuable and challenging resource, and IDEA will provide crucial insights for the development of LLM agents capable of human-like rule learning in real-world scenarios. Our code and data are publicly available.
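
The abduction, deduction, and induction cycle described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy puzzle, the hidden rule, and every name in it (PAINTINGS, HIDDEN_COLOR, try_code, abduce, deduce, induce, idea_loop) are hypothetical, and simple rule-based functions stand in for the LLM calls. The puzzle is loosely modelled on the password example in the Figure 2 caption.

```python
from collections import Counter

# Hypothetical toy puzzle (an assumption, not taken from the benchmark):
# a door code is the zero-padded count of paintings of one hidden color.
PAINTINGS = ["blue", "blue", "blue", "red", "red", "green"]
HIDDEN_COLOR = "red"  # the hidden rule; the agent never reads this directly


def try_code(code):
    """Environment feedback: True if the entered code opens the door."""
    return code == f"{PAINTINGS.count(HIDDEN_COLOR):03d}"


def abduce(paintings):
    """Abduction: propose candidate rules from the initial observations.

    Each hypothesis is a color; the most frequent color is tried first.
    """
    return [color for color, _ in Counter(paintings).most_common()]


def deduce(hypothesis, paintings):
    """Deduction: derive the code that the hypothesis predicts."""
    return f"{paintings.count(hypothesis):03d}"


def induce(hypotheses, observations, paintings):
    """Induction: keep only hypotheses consistent with all feedback so far."""
    failed_codes = {code for code, ok in observations if not ok}
    return [h for h in hypotheses if deduce(h, paintings) not in failed_codes]


def idea_loop(paintings, max_steps=10):
    """Run the cycle until the puzzle is solved or the step budget runs out."""
    hypotheses = abduce(paintings)
    observations = []
    for step in range(1, max_steps + 1):
        if not hypotheses:
            return None, step - 1, observations
        code = deduce(hypotheses[0], paintings)   # plan an experiment
        ok = try_code(code)                       # interact with the environment
        observations.append((code, ok))
        if ok:
            return hypotheses[0], step, observations
        hypotheses = induce(hypotheses, observations, paintings)  # refine
    return None, max_steps, observations


if __name__ == "__main__":
    rule, steps, log = idea_loop(PAINTINGS)
    print(rule, steps, log)
```

In this sketch the agent first guesses that the code counts the blue paintings, enters 003, receives negative feedback, discards that hypothesis, and then tries the next candidate. A real agent would replace abduce, deduce, and induce with prompted model calls and would face a much larger hypothesis space.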

Paper Structure (30 sections, 27 figures, 9 tables, 1 algorithm)

Figures (27)

  • Figure 1: The reasoning cycle of rule learning encompasses abduction, deduction, and induction.
  • Figure 2: A simplified puzzle in the RULEARN benchmark and the IDEA agent's workflow (in real puzzles, agents have fewer initial observations and more complex rules). The agent generates an initial hypothesis through abduction, develops an exploration plan via deduction, and refines its hypothesis using induction. For example, the IDEA agent first hypothesizes that the password is the number of the blue paintings, tests this by entering code 003, and adjusts its strategy based on the feedback.
  • Figure 3: An example of the IDEA agent solving a Reactor puzzle. At each step, the agent must choose whether to interact with the environment or adjust its hypothesis and plan based on current observations. If observed facts contradict the existing hypothesis, the agent is expected to refine its hypothesis. The refined hypothesis and plan will then guide subsequent exploration.
  • Figure 4: Comparison of the cumulative number of puzzles solved at each interaction step. The IDEA agent significantly decreases the number of steps needed to solve a puzzle compared to the Baseline agent.
  • Figure 5: Human Evaluation Results. Bars represent measured values per model and puzzle type; the absence of a bar indicates zero or unavailable data. Plot (a): Abduction Correct Rate—the frequency of correctly guessing the rule during abduction. Plot (b): Effective Deduction Rate—the rate at which deduction plans effectively validate hypotheses or solve puzzles. Plot (c): Effective Induction Rate—the proportion of inductions where the refined hypothesis improved over the previous one. Plot (d): Average Actions per Effective Induction—the average number of interactive actions needed for an effective induction.
  • ...and 22 more figures
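
The quantity plotted in Figure 4, the cumulative number of puzzles solved at each interaction step, can be computed as in the short sketch below. The function name cumulative_solved and the input data are hypothetical and chosen only for illustration; the numbers are not results from the paper.

```python
from itertools import accumulate


def cumulative_solved(steps_to_solve, max_steps):
    """Return how many puzzles are solved by each step 1..max_steps.

    steps_to_solve holds, per puzzle, the step at which it was solved,
    or None if it was never solved within the budget.
    """
    solved_at_step = [0] * max_steps
    for step in steps_to_solve:
        if step is not None and 1 <= step <= max_steps:
            solved_at_step[step - 1] += 1
    # Running total turns per-step counts into a cumulative curve.
    return list(accumulate(solved_at_step))


if __name__ == "__main__":
    # Made-up example: five puzzles, one of them unsolved.
    print(cumulative_solved([2, None, 1, 2, 5], max_steps=5))
```

An agent that solves puzzles in fewer steps produces a curve that rises earlier, which is the comparison the figure caption describes between the IDEA agent and the Baseline agent.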