Hypothesis Generation via LLM-Automated Language Bias for ILP
Yang Yang, Jiemin Wu, Yutao Yue
TL;DR
This work tackles the dependence of inductive logic programming (ILP) on expert-crafted language bias by introducing a multi-agent LLM framework that automatically constructs a predicate system from raw text. The predicate system then guides symbolic knowledge encoding into Prolog facts, after which an ILP solver using the Minimum Description Length (MDL) objective via MAXSYNTH induces globally coherent Horn-rule sets; the MDL cost is given by $\text{cost} = \text{program size} + \#\text{FP} + \#\text{FN}$. Evaluations on SHOES and ZENDO across multiple LLM backends demonstrate superior accuracy and robustness relative to HypoGeniC and Iterative Hypothesis Refinement, particularly in relationally complex tasks, while maintaining model-agnostic stability. The results support the approach as a practical, explainable path to open-domain hypothesis generation that reduces manual bias engineering, with limitations noted for extension to richer real-world text and semantics.
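The MDL-style cost above (program size plus false positives plus false negatives) can be illustrated with a minimal Python sketch. This is not the MAXSYNTH implementation: the `Hypothesis` class, the `mdl_cost` function, the toy examples, and the choice to measure program size as a literal count are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Example = Tuple[Dict[str, str], bool]  # (features, is_positive)


@dataclass
class Hypothesis:
    # Assumed size measure: number of literals in the rule set (hypothetical).
    size: int
    # Returns True when the hypothesis classifies the example as positive.
    covers: Callable[[Dict[str, str]], bool]


def mdl_cost(h: Hypothesis, examples: List[Example]) -> int:
    """cost = program size + #FP + #FN."""
    fp = sum(1 for x, label in examples if h.covers(x) and not label)
    fn = sum(1 for x, label in examples if not h.covers(x) and label)
    return h.size + fp + fn


# Toy data: an example is positive when it is red or large.
examples: List[Example] = [
    ({"color": "red", "size": "small"}, True),
    ({"color": "red", "size": "large"}, True),
    ({"color": "blue", "size": "large"}, True),
    ({"color": "blue", "size": "small"}, False),
]

# Short rule that misses one positive example (one false negative).
short_rule = Hypothesis(size=2, covers=lambda x: x["color"] == "red")
# Longer rule set that fits every example.
exact_rule = Hypothesis(
    size=4, covers=lambda x: x["color"] == "red" or x["size"] == "large"
)

short_cost = mdl_cost(short_rule, examples)
exact_cost = mdl_cost(exact_rule, examples)
```

Under this objective a shorter program that misclassifies a few examples can score better than a longer program that fits all of them, which is what lets the solver tolerate noisy facts.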
Abstract
Inductive Logic Programming (ILP) is a principled approach for generalizing regularities from data and constructing hypotheses as interpretable logic programs. However, a key limitation is its reliance on expert-crafted language bias: the predicate inventory, types, and mode declarations that delimit the search space. We propose hypothesis generation via LLM-automated language bias: multi-agent LLMs design the bias from raw text and translate descriptions into typed facts, and a robust ILP solver induces rules under a global consistency objective. This approach reduces both traditional ILP's reliance on predefined symbolic structures and the noise sensitivity of LLM-only pipelines that directly generate hypotheses as text or code. Extensive experiments in diverse, challenging scenarios validate superior performance, providing a practical, explainable, and verifiable route to hypothesis generation.
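To make "language bias" concrete, the following minimal Python sketch represents a predicate inventory with argument types and modes and renders it as declarations. The predicate names (`zendo`, `piece`, `red`), the `Predicate` class, and the output syntax are hypothetical; the syntax is only loosely modeled on the declaration style of ILP systems such as Popper and is not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Predicate:
    name: str
    arg_types: Tuple[str, ...]   # one type per argument
    directions: Tuple[str, ...]  # "in" or "out" per argument (mode)
    is_head: bool = False        # True for the target predicate

    def __post_init__(self) -> None:
        if len(self.arg_types) != len(self.directions):
            raise ValueError(f"{self.name}: types and directions differ in length")


def _tuple(items: Tuple[str, ...]) -> str:
    # Single-element tuples get a trailing comma (assumed convention).
    return f"({items[0]},)" if len(items) == 1 else f"({','.join(items)})"


def render_bias(preds: List[Predicate]) -> List[str]:
    """Render an assumed, Popper-like textual form of the language bias."""
    lines: List[str] = []
    for p in preds:
        kind = "head_pred" if p.is_head else "body_pred"
        lines.append(f"{kind}({p.name},{len(p.arg_types)}).")
        lines.append(f"type({p.name},{_tuple(p.arg_types)}).")
        lines.append(f"direction({p.name},{_tuple(p.directions)}).")
    return lines


# Hypothetical predicate system for a ZENDO-like task.
bias = [
    Predicate("zendo", ("state",), ("in",), is_head=True),
    Predicate("piece", ("state", "piece"), ("in", "out")),
    Predicate("red", ("piece",), ("in",)),
]
bias_lines = render_bias(bias)
print("\n".join(bias_lines))
```

In the proposed framework this inventory would be designed by the LLM agents from raw text rather than written by hand; the sketch only shows the kind of artifact that bounds the solver's search space.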
