Inductive Learning of Logical Theories with LLMs: An Expressivity-Graded Analysis
João Pedro Gandarela, Danilo S. Carvalho, André Freitas
TL;DR
This work presents a graded methodology for evaluating how well large language models can induce logical theories by coupling LLMs with a formal inference engine (Prolog) and a synthetic data generator that varies rule expressivity and noise. Through iterative theory generation and evaluation, the approach benchmarks LLMs against a state-of-the-art ILP system across the expressivity categories CHAIN, RDG, DRDG, and MIXED. Findings show that larger LLMs can match or approach ILP performance at higher noise levels but struggle with long predicate chains and improve non-monotonically with more iterations; model size alone is not a reliable predictor of robustness. The proposed framework yields a reusable, graded pipeline for assessing the inductive capabilities of LLMs with formal grounding, aiding interpretability and systematic comparison across models and tasks.
Abstract
This work presents a novel systematic methodology to analyse the capabilities and limitations of Large Language Models (LLMs) on logic theory induction, with feedback from a formal inference engine. The analysis is complexity-graded w.r.t. rule dependency structure, allowing the effect of specific inference challenges on LLM performance to be quantified. Integrating LLMs with formal methods is a promising frontier in the Natural Language Processing field, as an important avenue for improving model inference control and explainability. In particular, inductive learning over complex sets of facts and rules poses unique challenges for current autoregressive models, as they lack explicit symbolic grounding. While they can be complemented by formal systems, the properties delivered by LLMs regarding inductive learning are neither well understood nor quantified. Empirical results indicate that the largest LLMs can achieve competitive results against a state-of-the-art Inductive Logic Programming (ILP) system baseline, but also that tracking long predicate relationship chains is a more difficult obstacle than theory complexity for LLMs.
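The generate-and-evaluate loop summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the formal inference engine is replaced by a toy single-pass evaluator for chain-shaped rules, the LLM is replaced by a hypothetical `propose` callback, and the choice of F1 as the feedback signal is an assumption made for illustration. All names (`derive`, `f1_score`, `refine`, `stub_llm`) are hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple

Fact = Tuple[str, str, str]   # (predicate, arg1, arg2), e.g. ("parent", "a", "b")
Rule = Tuple[str, List[str]]  # (head predicate, non-empty chain of body predicates)


def derive(rule: Rule, facts: Set[Fact]) -> Set[Fact]:
    """Derive head(X, Y) for a chain rule head(X, Y) :- b1(X, Z1), b2(Z1, Z2), ...

    Toy stand-in for a Prolog engine: one pass over the given facts, no recursion.
    """
    head, body = rule
    pairs = {(a, b) for (p, a, b) in facts if p == body[0]}
    for pred in body[1:]:
        step = {(a, b) for (p, a, b) in facts if p == pred}
        # Join on the shared intermediate variable.
        pairs = {(x, z) for (x, y) in pairs for (y2, z) in step if y == y2}
    return {(head, x, y) for (x, y) in pairs}


def f1_score(theory: List[Rule], facts: Set[Fact],
             positives: Set[Fact], negatives: Set[Fact]) -> float:
    """Score a candidate theory against labelled examples (assumed metric: F1)."""
    predicted: Set[Fact] = set()
    for rule in theory:
        predicted |= derive(rule, facts)
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    fn = len(positives - predicted)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def refine(propose: Callable[[Optional[str]], List[Rule]], facts: Set[Fact],
           positives: Set[Fact], negatives: Set[Fact],
           max_iters: int = 3) -> Tuple[List[Rule], float]:
    """Ask for a theory, evaluate it, and feed the score back; keep the best one."""
    best_theory: List[Rule] = []
    best_score = 0.0
    feedback: Optional[str] = None
    for _ in range(max_iters):
        theory = propose(feedback)  # hypothetical stand-in for an LLM call
        score = f1_score(theory, facts, positives, negatives)
        if score > best_score:
            best_theory, best_score = theory, score
        if score == 1.0:
            break
        feedback = f"previous theory scored F1 = {score:.2f}"
    return best_theory, best_score


# Stub "LLM": a wrong guess first, then a corrected rule once feedback arrives.
def stub_llm(feedback: Optional[str]) -> List[Rule]:
    if feedback is None:
        return [("grandparent", ["parent"])]
    return [("grandparent", ["parent", "parent"])]


facts = {("parent", "a", "b"), ("parent", "b", "c")}
positives = {("grandparent", "a", "c")}
negatives = {("grandparent", "a", "b")}
best_theory, best_score = refine(stub_llm, facts, positives, negatives)
```

Keeping the best-scoring theory rather than the last one is a design choice of this sketch; it guards against later iterations producing a worse theory, which matters if improvement across iterations is not monotonic.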
