LLM-Generated Explanations Do Not Suffice for Ultra-Strong Machine Learning
Lun Ai, Johannes Langer, Ute Schmid, Stephen Muggleton
TL;DR
This work investigates whether explanations produced by large language models can realize Ultra Strong Machine Learning (USML), that is, whether they measurably improve human learning. It introduces LENS, a neuro-symbolic framework that learns Prolog programs via Inductive Logic Programming and explains them in natural language through in-context LLM generation; explanation quality is assessed by LLMs acting as judges against expert references. In a human study on teaching active learning for fault diagnosis, concise expert-written explanations appeared to benefit learners with higher initial performance, whereas LENS-generated explanations offered no advantage over self-learning despite being rated as higher quality. The findings highlight a gap between LLM-generated explanations and human cognitive constraints. They suggest that USML requires human-grounded explanation strategies and evaluation methods, and they point to future work on advanced program synthesis and cognitive-adaptation approaches.
Abstract
Ultra Strong Machine Learning (USML) refers to symbolic learning systems that not only improve their own performance but can also teach their acquired knowledge to quantifiably improve human performance. We introduce LENS (Logic Programming Explanation via Neural Summarisation), a neuro-symbolic framework that combines symbolic program synthesis with large language models (LLMs). This framework automatically generates natural language explanations of learned logic programs, replacing the hand-crafted templates used in prior USML work. Using LLMs-as-judges evaluation and expert validation, we show that LENS produces higher-quality explanations than both direct LLM prompting and hand-crafted templates. We then examine whether LENS explanations suffice for achieving USML in a human trial teaching active learning strategies across three related domains. Our exploratory analysis suggests that concise, expert-written explanations may benefit learners with higher initial performance, while LLM-generated explanations provide no advantage over human self-learning despite being rated as higher quality. This case study reveals that achieving USML requires methods grounded in human learning: current LLM-generated explanations do not capture human cognitive constraints, and LLMs-as-judges evaluations do not reflect what effectively supports human learning.
