LLM-Generated Explanations Do Not Suffice for Ultra-Strong Machine Learning

Lun Ai, Johannes Langer, Ute Schmid, Stephen Muggleton

TL;DR

This work investigates whether explanations produced by large language models can realize Ultra Strong Machine Learning (USML), that is, whether they measurably improve human learning. It introduces LENS, a neuro-symbolic framework that learns Prolog programs via Inductive Logic Programming and explains them through in-context LLM generation, with LLM judges scoring the explanations, optionally against expert-written references. In a human study on teaching active learning for fault diagnosis, exploratory analysis suggests that concise, expert-written explanations may benefit learners with higher initial performance, whereas LENS-generated explanations offered no advantage over self-learning despite being rated as higher quality. The findings highlight a gap between LLM-generated explanations and human cognitive constraints, suggesting that USML requires human-grounded explanation strategies and evaluation methods, and pointing to future work in advanced program synthesis and cognitive-adaptation approaches.

Abstract

Ultra Strong Machine Learning (USML) refers to symbolic learning systems that not only improve their own performance but can also teach their acquired knowledge to quantifiably improve human performance. We introduce LENS (Logic Programming Explanation via Neural Summarisation), a neuro-symbolic framework that combines symbolic program synthesis with large language models (LLMs). This framework automatically generates natural language explanations of learned logic programs, replacing hand-crafted templates used in prior USML work. Using LLMs-as-judges evaluation and expert validation, we show that LENS produces higher-quality explanations than both direct LLM prompting and hand-crafted templates. We then examine whether LENS explanations suffice for achieving USML in a human trial teaching active learning strategies across three related domains. Our exploratory analysis suggests that concise, expert-written explanations may benefit learners with higher initial performance, while LLM-generated explanations provide no advantage over human self-learning despite being rated as higher quality. This case study reveals that achieving USML requires methods grounded in human learning: current LLM-generated explanations do not capture human cognitive constraints, and LLMs-as-judges evaluations do not reflect what effectively supports human learning.

Paper Structure

This paper contains 53 sections, 5 equations, 13 figures.

Figures (13)

  • Figure 1: USML can quantifiably enhance human task performance compared to human self-learning from examples.
  • Figure 2: Logic Programming Explanation via Neural Summarisation (LENS). The LLM judges can optionally use expert-written references.
  • Figure 3: The left block shows ILP-learned programs, where each episode is learned from a single circuit example. The middle block summarises relevant programs identified for the task. The right block is an action strategy based on relevant programs.
  • Figure 4: Distribution of LLM-judged scores for electric circuit domain explanations. RMs and CMs denote reasoning and coding LLMs, respectively. Significance levels are marked as $p<0.05$ (*), $p<0.01$ (**), $p<0.001$ (***).
  • Figure 5: Distribution of LLM-judged scores for explanations in two further domains: game playing [Ai2021] (island panels) and algorithm discovery [sequential_teaching] (merge sort panels). The annotations and markers are consistent with those in Figure 4.
  • ...and 8 more figures