It's 2025 -- Narrative Learning is the new baseline to beat for explainable machine learning

Gregory D. Baker

TL;DR

This work introduces Narrative Learning, an approach to explainable AI in which models are defined and refined entirely in natural language. An Overseer generates classification narratives that Underlings execute; the narratives are refined through iterative prompts, and multiple narratives are ensembled to improve accuracy. Datasets are transformed to prevent reliance on prior domain knowledge. Across six datasets (three synthetic, three natural), Narrative Learning ensembles outperform traditional explainable baselines on five, and accuracy trends upward as newer language models are deployed. The study also analyses the linguistic complexity of the explanations and finds no statistically significant increase over time. It outlines practical future directions, including broader datasets, prompt engineering, and more accessible tooling. In this framing, the explanation itself becomes the learnable, testable artefact.

Abstract

In this paper, we introduce Narrative Learning, a methodology where models are defined entirely in natural language and iteratively refine their classification criteria using explanatory prompts rather than traditional numerical optimisation. We report on experiments to evaluate the accuracy and potential of this approach using 3 synthetic and 3 natural datasets and compare them against 7 baseline explainable machine learning models. We demonstrate that on 5 out of 6 of these datasets, Narrative Learning became more accurate than the baseline explainable models in 2025 or earlier because of improvements in language models. We also report on trends in the lexicostatistics of these models' outputs as a proxy for the comprehensibility of the explanations.
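The refinement loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `overseer` and `underling` are hypothetical callables standing in for language-model calls, the sketch scores narratives on a single labelled set for brevity, and majority voting is only one assumed way of ensembling narratives.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = dict
Label = str
# Hypothetical stand-ins for language-model calls (assumptions, not the paper's API):
#   overseer(current_narrative, misclassified_examples) -> revised narrative
#   underling(narrative, row) -> predicted label
Overseer = Callable[[str, List[Tuple[Row, Label]]], str]
Underling = Callable[[str, Row], Label]


def accuracy(narrative: str, underling: Underling,
             rows: Sequence[Row], labels: Sequence[Label]) -> float:
    """Fraction of rows the underling labels correctly under this narrative."""
    hits = sum(underling(narrative, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def narrative_learning(overseer: Overseer, underling: Underling,
                       rows: Sequence[Row], labels: Sequence[Label],
                       initial: str, rounds: int = 5,
                       examples_per_round: int = 3) -> str:
    """Iteratively revise a natural-language narrative; return the best one seen."""
    current = initial
    best, best_acc = initial, accuracy(initial, underling, rows, labels)
    for _ in range(rounds):
        # Collect rows the current narrative gets wrong.
        errors = [(r, y) for r, y in zip(rows, labels)
                  if underling(current, r) != y]
        if not errors:
            break  # nothing left to correct
        # Show the overseer a few counterexamples and ask for a revision.
        current = overseer(current, errors[:examples_per_round])
        acc = accuracy(current, underling, rows, labels)
        if acc > best_acc:
            best, best_acc = current, acc
    return best


def ensemble_predict(narratives: Sequence[str], underling: Underling,
                     row: Row) -> Label:
    """Majority vote over several narratives (an assumed ensembling rule)."""
    votes = Counter(underling(n, row) for n in narratives)
    return votes.most_common(1)[0][0]
```

The `examples_per_round` parameter mirrors the 3-versus-10 examples per round comparison mentioned in the figure list; all other names and defaults are illustrative.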

Paper Structure

This paper contains 19 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Narrative Learning flow diagram
  • Figure 2: Reverse translated Narrative Learning output for the Titanic dataset (OpenAI o1 with 10 examples)
  • Figure 3: Trend over time for ensembles of narrative learners.
  • Figure 4: Scatter plot of the accuracy scores from overseers run with 10 examples per round versus being run with 3 examples per round. If 10 examples improved output, the data points would be highly asymmetric around the central line.
  • Figure 5: Narrative Learning is likely to maintain comprehensibility. There is no statistically significant change over time in Herdan's coefficients for the language used in the reasoning of the overseer and the resulting prompts, even while the accuracy of the ensembles improved substantially.
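
For readers unfamiliar with the statistic in Figure 5, Herdan's coefficient is commonly defined as C = log V / log N, where V is the number of distinct word types and N is the total number of word tokens in a text. The sketch below computes it under a simple assumed tokenisation (lowercased runs of letters and apostrophes); the tokeniser actually used in the paper is not stated here and may differ.

```python
import math
import re


def herdan_c(text: str) -> float:
    """Herdan's C = log(V) / log(N) for a text.

    V = number of distinct word types, N = number of word tokens.
    The regex tokeniser is an assumption made for this sketch.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    if n < 2:
        # log(1) == 0 would make the ratio undefined.
        raise ValueError("need at least two tokens to compute Herdan's C")
    v = len(set(tokens))
    return math.log(v) / math.log(n)
```

A value of 1.0 means every token is a distinct word; lower values indicate more repetition, so a flat trend in C suggests the vocabulary richness of the explanations is not growing over time.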