GOFAI meets Generative AI: Development of Expert Systems by means of Large Language Models
Eduardo C. Garrido-Merchán, Cristina Puente
TL;DR
This paper presents a hybrid framework for building expert systems by extracting structured, symbolic knowledge from large language models (LLMs) and encoding it in Prolog for transparent, rule-based reasoning. By constraining the knowledge domain and using carefully designed prompts, the approach yields verifiable Prolog knowledge bases with high factual accuracy, validated both statistically and against Wikidata, while maintaining interpretability through a symbolic representation. The methodology combines the probabilistic capabilities of LLMs with the deterministic rigor of symbolic systems, supported by PAC-style guarantees and scalable expansion strategies. The work reports strong factual alignment (over 99% in the authors' tests) and outlines practical pathways for deploying dependable AI in sensitive domains; future work includes improved entity linking and an analysis of information gain across LLMs.
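The TL;DR mentions statistical validation with PAC-style guarantees. The snippet below is a minimal sketch of what such a guarantee can look like. It assumes, and this is an assumption rather than the authors' stated procedure, that facts are sampled independently and each one is checked as correct or incorrect. Under that assumption, the one-sided Hoeffding inequality gives a lower confidence bound on the true accuracy. The helper names `hoeffding_lower_bound` and `required_sample_size` are hypothetical.

```python
import math


def hoeffding_lower_bound(correct: int, total: int, delta: float) -> float:
    """One-sided Hoeffding bound: with probability >= 1 - delta,
    true accuracy >= observed accuracy - sqrt(ln(1/delta) / (2 * total)).
    Assumes the `total` checked facts are i.i.d. samples (hypothetical setup)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie in [0, total]")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    p_hat = correct / total                                 # observed accuracy
    epsilon = math.sqrt(math.log(1 / delta) / (2 * total))  # deviation term
    return max(0.0, p_hat - epsilon)


def required_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n for which the one-sided Hoeffding deviation is at most epsilon."""
    if epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need epsilon > 0 and delta in (0, 1)")
    return math.ceil(math.log(1 / delta) / (2 * epsilon ** 2))


if __name__ == "__main__":
    # Illustrative counts only, not results reported in the paper.
    print(hoeffding_lower_bound(correct=995, total=1000, delta=0.05))
    print(required_sample_size(epsilon=0.05, delta=0.05))
```

For example, 995 verified facts out of 1,000 give a 95% lower bound of roughly 0.956. This shows that the sample size, and not only the observed accuracy, determines how strong the guarantee is.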
Abstract
The development of large language models (LLMs) has transformed knowledge-based systems such as open-domain question answering, because these models can automatically produce vast amounts of seemingly coherent information. Yet LLMs have several drawbacks, such as hallucination, that is, the confident generation of incorrect or unverifiable facts. In this paper, we introduce a new approach to developing expert systems with LLMs in a controlled and transparent way. By limiting the domain and employing a well-structured, prompt-based extraction approach, we produce a symbolic representation of knowledge in Prolog that human experts can validate and correct. This approach also guarantees the interpretability, scalability and reliability of the resulting expert systems. Through quantitative and qualitative experiments with Claude Sonnet 3.7 and GPT-4.1, we show strong factual adherence and semantic coherence in the generated knowledge bases. We present a transparent hybrid solution that combines the recall capacity of LLMs with the precision of symbolic systems, laying the foundation for dependable AI applications in sensitive domains.
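The abstract states that the extracted knowledge is represented in Prolog so that experts can inspect and correct it. The snippet below is a minimal sketch of that final serialisation step. It assumes that the LLM output has already been parsed into (subject, relation, object) triples. That intermediate format, along with the helper names `to_prolog_atom` and `triples_to_prolog`, is hypothetical and not necessarily part of the authors' pipeline.

```python
import re
from typing import Iterable, Tuple


def to_prolog_atom(text: str) -> str:
    """Normalise free text into a Prolog atom (hypothetical naming scheme)."""
    atom = re.sub(r"[^a-z0-9_]+", "_", text.strip().lower()).strip("_")
    if not atom:
        raise ValueError(f"cannot build an atom from {text!r}")
    # Unquoted atoms must start with a lowercase letter; otherwise quote it.
    if not atom[0].isalpha():
        return "'" + atom + "'"
    return atom


def triples_to_prolog(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialise (subject, relation, object) triples as binary Prolog facts,
    dropping duplicates that normalise to the same fact."""
    facts, seen = [], set()
    for subject, relation, obj in triples:
        fact = (f"{to_prolog_atom(relation)}"
                f"({to_prolog_atom(subject)}, {to_prolog_atom(obj)}).")
        if fact not in seen:
            seen.add(fact)
            facts.append(fact)
    return "\n".join(facts)


if __name__ == "__main__":
    print(triples_to_prolog([("Marie Curie", "born in", "Warsaw")]))
```

Emitting one plain fact per line keeps the knowledge base easy for a human expert to review and edit, and a Prolog interpreter can load the result directly.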
