Reasoning about concepts with LLMs: Inconsistencies abound

Rosario Uceda-Sosa, Karthikeyan Natesan Ramamurthy, Maria Chang, Moninder Singh

TL;DR

The paper investigates how large language models (LLMs) inconsistently represent and apply abstract concepts encoded as Is-A hierarchies within knowledge graphs (KGs). It introduces a KG-driven framework that automatically generates test clusters (edge, path, and property-hierarchy) from a Wikidata-derived ontology to probe LLM consistency, using yes/no questions and deductive closure reasoning. Across multiple openly available models, the study finds widespread conceptual inconsistencies, particularly in property inheritance, but shows that simple KG-based prompting and contextual augmentation can significantly reduce errors. These findings highlight the need for domain-specific testing and targeted KG prompting to improve reliability in industrial AI applications.

Abstract

The ability to summarize and organize knowledge into abstract concepts is key to learning and reasoning. Many industrial applications rely on the consistent and systematic use of concepts, especially when dealing with decision-critical knowledge. However, we demonstrate that, when methodically questioned, large language models (LLMs) often display significant inconsistencies in their knowledge. Computationally, the basic aspects of the conceptualization of a given domain can be represented as Is-A hierarchies in a knowledge graph (KG) or ontology, together with a few properties or axioms that enable straightforward reasoning. We show that even simple ontologies can be used to reveal conceptual inconsistencies across several LLMs. We also propose strategies that domain experts can use to evaluate and improve the coverage of key domain concepts in LLMs of various sizes. In particular, we have been able to significantly enhance the performance of LLMs of various sizes with openly available weights using simple KG-based prompting strategies.
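
To make the idea of probing an Is-A hierarchy concrete, the following is a minimal sketch, not the authors' implementation. It computes the deductive (transitive) closure of a toy hierarchy and turns each implied pair into yes/no questions with expected answers. The intermediate concepts, the question wording, and the function names (`deductive_closure`, `yes_no_questions`, `inconsistencies`) are all assumptions made for illustration, and the sketch assumes the hierarchy is acyclic so that every reversed pair should be answered "no".

```python
# Hypothetical toy Is-A edges as (child, parent); not taken from the paper's ontology.
IS_A = [
    ("orthopedic pediatric surgeon", "pediatric surgeon"),
    ("pediatric surgeon", "surgeon"),
    ("surgeon", "medical specialist"),
]


def deductive_closure(edges):
    """Return every (descendant, ancestor) pair implied by transitivity of Is-A."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a Is-A b and b Is-A d together imply a Is-A d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def yes_no_questions(edges):
    """Build (question, expected_answer) pairs from the closure.

    Assumes an acyclic hierarchy, so the reversed direction is always "no".
    """
    questions = []
    for child, parent in sorted(deductive_closure(edges)):
        questions.append((f"Is every {child} a {parent}?", "yes"))
        questions.append((f"Is every {parent} a {child}?", "no"))
    return questions


def inconsistencies(questions, answer_fn):
    """Return (question, expected, actual) for each answer that contradicts the KG."""
    failures = []
    for question, expected in questions:
        actual = answer_fn(question)  # answer_fn stands in for a call to an LLM
        if actual != expected:
            failures.append((question, expected, actual))
    return failures


# A stub "model" that answers yes to everything, used only to exercise the sketch.
always_yes = lambda question: "yes"
failures = inconsistencies(yes_no_questions(IS_A), always_yes)
```

In practice `answer_fn` would wrap a real model call and normalize its output to "yes" or "no". The stub here only shows that a model agreeing with every question is flagged on all reversed pairs.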

Paper Structure

This paper contains 15 sections, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Our proposed approach to test and correct for inconsistencies in an LLM's knowledge of concept hierarchies and in its application to realistic scenarios.
  • Figure 2: A concept hierarchy snapshot: medical specialists and their specialities.
  • Figure 3: Deductive closure between orthopedic pediatric surgeon and medical specialist.
  • Figure 4: Finance domain: home equity loan path.
  • Figure 5: Finance domain: evaluation with a simple prompt and with context.