A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models

Utkarsh Sahu, Zhisheng Qi, Yongjia Lei, Ryan A. Rossi, Franck Dernoncourt, Nesreen K. Ahmed, Mahantesh M Halappanavar, Yao Ma, Yu Wang

TL;DR

This work tackles the problem of understanding how knowledge is structurally organized in large language models by proposing a graph-centric framework. It defines triplet-level and entity-level knowledgeability, analyzes their relation to graph properties such as node degree and homophily, and demonstrates that entities exhibit strong neighborhood similarity, i.e., knowledgeability homophily. A graph neural network regression model leverages local neighborhood information to predict entity knowledgeability, enabling active selection of informative triplets for fine-tuning LLMs, with Graph-FT consistently outperforming random triplet selection. The findings suggest practical pathways for more efficient knowledge checking, targeted data curation, and potential extensions to knowledge graph retrieval and deficiency detection, grounded in the observed topological patterns of LLM knowledge.

Abstract

Large language models have been extensively studied as neural knowledge bases for their knowledge access, editability, reasoning, and explainability. However, few works focus on the structural patterns of their knowledge. Motivated by this gap, we investigate these structural patterns from a graph perspective. We quantify the knowledge of LLMs at both the triplet and entity levels, and analyze how it relates to graph structural properties such as node degree. Furthermore, we uncover knowledge homophily, where topologically close entities exhibit similar levels of knowledgeability, which motivates us to develop graph machine learning models that estimate an entity's knowledge from its local neighbors. This model in turn enables valuable knowledge checking by selecting triplets less known to LLMs. Empirical results show that fine-tuning on the selected triplets leads to superior performance.
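To make the quantities in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that a triplet score is 1.0 when the LLM answers a probe about that triplet correctly and 0.0 otherwise, that entity knowledgeability is the mean score of the triplets an entity appears in, and that node homophily is one minus the mean absolute gap between an entity's score and its neighbors' scores. The toy triplets and all function names are hypothetical. The paper selects triplets using knowledgeability predicted by a graph neural network; here the directly computed scores stand in for those predictions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical triplet-level scores (assumption): 1.0 = LLM knew the triplet, 0.0 = it did not.
triplet_scores = {
    ("Paris", "capital_of", "France"): 1.0,
    ("Paris", "located_in", "Europe"): 1.0,
    ("France", "member_of", "EU"): 1.0,
    ("Lyon", "located_in", "France"): 0.0,
    ("Lyon", "twinned_with", "Leipzig"): 0.0,
}


def entity_knowledgeability(scores):
    """Average the scores of all triplets an entity appears in (as head or tail)."""
    per_entity = defaultdict(list)
    for (head, _rel, tail), score in scores.items():
        per_entity[head].append(score)
        per_entity[tail].append(score)
    return {entity: mean(vals) for entity, vals in per_entity.items()}


def build_adjacency(scores):
    """Undirected neighborhood: two entities are neighbors if they share a triplet."""
    adj = defaultdict(set)
    for head, _rel, tail in scores:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def node_homophily(know, adj):
    """Assumed definition: 1 - mean absolute gap to the neighbors' knowledgeability."""
    return {
        entity: 1.0 - mean(abs(know[entity] - know[nb]) for nb in nbs)
        for entity, nbs in adj.items()
    }


def least_known_triplets(scores, know, k):
    """Rank triplets by the mean knowledgeability of their endpoints, lowest first."""
    ranked = sorted(scores, key=lambda t: (know[t[0]] + know[t[2]]) / 2)
    return ranked[:k]


know = entity_knowledgeability(triplet_scores)
adj = build_adjacency(triplet_scores)
homophily = node_homophily(know, adj)
selected = least_known_triplets(triplet_scores, know, k=2)
```

In this toy graph, `selected` contains the two triplets attached to the poorly known entity `Lyon`. These are the kind of triplets a selection strategy that targets less-known knowledge would prioritize for fine-tuning.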

Paper Structure

This paper contains 24 sections, 4 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: (a) Prompting LLMs to check their knowledge about each triplet and aggregating the results to obtain entity knowledgeability; (b) These scores are assigned to graph nodes, enabling analysis of structural patterns such as knowledge imbalance (depicted in darker/lighter color) and knowledge homophily, where topologically close entities possess similar levels of knowledgeability.
  • Figure 2: (a)/(b): Distribution of node knowledgeability/homophily for each dataset; (c): Node knowledgeability increases as node degree increases. The results here are based on GPT-3.5; results for other LLMs show similar trends and are reported in the appendix. (d): Average homophily for all datasets given by different LLMs exceeds 0.6.
  • Figure 3: Relation between regression performance and homophily at (a) graph and (b) node level.
  • Figure 4: LLaMa. (a): Distribution of node knowledgeability for each dataset; (b): Distribution of node homophily for each dataset; (c): Node knowledgeability increases as node degree increases.
  • Figure 5: DeepSeek. (a): Distribution of node knowledgeability for each dataset; (b): Distribution of node homophily for each dataset; (c): Node knowledgeability increases as node degree increases.
  • ...and 7 more figures