Mining the Mind: What 100M Beliefs Reveal About Frontier LLM Knowledge
Shrestha Ghosh, Luca Giordano, Yujia Hu, Tuan-Phong Nguyen, Simon Razniewski
TL;DR
This work analyzes the factual knowledge encoded in a frontier LLM by leveraging GPTKB v1.5, a large-scale, recursively elicited knowledge base derived from GPT-4.1 containing over 100M factual assertions. It demonstrates that the model stores vast knowledge with biases that differ from those of traditional knowledge bases, achieving about 75% factual accuracy and showing substantial inconsistency and hallucinations, particularly in dynamic or politically charged domains. The authors validate the approach by surveying size, taxonomy, language distribution, and literals, and they reveal notable biases (gender, geography) and robust multilingual footprints, while highlighting timeliness through recency signals. The study provides a careful, scalable methodology for probing closed-source models and discusses implications for real-time data integration, bias mitigation, and future factuality research in LLMs.
Abstract
LLMs are remarkable artifacts that have revolutionized a range of NLP and AI tasks. A significant contributor is their factual knowledge, which, to date, remains poorly understood and is usually analyzed from biased samples. In this paper, we take a deep tour into the factual knowledge (or beliefs) of a frontier LLM, based on GPTKB v1.5 (Hu et al., 2025a), a recursively elicited set of 100 million beliefs of one of the strongest currently available frontier LLMs, GPT-4.1. We find that the model's factual knowledge differs quite significantly from established knowledge bases, and that its accuracy is significantly lower than indicated by previous benchmarks. We also find that inconsistency, ambiguity, and hallucinations are major issues, shedding light on future research opportunities concerning factual LLM knowledge.
