Epistemic Substitution: How Grokipedia's AI-Generated Encyclopedia Restructures Authority

Aliakbar Mehdizadeh, Martin Hilbert

TL;DR

The study investigates whether the AI-generated Grokipedia relies on the same foundations of authority as Wikipedia by comparing 72 matched article pairs using an 8-category epistemic classification of citations. It combines automated source classification with topic labeling and network analyses to reveal a substantial epistemic shift: Grokipedia reduces Academic & Scholarly sourcing and increases User-Generated, NGO/Think Tank, and Government sources, with strong topic-dependent divergences, particularly in sociopolitical domains. A notable finding is a linear scaling law in Grokipedia, where citation volume grows predictably with article length, unlike the more variable, saturation-prone pattern of human sourcing in Wikipedia. The results support a formal notion of Epistemic Substitution, highlighting the need for ongoing algorithmic audits and cross-platform checks as AI-generated encyclopedias become more influential in public knowledge.

Abstract

A quarter century ago, Wikipedia's decentralized, crowdsourced, and consensus-driven model replaced the centralized, expert-driven, and authority-based standard for encyclopedic knowledge curation. The emergence of generative AI encyclopedias, such as Grokipedia, may present another shift in epistemic evolution. This study investigates whether AI- and human-curated encyclopedias rely on the same foundations of authority. We conducted a multi-scale comparative analysis of the citation networks from 72 matched article pairs, which cite a total of almost 60,000 sources. Using an 8-category epistemic classification, we mapped the "epistemic profiles" of the articles on each platform. Our findings reveal several quantitative and qualitative differences in how knowledge is sourced and how encyclopedia claims are epistemologically justified. Grokipedia replaces Wikipedia's heavy reliance on peer-reviewed "Academic & Scholarly" work with a notable increase in "User-generated" and "Civic organization" sources. Comparative network analyses further show that Grokipedia employs very different epistemological profiles when sourcing leisure topics (such as Sports and Entertainment) and more societally sensitive civic topics (such as Politics & Conflicts, Geographical Entities, and General Knowledge & Society). Finally, we find a "scaling-law for AI-generated knowledge sourcing" that shows a linear relationship between article length and citation density, which is distinct from collective human reference sourcing. We conclude that this first implementation of an LLM-based encyclopedia does not merely automate knowledge production but restructures it. Given the notable changes and the important role of encyclopedias, we suggest continuing and deepening algorithm audits, such as the one presented here, in order to understand the ongoing epistemological shifts.
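
As a rough illustration of how two "epistemic profiles" might be compared, the sketch below computes Shannon entropy, Jensen-Shannon divergence, and cosine similarity for a pair of 8-category citation count vectors. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names and the toy counts are hypothetical, the category order is arbitrary, and the choice of base-2 logarithms (which bounds the divergence between 0 and 1) is an assumption, since the text here does not state the base used.

```python
import math


def normalize(counts):
    """Turn raw citation counts per category into a probability profile."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("a profile needs at least one citation")
    return [c / total for c in counts]


def shannon_entropy(p):
    """Entropy in bits; higher values mean sourcing is spread across more categories."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), symmetric and bounded in [0, 1]."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_similarity(p, q):
    """Cosine of the angle between two profile vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm


# Hypothetical toy counts for one article pair (8 categories, arbitrary order).
wiki_profile = normalize([40, 10, 5, 5, 20, 10, 5, 5])
grok_profile = normalize([10, 30, 15, 15, 10, 10, 5, 5])

print("entropy (wiki):", shannon_entropy(wiki_profile))
print("entropy (grok):", shannon_entropy(grok_profile))
print("JS divergence:", js_divergence(wiki_profile, grok_profile))
print("cosine similarity:", cosine_similarity(wiki_profile, grok_profile))
```

Identical profiles give a divergence of 0 and a cosine similarity of 1, while profiles with no shared categories give a divergence of 1 under the base-2 assumption.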

Paper Structure

This paper contains 25 sections, 21 figures, 14 tables.

Figures (21)

  • Figure 1: Citation Volume vs. Article Length. Scatter plot comparison of Total Citations against Word Count. Stepwise regression selected linear models for both platforms (quadratic terms $p > 0.05$). However, Grokipedia shows a strong predictive fit (Adj. $R^2 = 0.66$) while Wikipedia shows high variance (Adj. $R^2 = 0.36$).
  • Figure 2: Epistemic Profile Analysis (Article-Level). Both panels represent the mean sourcing behavior calculated per article ($N_{Wiki}=72, N_{Grok}=72$), distinct from the global corpus aggregates in Table \ref{tbl:fingerprint_summary}. (a) Distribution of citation categories with 95% confidence intervals, highlighting variance across articles. (b) The same data visualized as a stacked mean composition, illustrating the average reliance on different epistemic categories.
  • Figure 4: Topic-Based Epistemic Metrics. (a) Jensen-Shannon Divergence scores sorted by median. (b) Shannon Entropy of citation categories.
  • Figure 5: Comparison of Wikipedia and Grokipedia articles on Cristiano Ronaldo.
  • Figure 6: Per-Article Divergence Between Wikipedia and Grokipedia. The figure shows the distribution of divergence metrics across 72 articles. Left: Cosine Similarity, indicating linear alignment between citation distributions. Right: Jensen-Shannon Divergence, capturing the informational difference between sourcing profiles. Dashed lines indicate the mean value for each metric.
  • ...and 16 more figures
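
The adjusted $R^2$ values quoted in the Figure 1 caption can be read with the help of a small sketch. The code below fits a single-predictor least-squares line (citations against word count) and reports the adjusted $R^2$. It is only an illustration: the function names and the toy data are hypothetical, and it does not reproduce the stepwise model selection mentioned in the caption.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def adjusted_r2(x, y, n_predictors=1):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1) for k predictors."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    n = len(x)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical toy data: word counts and total citations for a few articles.
word_counts = [1000, 2000, 3000, 4000, 5000]
citations = [22, 38, 65, 79, 101]

slope, intercept = fit_line(word_counts, citations)
print("slope (citations per word):", slope)
print("adjusted R^2:", adjusted_r2(word_counts, citations))
```

A value near 1 means article length alone predicts citation volume well, which is the pattern the caption attributes to Grokipedia; a lower value, as reported for Wikipedia, means much of the variation is left unexplained by length.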