What did Elon change? A comprehensive analysis of Grokipedia

Harold Triedman, Alexios Mantzarlis

TL;DR

This paper provides the first large-scale analysis of Grokipedia, Elon Musk's 2025 AI-powered alternative to Wikipedia. It constructs a near-complete Grokipedia corpus of 883,858 articles and compares content similarity and citations to English Wikipedia using embedding-based cosine similarity on 250-token chunks. The results show that CC-licensed Grokipedia articles are highly similar to Wikipedia text (~90% per chunk), while non-CC content is less similar (~77%). Grokipedia also cites more sources overall, including low-quality domains, with notable citations to Stormfront and Infowars. Subset analyses of political figures and controversial topics reveal weaker similarity and lower source quality, suggesting Grokipedia is a derivative and ideologically tilted project with lax sourcing practices. The authors publicly release the Grokipedia scrape and embeddings to facilitate further research and verification.
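The chunk-level comparison mentioned above can be illustrated with a minimal sketch. Everything here beyond the 250-token chunk size and the use of cosine similarity is an assumption made for illustration: the paper's embedding model is not named in this summary, so `toy_embed` is a hypothetical bag-of-words stand-in, tokens are approximated by whitespace splitting, and the rule for pairing chunks (best match among the Wikipedia chunks) is a guess, not the authors' stated procedure.

```python
import math
from collections import Counter

CHUNK_TOKENS = 250  # chunk size stated in the TL;DR


def chunk(text, size=CHUNK_TOKENS):
    """Split text into consecutive chunks of at most `size` whitespace tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def toy_embed(chunk_text):
    """Hypothetical stand-in embedding: a sparse bag-of-words count vector."""
    return Counter(chunk_text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def article_similarity(grok_text, wiki_text, embed=toy_embed):
    """Mean, over Grokipedia chunks, of the best cosine match among
    Wikipedia chunks (assumed pairing rule, for illustration only)."""
    grok = [embed(c) for c in chunk(grok_text)]
    wiki = [embed(c) for c in chunk(wiki_text)]
    if not grok or not wiki:
        return 0.0
    best = [max(cosine(g, w) for w in wiki) for g in grok]
    return sum(best) / len(best)
```

Swapping `toy_embed` for a real sentence-embedding model would change the scores but not the structure: chunk both articles, embed each chunk, and aggregate chunk-level cosine similarities into a per-article figure.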

Abstract

Elon Musk released Grokipedia on 27 October 2025 to provide an alternative to Wikipedia, the crowdsourced online encyclopedia. In this paper, we provide the first comprehensive analysis of Grokipedia and compare it to a dump of Wikipedia, with a focus on article similarity and citation practices. Although Grokipedia articles are much longer than their corresponding English Wikipedia articles, we find that much of Grokipedia's content (including both articles with and without Creative Commons licenses) is highly derivative of Wikipedia. Nevertheless, citation practices between the sites differ greatly, with Grokipedia citing many more sources deemed "generally unreliable" or "blacklisted" by the English Wikipedia community and low-quality by external scholars, including dozens of citations to sites like Stormfront and Infowars. We then analyze article subsets: one about elected officials, one about controversial topics, and one random subset for which we derive article quality and topic. We find that the elected official and controversial article subsets show less similarity between their Wikipedia and Grokipedia versions than other pages do. The random subset illustrates that Grokipedia focused on rewriting the highest-quality articles on Wikipedia, with a bias towards biographies, politics, society, and history. Finally, we publicly release our nearly full scrape of Grokipedia, as well as embeddings of the entire Grokipedia corpus.

Paper Structure

This paper contains 27 sections, 16 figures, 5 tables.

Figures (16)

  • Figure 1: The distribution of article outline length ratios, split by Grokipedia article license status.
  • Figure 2: Per-article average similarity embedding distributions, split by whether the Grokipedia article is CC-licensed or not.
  • Figure 3: Average chunk similarity by position in an article. Position 1 is the beginning of an article, position 2 is the second chunk, etc.
  • Figure 4: The top 100 most-cited domains on Wikipedia and Grokipedia. Domains are bold if they are on both lists, and lines connecting the domains show their position change. Color indicates domain type.
  • Figure 5: (Left) The relative proportion of Perennial Source list categories in Wikipedia and Grokipedia citations. (Right) The percentage of articles that contain at least one of each category of citation in the two corpora.
  • ...and 11 more figures