Wikipedia and Grokipedia: A Comparison of Human and Generative Encyclopedias

Ortal Hadad, Edoardo Loru, Jacopo Nudo, Anita Bonetti, Matteo Cinelli, Walter Quattrociocchi

Abstract

We present a comparative analysis of Wikipedia and Grokipedia to examine how generative mediation alters content selection, textual rewriting, narrative structure, and evaluative framing in encyclopedic content. We model page inclusion in Grokipedia as a function of Wikipedia page popularity, reference density, and recent editorial activity. Inclusion is non-uniform: pages with higher visibility and greater editorial conflict in Wikipedia are more likely to appear in Grokipedia. For included pages, we distinguish between verbatim reproduction and generative rewriting. Rewriting is more frequent for pages with higher reference density and recent controversy, while highly popular pages are more often reproduced without modification. We compare editing activity across the two platforms and estimate page complexity using a fitness-complexity framework to assess whether generative mediation alters patterns of editorial participation. To assess narrative organization, we construct actor-relation networks from article texts using abstract meaning representation. Across multiple topical domains, including U.S. politics, geopolitics, and conspiracy-related narratives, narrative structure remains largely consistent between the two sources. Analysis of lead sections shows broadly correlated framing, with localized shifts in laudatory and conflict-oriented language for some topics in Grokipedia. Overall, generative systems preserve the main structural organization of encyclopedic content, while affecting how content is selected, rewritten, and framed.

Paper Structure (21 sections, 6 equations, 5 figures)

Figures (5)

  • Figure 1: Relationship between Wikipedia page characteristics and content selection and transformation on Grokipedia. Predictors are discretized into four groups (Low, Mid, High, and Very High) and include page popularity (number of page views), content sourcing (number of references), and editorial activity (number of edits and number of reverts). Each row includes two plots: the first plot reports the share of articles exhibiting the outcome across predictor levels, while the second plot displays coefficient estimates from logistic regression models (coef-plots). The first row (a) shows the association between these characteristics and the probability that a Wikipedia page is included in Grokipedia. The second row (b) focuses on the probability that the Grokipedia version has been rewritten with respect to the original Wikipedia page.
  • Figure 2: Editing dynamics on Grokipedia. (a) Fractions of Grokipedia and Wikipedia editors who have contributed to a page. Only pages present in both datasets are shown, but the reported fraction is relative to each platform's entire set of pages. The dashed line corresponds to equal fractions. To reduce overplotting at low fractions, points are shown with a small amount of random jitter. Color indicates the Gini index of Wikipedia page views in November 2025, with higher values indicating greater concentration on specific days rather than uniformity across the month. (b) Page complexity on Grokipedia compared with Wikipedia, calculated from users' editing behavior. Color indicates the difference in ranking between the same page on Grokipedia and Wikipedia, ordered by increasing complexity and referring to the matching subset of pages only. Pages comparatively less (more) complex on Grokipedia are characterized by a negative (positive) difference in ranking.
  • Figure 3: Actor-level differences in evaluative positioning between Wikipedia and Grokipedia across U.S. politics, geopolitics, and conspiracy narratives. Each point corresponds to an actor with high evaluative activity, positioned by the difference in outgoing (x-axis) and incoming (y-axis) sentiment balance between Grokipedia and Wikipedia. Outgoing differences reflect changes in how actors evaluate others, whereas incoming differences reflect changes in how actors are evaluated by the surrounding narrative. Positive values indicate more supportive (or less conflictive) relations in Wikipedia relative to Grokipedia, while negative values indicate the opposite. Color encodes the magnitude of the overall displacement. For clarity, only actors with large displacements, corresponding to an absolute difference of at least 1 in either outgoing or incoming sentiment, are labeled.
  • Figure 4: Content framing scores in Grokipedia and Wikipedia articles across U.S. Politics, Geopolitics, and Conspiracy-related pages. Top: fraction of sentences in the lead section that show praise, admiration, or glorification. Bottom: fraction of sentences in the lead section that focus on disputes, disagreements, or controversies. Color intensity is proportional to the difference between the two fractions, while point shape for pages in U.S. Politics refers to their political leaning. The dashed line represents the quadrant bisector, corresponding to an equal fraction on both platforms. Only a subset of pages is labeled for visual clarity, and some of these labels are shortened to improve readability. While scores tend to be weakly or moderately correlated, noteworthy outliers emerge, especially among U.S. Politics pages.
  • Figure 5: Prompt used to assess content framing in an article's lead section.
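
Two descriptive quantities named in the captions above can be illustrated with a short sketch: the Gini index of daily page views (Figure 2a) and the share of pages exhibiting an outcome within four predictor levels (Figure 1). The sketch rests on assumptions, because the captions do not state the exact estimators. It uses the standard rank-weighted Gini formula and quartile cut points for the Low, Mid, High, and Very High groups, and the names `gini` and `share_by_level` are hypothetical helpers rather than the authors' code.

```python
from bisect import bisect_right
from statistics import quantiles

# Level names follow the Figure 1 caption; quartile binning is an assumption.
LEVELS = ["Low", "Mid", "High", "Very High"]


def gini(values):
    """Gini index of non-negative values (0 = uniform, higher = more concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on ascending-sorted data.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def share_by_level(predictor, outcome):
    """Bin a predictor at its quartiles and return the outcome share per level."""
    cuts = quantiles(predictor, n=4)  # three cut points (assumed binning rule)
    counts = {level: [0, 0] for level in LEVELS}  # [positives, total]
    for x, y in zip(predictor, outcome):
        level = LEVELS[bisect_right(cuts, x)]
        counts[level][0] += int(bool(y))
        counts[level][1] += 1
    # Empty levels yield None rather than a misleading zero share.
    return {
        level: (pos / tot if tot else None)
        for level, (pos, tot) in counts.items()
    }


if __name__ == "__main__":
    # Toy data only: views spread evenly vs. concentrated on one day.
    print(gini([5, 5, 5, 5]))
    print(gini([0, 0, 0, 10]))
    # Toy predictor (e.g. page views) and binary outcome (e.g. included or not).
    print(share_by_level([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 1, 0, 1, 1, 1]))
```

With this formula, perfectly uniform daily views give a Gini index of 0, while views concentrated on a single day out of n give (n - 1)/n, matching the caption's reading that higher values indicate concentration on specific days. The per-level shares correspond only to the descriptive panels of Figure 1; the logistic regression coefficients shown alongside them are not reproduced here.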