Measuring Centralization of Online Platforms Through Size and Interconnection of Communities

Milo Z. Trujillo, Laurent Hébert-Dufresne, James Bagrow

TL;DR

The paper asks how to quantify centralization in socio-technical platforms. It defines disruption curves for bipartite user–community networks and summarizes each curve by its area, the disruption AUC (DAUC). The result is a mesoscale measure of community influence based on how many edges between communities are disrupted when communities are removed, which the paper contrasts with community size distributions and standard bottleneck metrics. An empirical analysis of Mastodon, Penumbra (git code hosting servers), BitChute, Voat, and Usenet, together with synthetic network models, shows that a skewed size distribution does not reliably indicate centralization, and that assortativity and cross-community connectivity play critical roles. The framework offers a practical tool for comparing platform structures and points toward richer, more realistic models of information flow across multi-community ecosystems.

Abstract

Decentralized architecture offers a robust and flexible structure for online platforms, since centralized moderation and computation can be easy to disrupt with targeted attacks. However, a platform offering a decentralized architecture does not guarantee that users will use it in a decentralized way, and measuring the centralization of socio-technical networks is not an easy task. In this paper we introduce a method of characterizing community influence in terms of how many edges between communities would be disrupted by a community's removal. Our approach provides a careful definition of "centralization" appropriate in bipartite user-community socio-technical networks, and demonstrates the inadequacy of more trivial methods for interrogating centralization such as examining the distribution of community sizes. We use this method to compare the structure of multiple socio-technical platforms -- Mastodon, git code hosting servers, BitChute, Usenet, and Voat -- and find a range of structures, from interconnected but decentralized git servers to an effectively centralized use of Mastodon servers, as well as multiscale hybrid network structures of disconnected Voat subverses. As the ecosystem of socio-technical platforms diversifies, it becomes critical to not solely focus on the underlying technologies but also consider the structure of how users interact through the technical infrastructure.
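
The idea of measuring how many inter-community edges are disrupted as communities are removed can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything specific here is an assumption made for illustration: the weight of an edge between two communities is taken to be their number of shared users, communities are removed from largest to smallest, disruption is the fraction of total inter-community weight lost, and the summary statistic is the trapezoidal area under that curve. The names `disruption_curve` and `dauc` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def disruption_curve(memberships):
    """Return [(fraction of communities removed, fraction of weight lost), ...].

    memberships maps each user to the set of communities they belong to.
    Assumption: the edge weight between two communities is their number of
    shared users, and communities are removed from largest to smallest.
    """
    size = defaultdict(int)    # community -> number of users
    weight = defaultdict(int)  # (community a, community b) -> shared users
    for communities in memberships.values():
        for c in communities:
            size[c] += 1
        for a, b in combinations(sorted(communities), 2):
            weight[(a, b)] += 1

    total = sum(weight.values())
    # Largest communities first; ties broken by name for determinism.
    order = sorted(size, key=lambda c: (-size[c], c))

    removed = set()
    curve = [(0.0, 0.0)]
    for i, community in enumerate(order, start=1):
        removed.add(community)
        # An edge is disrupted once either of its endpoints has been removed.
        lost = sum(w for (a, b), w in weight.items()
                   if a in removed or b in removed)
        curve.append((i / len(order), lost / total if total else 0.0))
    return curve


def dauc(curve):
    """Trapezoidal area under a disruption curve (an illustrative DAUC)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


# A hub-like toy platform: every cross-community tie passes through "hub".
hub_platform = {
    "u1": {"hub", "a"}, "u2": {"hub", "b"},
    "u3": {"hub", "c"}, "u4": {"hub"},
}
# An insular toy platform: the largest community shares no users with others.
insular_platform = {
    "u1": {"big"}, "u2": {"big"}, "u3": {"big"},
    "u4": {"x", "y"}, "u5": {"y", "z"},
}
print(dauc(disruption_curve(hub_platform)))
print(dauc(disruption_curve(insular_platform)))
```

Under these assumptions, both toy platforms are dominated by one large community, but removing that community destroys all cross-community ties only in the hub-like case, so its area is larger. This mirrors the abstract's point that community size alone is a weak indicator of centralization.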

Paper Structure (18 sections, 4 equations, 13 figures, 2 tables)


Figures (13)

  • Figure 1: Summary measures of centralization. (a) Our measure of community disruption (bottom) does not correlate with the population distribution of communities (top). (b) The area under the disruption curve (DAUC) provides a summary statistic of the disruption curve that reinforces how network structure combined with community size provides greater insight into centralization (measurement details in \ref{sec:auc_explanation}). Panel (a) consists of cumulative distribution plots of population and disruption: the top subplot is a CDF of the platform population as smaller communities are included, and the bottom subplot shows how networks are damaged as more of the largest communities are removed. Each line represents a different network, using the color key from panel (b).
  • Figure 2: In simulated networks with a variety of degree distributions, the disruption curve of each network matches the population distribution much more closely (\ref{fig:toy_networks_size_comparison}), suggesting that non-degree network attributes such as assortativity play a crucial role in determining centralization. As in \ref{fig:real_networks_main_figure}, the left figure shows cumulative population and disruption as more communities are considered. Each line represents a network, using the color key from the right figure. Each simulated network was generated 100 times, and the mean and a 95% confidence interval are shown in both figures.
  • Figure 3: The two largest Voat communities ("QRV" and "8chan") are dramatically larger than their peers, but have almost no overlap in population, making community size a poor proxy for platform-wide influence or centralization. In this network visualization, nodes represent Voat "subverses," and an edge connects two communities that share at least thirty active users. Node size correlates with user count, and color correlates with strength, i.e., the level of overlap with neighboring communities. The purple communities at the center are default subverses that all new users are subscribed to ("news," "whatever," etc.). The surrounding pink and orange communities are popular and have substantial user overlap. The largest two communities, "QRV" and "8chan," have almost no user overlap with other communities and are rendered to the right.
  • Figure 4: Increasing user-community degree assortativity through edge rewiring increases the influence of the largest communities in highly insular (Voat) or sparse (Penumbra) settings, but decreases disruption in all networks as further rewiring eliminates cross-community edges and yields insular and sparse networks. The y-axis represents disruption AUC (see \ref{fig:real_networks_auc}), so the slope shows the change in disruption AUC as networks are rewired to increase user-community degree assortativity.
  • Figure S1: Example of applying our disruption metric to unipartite graphs by detecting communities on a unipartite small-world network (top-left), converting labeled communities into a bipartite representation (top-right), and running our influence metric on the bipartite graph (bottom).
  • ...and 8 more figures
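
The network construction described in the Figure 3 caption (communities as nodes, an edge wherever two communities share at least thirty users, and node strength as the amount of overlap with neighbors) could be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function name `overlap_graph`, the `min_shared` parameter, and the choice to define strength as the sum of shared-user counts over retained edges are all assumptions.

```python
from collections import Counter
from itertools import combinations


def overlap_graph(memberships, min_shared=30):
    """Build a thresholded community-overlap graph.

    memberships maps each user to the set of communities they are active in.
    Returns (edges, strength), where edges maps a sorted community pair to
    its shared-user count and strength maps a community to the summed
    weight of its retained edges (an assumed definition of "strength").
    """
    shared = Counter()
    for communities in memberships.values():
        # Each user contributes one shared member to every pair they span.
        shared.update(combinations(sorted(communities), 2))

    # Keep only pairs that meet the shared-user threshold.
    edges = {pair: n for pair, n in shared.items() if n >= min_shared}

    strength = Counter()
    for (a, b), n in edges.items():
        strength[a] += n
        strength[b] += n
    return edges, strength


# Toy data: two well-connected defaults and one large but insular community.
toy = {}
for i in range(30):
    toy[f"shared{i}"] = {"news", "whatever"}
for i in range(5):
    toy[f"crossover{i}"] = {"QRV", "news"}
for i in range(40):
    toy[f"insular{i}"] = {"QRV"}

edges, strength = overlap_graph(toy)
print(edges)
print(strength["QRV"])
```

In this toy data the largest community has the most users but zero strength, because its five shared users fall below the threshold, which is the pattern the caption describes for "QRV" and "8chan."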