On the Centralization and Regionalization of the Web
Gautam Akiwate, Kimberly Ruth, Rumaisa Habib, Zakir Durumeric
TL;DR
The paper defines a formal centralization metric based on the Earth Mover's Distance (EMD, also known as the Wasserstein distance), which measures how far an observed distribution of providers is from a fully decentralized reference. It applies this metric to four layers (hosting, DNS, TLDs, and CAs) across 150 countries. It also introduces two complementary measures, usage (a provider's reach) and endemicity (its geographic concentration), and combines CrUX data with active measurements to map cross-layer dependencies. The findings show pronounced country-level variation: dominant global players such as Cloudflare and Let's Encrypt shape centralization, regional providers also influence it in many contexts, and insularity and regionalization emerge as key drivers of these patterns. This quantitative framework enables cross-country comparisons and highlights the cross-layer interactions and sociopolitical drivers that affect Internet structure and resilience.
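To make the idea of "distance from a decentralized reference" concrete, the following is a minimal sketch and not the paper's actual definition. It assumes a hypothetical function `emd_from_uniform`, an equal-share reference distribution, and a ground distance in which providers sit on integer rank positions. All three choices are illustrative assumptions; the paper's own reference distribution and normalization may differ.

```python
from itertools import accumulate


def emd_from_uniform(site_counts):
    """EMD between observed provider shares and an equal-share reference.

    Assumptions of this sketch (not taken from the paper):
    - providers are sorted by size and placed on positions 0..n-1,
      so moving one unit of mass between adjacent positions costs 1;
    - the decentralized reference gives every provider an equal share.
    """
    n = len(site_counts)
    total = sum(site_counts)
    if n == 0 or total <= 0:
        raise ValueError("need at least one provider with a positive count")

    observed = sorted((c / total for c in site_counts), reverse=True)
    reference = [1.0 / n] * n

    # For 1-D distributions on a shared unit-spaced grid, the EMD equals
    # the sum of absolute differences between the cumulative distributions.
    cdf_observed = accumulate(observed)
    cdf_reference = accumulate(reference)
    return sum(abs(o - r) for o, r in zip(cdf_observed, cdf_reference))


if __name__ == "__main__":
    # Hypothetical site counts per provider in three made-up markets.
    print(emd_from_uniform([25, 25, 25, 25]))  # evenly spread
    print(emd_from_uniform([40, 30, 20, 10]))  # mildly concentrated
    print(emd_from_uniform([97, 1, 1, 1]))     # one dominant provider
```

Under these assumptions the score is zero when every provider has the same share and grows as one provider absorbs more of the market. The raw value depends on the number of providers, so comparing markets of different sizes would require a normalization step that this sketch omits.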
Abstract
Over the past decade, Internet centralization and its implications for both people and the resilience of the Internet have become a topic of active debate. While the networking community informally agrees on the definition of centralization, we lack a formal metric for quantifying centralization, which limits research beyond descriptive analysis. In this work, we introduce a statistical measure for Internet centralization, which we use to better understand how the web is centralized across four layers of web infrastructure (hosting providers, DNS infrastructure, TLDs, and certificate authorities) in 150 countries. Our work uncovers significant geographical variation, as well as a complex interplay between centralization and sociopolitically driven regionalization. We hope that our work can serve as the foundation for more nuanced analysis to inform this important debate.
