DSP: A Statistically-Principled Structural Polarization Measure

Giulia Preti, Matteo Riondato, Aristides Gionis, Gianmarco De Francisci Morales

TL;DR

DSP is a diffusion-based structural polarization measure that removes the influencer bias of prior measures such as the Random Walk Controversy (RWC) score. It models information spread from every vertex through a probing process and builds a null model into its definition, which yields unbiased, interpretable scores that are zero on standard random graphs. The measure is validated on synthetic reference topologies and on real-world data, where it shows the expected polarization behavior, is robust to partition imbalance, and tracks polarization trends in settings such as the U.S. Congress. An efficient approximation makes it practical for large-scale network analysis and policy-oriented diagnostics.

Abstract

Social and information networks may become polarized, leading to echo chambers and political gridlock. Accurately measuring this phenomenon is a critical challenge. Existing measures often conflate genuine structural division with random topological features, yielding misleadingly high polarization scores on random networks, and failing to distinguish real-world networks from randomized null models. We introduce DSP, a Diffusion-based Structural Polarization measure designed from first principles to correct for such biases. DSP removes the arbitrary concept of 'influencers' used by the popular Random Walk Controversy (RWC) score, instead treating every node as a potential origin for a random walk. To validate our approach, we introduce a set of desirable properties for polarization measures, expressed through reference topologies with known structural properties. We show that DSP satisfies these desiderata, being near-zero for non-polarized structures such as cliques and random networks, while correctly capturing the expected polarization of reference topologies such as monochromatic-splittable networks. Our method applied to U.S. Congress datasets uncovers trends of increasing polarization in recent years. By integrating a null model into its core definition, DSP provides a reliable and interpretable diagnostic tool, highlighting the necessity of statistically-grounded metrics to analyze societal fragmentation.

Paper Structure

This paper contains 17 sections, 34 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: Polarization scores in $1000$ random networks from $G(n,p,\mathbf{\ell})$, each with $10000$ vertices, varying average degree and partition sizes: 50% red–50% blue, 70% red–30% blue, and 90% red–10% blue. RWC (left) shows an unwarranted positive bias, due to overlap between the restart set and the influencers. Removing this overlap eliminates the bias, as shown for the "no-influencers" variant of RWC (right).
  • Figure 2: Polarization in a bi-colored clique with $5000$ vertices and partitions of different sizes: 50% red–50% blue, and 90% red–10% blue. The dashed line denotes the desired value of $0$.
  • Figure 3: Polarization in a bi-colored alternating cycle with $5000$ vertices, 50% red–50% blue. We show the rescaled values and the rescaled-denoised values computed using the $1k$-series (Salloum et al., 2022). The gradient area indicates the desired scores.
  • Figure 4: Polarization in a bi-colored half-split cycle with $5000$ vertices and two partition sizes: 50% blue–50% red (left), and 90% red–10% blue (right). We show the rescaled and rescaled-denoised values computed using the $1k$-series (Salloum et al., 2022). The gradient area indicates the desired scores.
  • Figure 5: Polarization in a bi-colored half-split barbell network with $2000$ vertices and two partition sizes: 50% blue–50% red (left), and 90% red–10% blue (right). We show the rescaled and rescaled-denoised values computed using the $1k$-series (Salloum et al., 2022). The dashed line denotes the desired value of $1$.
  • ...and 12 more figures
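
The reference topologies named in the captions above (bi-colored clique, alternating cycle, half-split cycle, half-split barbell) are simple enough to build by hand. The following is a minimal Python sketch and not the authors' code. All helper names are hypothetical. The barbell is assumed to be two cliques joined by a single bridge edge, and "half-split" is assumed to mean that each color occupies one contiguous block of vertices. The cross-color edge fraction computed at the end is only a descriptive statistic for inspecting the constructions; it is neither DSP nor RWC.

```python
from itertools import combinations


def color_labels(n, n_red, alternating=False):
    """Label vertices 0..n-1 as 'red' or 'blue'.

    alternating=True  -> red, blue, red, blue, ... (n_red is ignored)
    alternating=False -> the first n_red vertices are red, the rest blue
                         (assumed meaning of "half-split").
    """
    if alternating:
        return ["red" if v % 2 == 0 else "blue" for v in range(n)]
    return ["red" if v < n_red else "blue" for v in range(n)]


def clique_edges(n):
    """All unordered pairs of vertices 0..n-1."""
    return list(combinations(range(n), 2))


def cycle_edges(n):
    """Cycle 0-1-...-(n-1)-0; intended for n >= 3."""
    return [(v, (v + 1) % n) for v in range(n)]


def barbell_edges(n_left, n_right):
    """Two cliques joined by one bridge edge (assumed construction)."""
    left = list(combinations(range(n_left), 2))
    right = list(combinations(range(n_left, n_left + n_right), 2))
    bridge = [(n_left - 1, n_left)]
    return left + right + bridge


def cross_color_fraction(edges, labels):
    """Fraction of edges whose endpoints have different colors."""
    cross = sum(1 for u, v in edges if labels[u] != labels[v])
    return cross / len(edges)


if __name__ == "__main__":
    n = 6
    # Every edge of an even alternating cycle joins a red and a blue vertex.
    print(cross_color_fraction(cycle_edges(n), color_labels(n, 0, alternating=True)))
    # A half-split cycle has exactly two cross-color edges.
    print(cross_color_fraction(cycle_edges(n), color_labels(n, n // 2)))
    # A barbell with one color per clique has a single cross-color edge.
    print(cross_color_fraction(barbell_edges(3, 3), color_labels(n, 3)))
```

On these toy instances, the alternating cycle has all of its edges crossing colors, the half-split cycle has 2 of 6, and the barbell has 1 of 7. This gives some intuition for why the captions treat the barbell as the fully polarized reference (desired value $1$), while the clique, where colors are fully mixed, has desired value $0$.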