Misinformation Resilient Search Rankings with Webgraph-based Interventions

Peter Carragher, Evan M. Williams, Kathleen M. Carley

TL;DR

This work studies misinformation-aware search ranking by designing webgraph-based interventions that penalize unreliable domains while sparing reliable ones. It presents two intervention classes (link scheme removal and link multiplicity weighting) and validates them through small-scale regression-based simulations and large-scale PageRank analyses, augmented by Anti-TrustRank and debiasing to improve fairness. The results show meaningful reductions in traffic and ranking for unreliable domains with modest collateral impact on reliable domains, and demonstrate that interventions can be tuned and extended (e.g., multi-category seed strategies) to limit unintended effects at web scale. Overall, the paper provides a principled, scalable blueprint for enhancing the trustworthiness of search results and offers practical mitigations for potential side effects, guiding future research and potential collaborations with search engines and regulators.

Abstract

The proliferation of unreliable news domains on the internet has had wide-reaching negative impacts on society. We introduce and evaluate interventions aimed at reducing traffic to unreliable news domains from search engines while maintaining traffic to reliable domains. We build these interventions on four principles: fairness (penalize sites for what is in their control), generality (label/fact-check agnostic), targeting (increase the cost of adversarial behavior), and scalability (works at web scale). We refine our methods on small-scale web data as a testbed and then generalize the interventions to a large-scale webgraph containing 93.9M domains and 1.6B edges. We demonstrate that our methods penalize unreliable domains far more than reliable domains in both settings, and we explore multiple avenues to mitigate unintended effects in both the small-scale and large-scale webgraph experiments. These results indicate the potential of our approach to reduce the spread of misinformation and foster a more reliable online information ecosystem. This research contributes to the development of targeted strategies to enhance the trustworthiness and quality of search engine results, ultimately benefiting users and the broader digital community.
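To make the idea of a webgraph-based intervention concrete, the following is a minimal sketch of how removing links from suspected link-scheme domains can shift PageRank. It is not the authors' implementation: the toy graph, the `.example` domain names, the hand-picked `flagged_sources` set, and the plain power-iteration `pagerank` helper are all assumptions made for illustration, and the abstract does not specify how link schemes are detected.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a fixed node list.

    Rank held by dangling nodes (no out-links) is spread uniformly.
    """
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy webgraph: two ordinary blogs plus two domains that
# behave like a link scheme (they link to each other and to one target).
nodes = [
    "reliable.example",
    "unreliable.example",
    "blog-a.example",
    "blog-b.example",
    "scheme-1.example",
    "scheme-2.example",
]
edges = [
    ("blog-a.example", "reliable.example"),
    ("blog-a.example", "unreliable.example"),
    ("blog-b.example", "reliable.example"),
    ("scheme-1.example", "unreliable.example"),
    ("scheme-1.example", "scheme-2.example"),
    ("scheme-2.example", "unreliable.example"),
    ("scheme-2.example", "scheme-1.example"),
]

# Assumption: these sources were flagged as link schemes by some upstream step.
flagged_sources = {"scheme-1.example", "scheme-2.example"}

# Intervention sketch: drop every edge that originates from a flagged source.
filtered_edges = [(src, dst) for src, dst in edges if src not in flagged_sources]

before = pagerank(nodes, edges)
after = pagerank(nodes, filtered_edges)

for domain in ("reliable.example", "unreliable.example"):
    print(f"{domain}: {before[domain]:.4f} -> {after[domain]:.4f}")
```

In this toy graph, the domain propped up by the flagged sources loses PageRank once their links are dropped, while the domain with only ordinary backlinks does not. Real webgraphs are far larger and noisier, so this illustrates the mechanism rather than the reported results.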

Paper Structure (54 sections, 6 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Analysis of news domain label distributions (left) and backlinking domains (right). The distribution of link counts from source domains to targets of varying reliability reveals a significant distributional discrepancy between how link schemes and genuine backlink sites link to targets.
  • Figure 2: Comparisons of Ahrefs Traffic Estimates and Common Crawl PageRank Estimates to SimilarWeb Estimates (logged). Blue domains are reliable by MBFC estimates and red domains are unreliable or mixed reliability.
  • Figure 3: The link multiplicity distribution (left) reveals a dichotomy between links that appear a handful of times and links that occur extremely frequently on a source domain ($>40$ times). Link Multiplicity scores (right) are derived from this link multiplicity distribution and capture this dichotomy with a distinct dumbbell-like shape.
  • Figure 4: Experiments in mitigating unintended effects of large-scale interventions. As we debias the dataset, performance on reliability classification tasks slightly degrades but intervention quality mostly stays constant. As repair level R increases, disparate impact (DI) slightly improves alongside a slight decline in F1 score on the reliability classification task (left). However, RIS is constant for the link scheme removal intervention across all repair levels during debiasing (center).
  • Figure 5: PR changes per intervention, grouped by reliability. Lower-reliability news sites are more affected than higher-reliability sites, which remain consistent.
  • ...and 2 more figures