Detecting and Mitigating Bias in Algorithms Used to Disseminate Information in Social Networks

Vedran Sekara, Ivan Dotu, Manuel Cebrian, Esteban Moro, Manuel Garcia-Herranz

TL;DR

The results demonstrate that vulnerability can be reduced at a relatively low cost to spread, showing that maximizing the reach of information does not require compromising on information equality.

Abstract

Social connections are conduits through which individuals communicate, information propagates, and diseases spread. Identifying individuals who are more likely to adopt ideas and spread them is essential in order to develop effective information campaigns, maximize the reach of resources, and fight epidemics. Influence maximization algorithms are used to identify sets of influencers. Based on extensive computer simulations on synthetic and ten diverse real-world social networks we show that seeding information using these methods creates information gaps. Our results show that these algorithms select influencers who do not disseminate information equitably, threatening to create an increasingly unequal society. To overcome this issue we devise a multi-objective algorithm which maximizes influence and information equity. Our results demonstrate it is possible to reduce vulnerability at a relatively low trade-off with respect to spread. This highlights that in our search for maximizing information we do not need to compromise on information equality.

Paper Structure (10 sections, 2 equations, 3 figures)

Figures (3)

  • Figure 1: Information is unequally distributed in networks. a, Initial seed sets selected according to HD, CHD, DD, and KC, showing how the four methods differ in selecting influencers for a social network between households in a South Indian village (Banerjee et al., 2013). Here, $5\%$ of nodes (colored) are selected as influencers for illustrative purposes ($1\%$ otherwise). b, Effective recency for the social network. Recency is estimated across 1000 runs with infection probability $p = 0.069$ (see SM Sec. S3.1). c, Cumulative distribution of information frequency for synthetic SF networks with $N = 10^4$, $\gamma = 2.5$, and average infection probability $\langle p \rangle = 0.085$. The curves show the probability that $\nu$ is less than or equal to $x$, where $x$ is an arbitrary value. Results are combined over 100 different network realizations. For each network we select $1\%$ of nodes as influencers (chosen by one of the heuristics), run the spreading process, track which nodes receive information, and repeat the process $M = 10N$ times to account for stochasticity. Red shaded regions denote parts of the distribution where the effective measure is below one, while grey shaded regions denote parts where it is above one. d, Cumulative distribution of recency for SF networks. e, Fraction of nodes that are worse off with respect to information frequency in $n$ of the seeding heuristics when compared to the benchmark. Error bars show the standard deviation over 100 network realizations. f, Fraction of nodes that are worse off with respect to recency.
  • Figure 2: Information is unequally distributed in real-world social networks. Here we show results for five of the networks; see SM Fig. S9 for the other networks. a, Cumulative distribution of individual node frequency for networks ordered according to size (number of nodes). Initial seeds contain $1\%$ of network nodes, and results are averaged over $10N$ simulations (see SM Table S1). b, Cumulative distribution of recency for empirical networks. c, Fraction of nodes that are worse off with respect to frequency in $n$ of the seeding heuristics when compared to the random seeding procedure ($\nu^{\text{method}}/\nu^{\text{benchmark}}$). This demonstrates that large parts of social networks are in disadvantaged positions. d, Fraction of nodes that are worse off with respect to recency ($\tau^{\text{method}}/\tau^{\text{benchmark}}$) for $n$ methods.
  • Figure 3: Fair influence maximization for social networks. a, Theoretical Pareto front of optimal influencer sets identified by our multi-objective algorithm for the social network between households in a South Indian village, compared to influence maximization heuristics. We disregard KC as it consistently performs worse than the other heuristics in terms of both fairness and reach. Higher values of non-vulnerable nodes indicate higher values of fairness. b, Numerical evaluation of influencer sets using ICMs. Error bars are given as the standard deviation from 10 realizations of $10N$ ICM simulations. c, Edge activations for the set of influencers identified by CHD. Edges are colored and sized according to how often they are activated during $10N$ simulations. Nodes colored black are seed nodes. d, Edge activations for one of the fairer seed sets identified by our algorithm. A comprehensive comparison between the seed sets is available in SM Sec. S10. e-h, Theoretical Pareto fronts for four additional real-world social networks (see SM Fig. S15 for results for the remaining five networks, and for numerical results from ICMs).
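
The simulation procedure described in the captions (pick a seed set, run the spreading process, track which nodes receive information, repeat $10N$ times, then compare against random seeding) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that "ICM" refers to an independent cascade model with a single uniform infection probability `p`, that seed nodes count as having received the information, and that the benchmark draws a fresh random seed set in every run. All function names are hypothetical, and `top_degree_seeds` is a plain highest-degree rule used as a stand-in, not necessarily any of the heuristics named in the figures.

```python
import random
from collections import deque


def independent_cascade(adj, seeds, p, rng):
    """Run one cascade and return the set of nodes that received the information."""
    informed = set(seeds)  # assumption: seeds count as informed
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            # Each newly informed node gets one chance to inform each neighbour.
            if v not in informed and rng.random() < p:
                informed.add(v)
                frontier.append(v)
    return informed


def information_frequency(adj, seed_picker, p, runs, rng):
    """Fraction of runs in which each node received the information."""
    counts = {v: 0 for v in adj}
    for _ in range(runs):
        for v in independent_cascade(adj, seed_picker(), p, rng):
            counts[v] += 1
    return {v: c / runs for v, c in counts.items()}


def top_degree_seeds(adj, k):
    """Illustrative stand-in heuristic: the k nodes with the most neighbours."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]


def random_seeds(adj, k, rng):
    """Benchmark: k nodes drawn uniformly at random."""
    return rng.sample(sorted(adj), k)


def fraction_worse_off(freq_method, freq_benchmark):
    """Share of nodes whose frequency ratio method / benchmark is below one."""
    worse = sum(1 for v in freq_method if freq_method[v] < freq_benchmark[v])
    return worse / len(freq_method)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy graph: node 0 is a hub for nodes 1..5; nodes 5..9 form a chain.
    edges = [(0, i) for i in range(1, 6)] + [(i, i + 1) for i in range(5, 9)]
    adj = {v: [] for v in range(10)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    k, p, runs = 1, 0.3, 10 * len(adj)  # runs follows the 10N convention
    method_seeds = top_degree_seeds(adj, k)
    nu_method = information_frequency(adj, lambda: method_seeds, p, runs, rng)
    nu_bench = information_frequency(
        adj, lambda: random_seeds(adj, k, rng), p, runs, rng
    )
    print("fraction worse off:", fraction_worse_off(nu_method, nu_bench))
```

The seed set is passed in as a callable so that a fixed heuristic set and a freshly sampled random benchmark share the same simulation loop. Recency would need a separate time-stamped process and is not covered by this sketch.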