
Fast algorithms to improve fair information access in networks

Dennis Robert Windham, Caroline J. Wendt, Alex Crane, Madelyn J Warr, Freda Shi, Sorelle A. Friedler, Blair D. Sullivan, Aaron Clauset

TL;DR

This paper addresses maximin influence maximization: selecting $k$ seeds to maximize the minimum activation probability $\min_i \pi_i$ under the independent cascade, a problem that is NP-hard and traditionally bottlenecked by probability estimation. The authors propose a suite of 10 scalable algorithms that require no probability estimation, spanning BFS-, PPR-, and topology-based families. They complement these with a principled spreadability framework for choosing the transmission parameter and a new evaluation metric $\beta$, and evaluate everything on a corpus of 174 networks. A fast meta-learning approach that combines these heuristics is on average about 20% less effective than the state-of-the-art while running 75 to 130 times faster, and it keeps that running-time advantage on much larger graphs. These results enable fair information-access interventions at scale and highlight how network structure, particularly average degree, shapes diffusion-based strategies and outcomes.

Abstract

We consider the problem of selecting $k$ seed nodes in a network to maximize the minimum probability of activation under an independent cascade beginning at these seeds. The motivation is to promote fairness by ensuring that even the least advantaged members of the network have good access to information. Our problem can be viewed as a variant of the classic influence maximization objective, but it appears somewhat more difficult to solve: only heuristics are known. Moreover, the scalability of these methods is sharply constrained by the need to repeatedly estimate access probabilities. We design and evaluate a suite of $10$ new scalable algorithms which crucially do not require probability estimation. To facilitate comparison with the state-of-the-art, we make three more contributions which may be of broader interest. We introduce a principled method of selecting a pairwise information transmission parameter used in experimental evaluations, as well as a new performance metric which allows for comparison of algorithms across a range of values for the parameter $k$. Finally, we provide a new benchmark corpus of $174$ networks drawn from $6$ domains. Our algorithms retain most of the performance of the state-of-the-art while reducing running time by orders of magnitude. Specifically, a meta-learner approach is on average only $20\%$ less effective than the state-of-the-art on held-out data, but about $75$ to $130$ times faster. Further, the meta-learner's performance exceeds the state-of-the-art on about $20\%$ of networks, and the magnitude of its running time advantage is maintained on much larger networks.
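To make the objective and its cost concrete, the following is a minimal sketch of how the minimum access probability for a fixed seed set might be estimated by Monte Carlo simulation of the independent cascade. It is an illustration, not the authors' implementation: the function names, the adjacency-dict representation, and the single uniform transmission probability `alpha` are assumptions made here. Every candidate seed set requires many simulated cascades, which is the kind of repeated probability estimation the abstract identifies as the scalability bottleneck.

```python
import random
from collections import deque


def simulate_cascade(adj, seeds, alpha, rng):
    """Run one independent cascade and return the set of activated nodes.

    Assumption: every edge transmits with the same probability alpha, and
    each newly activated node gets one chance per inactive neighbor.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in active and rng.random() < alpha:
                active.add(v)
                frontier.append(v)
    return active


def estimate_min_access(adj, seeds, alpha, runs=1000, seed=0):
    """Estimate min_i pi_i: the smallest per-node activation frequency."""
    rng = random.Random(seed)
    hits = {v: 0 for v in adj}
    for _ in range(runs):
        for v in simulate_cascade(adj, seeds, alpha, rng):
            hits[v] += 1
    return min(hits[v] / runs for v in adj)


# Hypothetical toy network: a path 0 - 1 - 2, seeded at the middle node.
path = {0: [1], 1: [0, 2], 2: [1]}
print(estimate_min_access(path, seeds=[1], alpha=0.5))
```

A maximin seed-selection method would compare this quantity across candidate seed sets of size $k$; the cost of each comparison grows with both the number of runs and the size of the network.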

Paper Structure (36 sections, 21 figures, 2 tables, 12 algorithms)


Figures (21)

  • Figure 1: Algorithm runtime to select 10 new seeds vs. network size for algorithms in Fish_2019, averaged over 10 runs on an introduced large set of networks (see Section \ref{sec:corpus}). Algorithms requiring a Monte Carlo simulation (ProbEst) to select seeds are denoted by a $*$.
  • Figure 2: Average degree of a network as a function of network size (number of nodes) for the corpus of 174 networks from 6 distinct domains used in our study.
  • Figure 3: Spreadability on a network is quantified by the average fraction of a network's nodes $\langle |T| \rangle/n$ in a tree $T$ grown through an independent cascade from a random initial seed for a given $\alpha$. We define 'low', 'medium', and 'high' spreadability as the $\alpha$ that activates, on average, $20\%$, $50\%$, and $80\%$ of the network, respectively.
  • Figure 4: Minimum access probability $\pi_{\min}$ vs. seed set size $k$, with a best fit line (Myopic, $\hat{\beta}=0.039$), averaged over 20 runs, evaluated on a large economic network ($n=2113$ nodes, $m=57927$ edges), with $\alpha=0.4$ and a budget of $k=10$ seeds, plus one random initial seed.
  • Figure 5: Illustrations of two runs of Myopic for different initial seeds (red), with new selected seeds (yellow), and fixed $\alpha=0.5$. Numbers indicate $\pi$ for each node after the new seed is selected. Initialization significantly affects the performance of Myopic.
  • ...and 16 more figures
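
The spreadability measure described in the Figure 3 caption can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's code: the names `cascade_size`, `spreadability`, and `alpha_for_target`, the adjacency-dict input, and the use of bisection to find the $\alpha$ matching a target fraction are all choices made here. The sketch estimates $\langle |T| \rangle/n$ by averaging cascade sizes from random single seeds, then searches for the $\alpha$ at which that average reaches a target such as $0.2$, $0.5$, or $0.8$.

```python
import random


def cascade_size(adj, start, alpha, rng):
    """Number of nodes activated by one independent cascade from `start`."""
    active = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in active and rng.random() < alpha:
                active.add(v)
                stack.append(v)
    return len(active)


def spreadability(adj, alpha, runs=500, seed=0):
    """Average fraction of the network reached from a random initial seed."""
    rng = random.Random(seed)
    nodes = list(adj)
    total = sum(cascade_size(adj, rng.choice(nodes), alpha, rng)
                for _ in range(runs))
    return total / (runs * len(nodes))


def alpha_for_target(adj, target, runs=500, tol=1e-3):
    """Bisection for the alpha whose estimated spreadability meets `target`.

    Assumption: estimated spreadability is (approximately) non-decreasing
    in alpha; Monte Carlo noise can violate this slightly.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if spreadability(adj, mid, runs) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical toy network: a complete graph on 10 nodes.
k10 = {i: [j for j in range(10) if j != i] for i in range(10)}
for label, target in [("low", 0.2), ("medium", 0.5), ("high", 0.8)]:
    print(label, round(alpha_for_target(k10, target), 3))
```

On a disconnected network the target may be unreachable even at $\alpha = 1$, in which case this sketch simply returns a value near $1$; a more careful version would detect that case.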