Fast algorithms to improve fair information access in networks
Dennis Robert Windham, Caroline J. Wendt, Alex Crane, Madelyn J Warr, Freda Shi, Sorelle A. Friedler, Blair D. Sullivan, Aaron Clauset
TL;DR
This paper addresses maximin influence maximization: selecting $k$ seeds to maximize the minimum activation probability $\min_i \pi_i$ under the independent cascade model. The problem is NP-hard, and existing methods are bottlenecked by repeated probability estimation. The authors propose a suite of 10 scalable, probability-estimation-free algorithms across BFS-, PPR-, and topology-based families, complemented by a principled spreadability framework and a new evaluation metric $\beta$, all evaluated on a corpus of 174 networks. A fast meta-learner that combines these heuristics is on average only about 20% less effective than the state-of-the-art on held-out data while running 75–130x faster; it outperforms the state-of-the-art on about 20% of networks and keeps its speed advantage on much larger graphs. These results enable fair information-access interventions at scale and highlight how network structure, particularly average degree, shapes diffusion-based strategies and outcomes.
Abstract
We consider the problem of selecting $k$ seed nodes in a network to maximize the minimum probability of activation under an independent cascade beginning at these seeds. The motivation is to promote fairness by ensuring that even the least advantaged members of the network have good access to information. Our problem can be viewed as a variant of the classic influence maximization objective, but it appears somewhat more difficult to solve: only heuristics are known. Moreover, the scalability of these methods is sharply constrained by the need to repeatedly estimate access probabilities. We design and evaluate a suite of $10$ new scalable algorithms that, crucially, do not require probability estimation. To facilitate comparison with the state-of-the-art, we make three more contributions that may be of broader interest. We introduce a principled method of selecting the pairwise information transmission parameter used in experimental evaluations, as well as a new performance metric that allows for comparison of algorithms across a range of values for the parameter $k$. Finally, we provide a new benchmark corpus of $174$ networks drawn from $6$ domains. Our algorithms retain most of the performance of the state-of-the-art while reducing running time by orders of magnitude. Specifically, a meta-learner approach is on average only $20\%$ less effective than the state-of-the-art on held-out data, but about $75$ to $130$ times faster. Further, the meta-learner's performance exceeds the state-of-the-art on about $20\%$ of networks, and the magnitude of its running time advantage is maintained on much larger networks.
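To make the objective concrete, the following is a minimal sketch of how the minimum activation probability for a given seed set could be estimated by Monte Carlo simulation of the independent cascade. It illustrates the kind of repeated probability estimation that the abstract identifies as the scalability bottleneck; it is not one of the paper's algorithms. The function names, the adjacency-dictionary graph representation, and the single uniform transmission probability `alpha` are all assumptions made for illustration.

```python
import random
from collections import deque


def simulate_cascade(adj, seeds, alpha, rng):
    """Run one independent cascade and return the set of activated nodes.

    adj   : dict mapping each node to a list of its neighbors (assumed format)
    seeds : iterable of initially active nodes
    alpha : assumed uniform per-edge transmission probability
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            # A newly active node gets a single chance to activate
            # each currently inactive neighbor.
            if v not in active and rng.random() < alpha:
                active.add(v)
                frontier.append(v)
    return active


def estimate_min_access(adj, seeds, alpha, trials=1000, seed=0):
    """Estimate min_i pi_i: the smallest activation probability over all nodes."""
    rng = random.Random(seed)
    counts = {v: 0 for v in adj}
    for _ in range(trials):
        for v in simulate_cascade(adj, seeds, alpha, rng):
            counts[v] += 1
    return min(c / trials for c in counts.values())


# Hypothetical toy input: a path graph on four nodes.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
estimate = estimate_min_access(path, seeds=[1], alpha=0.5, trials=2000)
```

Each call costs on the order of `trials` graph traversals, and a seed-selection heuristic that compares many candidate seed sets would repeat it many times, which is why avoiding this estimation step matters for large networks.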
