Fair Submodular Cover
Wenjing Chen, Shuo Xing, Samson Zhou, Victoria G. Crawford
TL;DR
This work introduces Fair Submodular Cover (FSC), the problem of finding a minimum-cardinality subset whose monotone submodular objective value reaches a given threshold while satisfying per-group fairness bounds. By exploiting a dual relationship to fair submodular maximization (FSM), the authors develop two conversion schemes, convert-fair and convert-continuous, that transform FSM bicriteria guarantees into FSC guarantees, preserving fairness via a $\beta$-extension of the fairness matroid. They then provide three FSM bicriteria algorithms (two discrete: greedy-fair-bi and threshold-fairness-bi; one continuous: cont-thresh-greedy-bi) that can be paired with the conversions to yield FSC algorithms whose approximation ratios approach the best known for submodular cover without fairness constraints. Empirical evaluations on maximum-coverage instances show that the fair algorithms achieve more balanced group representation at the cost of larger solution sizes, supporting the practical viability of fair submodular cover on real datasets such as Twitch_5000 and Corel5k.
Abstract
Submodular optimization is a fundamental problem with many applications in machine learning, often involving decision-making over datasets with sensitive attributes such as gender or age. In such settings, it is often desirable to produce a diverse solution set that is fairly distributed with respect to these attributes. Motivated by this, we initiate the study of Fair Submodular Cover (FSC), where given a ground set $U$, a monotone submodular function $f:2^U\to\mathbb{R}_{\ge 0}$, and a threshold $\tau$, the goal is to find a balanced subset $S\subseteq U$ of minimum cardinality such that $f(S)\ge\tau$. We first introduce discrete algorithms for FSC that achieve a bicriteria approximation ratio of $(\frac{1}{\epsilon}, 1-O(\epsilon))$. We then present a continuous algorithm that achieves a $(\ln\frac{1}{\epsilon}, 1-O(\epsilon))$-bicriteria approximation ratio, which matches the best approximation guarantee of submodular cover without a fairness constraint. Finally, we complement our theoretical results with a number of empirical evaluations that demonstrate the effectiveness of our algorithms on instances of maximum coverage.
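To make the problem statement concrete, the following is a minimal sketch of FSC on a toy maximum-coverage instance, solved by exhaustive search. It is not one of the algorithms from this work. The fairness rule used here (each group's share of the solution must lie between a lower and an upper fraction) is an assumption made for illustration, and all names (`coverage`, `is_fair`, `brute_force_fsc`, the toy data) are hypothetical.

```python
from itertools import combinations


def coverage(sets, S):
    """Monotone submodular coverage objective: number of items covered by S."""
    covered = set()
    for u in S:
        covered |= sets[u]
    return len(covered)


def is_fair(S, group, lower, upper):
    """Assumed fairness rule: lower[c]*|S| <= |S ∩ group c| <= upper[c]*|S|."""
    k = len(S)
    for c in lower:
        count = sum(1 for u in S if group[u] == c)
        if not (lower[c] * k <= count <= upper[c] * k):
            return False
    return True


def brute_force_fsc(sets, group, lower, upper, tau):
    """Return a smallest fair S with coverage(S) >= tau, or None if none exists.

    Exhaustive search over subsets in order of increasing size;
    only practical for tiny instances.
    """
    elements = sorted(sets)
    for k in range(1, len(elements) + 1):
        for S in combinations(elements, k):
            if coverage(sets, S) >= tau and is_fair(S, group, lower, upper):
                return set(S)
    return None


# Toy instance: four candidate elements in two groups, six items to cover.
sets = {"a1": {1, 2, 3, 4}, "a2": {5, 6}, "b1": {5}, "b2": {6}}
group = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
tau = 6

# No effective fairness constraint: any group may make up the whole solution.
unconstrained = brute_force_fsc(
    sets, group, {"A": 0.0, "B": 0.0}, {"A": 1.0, "B": 1.0}, tau
)

# Balanced solution: neither group may exceed half of the solution.
balanced = brute_force_fsc(
    sets, group, {"A": 0.0, "B": 0.0}, {"A": 0.5, "B": 0.5}, tau
)

print(sorted(unconstrained), sorted(balanced))
```

On this instance the unconstrained optimum uses two elements from group A, while the balanced optimum needs all four elements. This is a small illustration of the trade-off between fairness and solution size that the summary above reports.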
