Fairness in Streaming Submodular Maximization over a Matroid Constraint

Marwa El Halabi, Federico Fusco, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski

TL;DR

The paper studies fairness in streaming monotone submodular maximization under a matroid constraint (FMMSM). It introduces a two-pass streaming framework that combines a Fair-Reservoir first pass with a second pass that extends the solution using two parallel matroid-constrained optimizers. With a state-of-the-art one-pass streaming algorithm as the subroutine, the framework guarantees $f(S) \ge \mathrm{OPT}/11.656$ using $O(k\cdot C)$ memory. The paper also establishes strong impossibility results for the single-pass semi-streaming setting, gives results for modular objectives with exact or reduced-complexity algorithms, and proposes practical heuristics that improve performance on real-world tasks such as maximum coverage, exemplar-based clustering, and movie recommendation. Overall, the work delineates the memory–quality–fairness trade-offs in FMMSM and provides scalable algorithms, with empirical validation, for fair representation in large-scale streaming data.

Abstract

Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in developing fair machine learning algorithms. Recently, such algorithms have been developed for monotone submodular maximization under a cardinality constraint. In this paper, we study the natural generalization of this problem to a matroid constraint. We give streaming algorithms as well as impossibility results that provide trade-offs between efficiency, quality and fairness. We validate our findings empirically on a range of well-known real-world applications: exemplar-based clustering, movie recommendation, and maximum coverage in social networks.
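
To make the problem setting concrete, the following is a minimal sketch of what a feasible solution looks like. It is not code from the paper. It assumes that fairness is expressed as per-color lower and upper bounds on how many selected elements carry each sensitive attribute value, and that the matroid is available as an independence oracle. The names `is_fair_and_independent`, `coverage`, `color_of`, `lower`, `upper`, and `is_independent` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def is_fair_and_independent(selected, color_of, lower, upper, is_independent):
    """Check the two constraints on a candidate solution.

    Assumptions (illustrative, not taken from the paper's code):
      - color_of maps each element to its sensitive attribute ("color");
      - lower/upper map a color to the minimum/maximum number of selected
        elements of that color (missing entries mean no bound);
      - is_independent is an oracle for the matroid constraint.
    """
    selected = set(selected)
    counts = Counter(color_of[e] for e in selected)
    for color in set(lower) | set(upper):
        n = counts.get(color, 0)
        if n < lower.get(color, 0) or n > upper.get(color, float("inf")):
            return False
    return is_independent(selected)


def coverage(selected, covers):
    """A standard monotone submodular objective: size of the covered union."""
    covered = set()
    for e in selected:
        covered |= covers[e]
    return len(covered)


# Toy instance: a uniform matroid of rank 3 (any set of size <= 3 is independent).
color_of = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
lower = {"red": 1, "blue": 1}
upper = {"red": 2, "blue": 1}
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {5}}


def rank3(s):
    return len(s) <= 3


print(is_fair_and_independent({"a", "b", "c"}, color_of, lower, upper, rank3))
print(coverage({"a", "b", "c"}, covers))
```

A cardinality constraint is the special case where the oracle only checks the size of the set, which is why the uniform matroid is used in the toy instance.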

Paper Structure

This paper contains 27 sections, 32 theorems, 31 equations, 1 figure, 7 algorithms.

Key Result

Theorem 1.1

For any constant $\eta \in (0,1/2)$, there exists a one-pass streaming $(1/2-\eta)$-approximation algorithm for FMMSM that uses $2^{O(k^2+k\log C)} \cdot \log \Delta$ memory, where $\Delta=\frac{\max_{e\in V} f(e)}{\min_{\{e\in V \mid f(e) > 0\}} f(e)}$.
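
The quantity $\Delta$ in the memory bound depends only on singleton values of $f$. As a small illustration, the sketch below computes it from a list of singleton values $f(e)$. The function name `delta` and the sample values are hypothetical, and the sketch assumes $f$ is non-negative.

```python
def delta(singleton_values):
    """Ratio of the largest singleton value to the smallest positive one.

    singleton_values: the values f(e) for each element e of the ground set V.
    Elements with f(e) = 0 are excluded from the minimum, as in the
    definition of Delta in Theorem 1.1.
    """
    positive = [v for v in singleton_values if v > 0]
    if not positive:
        raise ValueError("Delta is undefined when no element has f(e) > 0")
    return max(positive) / min(positive)


# Hypothetical singleton values f(e) for four elements.
print(delta([0.0, 0.5, 2.0, 8.0]))  # 8.0 / 0.5 = 16.0
```

Since the bound scales with $\log \Delta$, even a large spread in singleton values contributes only a modest factor compared with the $2^{O(k^2+k\log C)}$ term.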

Figures (1)

  • Figure 1: Objective values (a,b,c) and number of fairness violations (d,e,f) on the three applications.

Theorems & Definitions (53)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Lemma 2.1: Exchange property of bases
  • Lemma 2.2: Theorem 41.7 in Schrijver03
  • Theorem 3.1
  • Theorem 3.1: Theorem 5.3 in ChenKPSSY21
  • Theorem 3.1
  • Theorem 3.2
  • ...and 43 more