Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams

Amit Chakrabarti; Andrew McGregor; Anthony Wirth

TL;DR

This work gives $(1-1/e-\varepsilon)$-approximation algorithms for maximum coverage (MaxCov) in three data stream models. In the dynamic model, the algorithm uses $O((1+\varepsilon^{-1}/\log\log m)\log m)$ passes and $\varepsilon^{-2} k\,\mathrm{polylog}(n,m)$ space, improving on the previous $(n+\varepsilon^{-4}k)\,\mathrm{polylog}(n,m)$ space bound; it relies on a cascading urn framework that manages thresholds across multiple scales. In the random order model, a single-pass algorithm using $O_\varepsilon(k\,\mathrm{polylog}(n,m))$ space achieves a $(1-1/e-\varepsilon)$-approximation in expectation, improving the previous $k^2\,\mathrm{polylog}(n,m)$ space bound and leveraging universe subsampling. In the insert-only model, the authors combine subsampling with a refined streaming implementation to attain polylogarithmic amortized update time, improving over previous update times that were linear in $k$. The guarantees are supported by analyses of single-urn and cascading-urn processes and by optimized thresholding strategies.
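Universe subsampling, mentioned above, can be illustrated with a minimal sketch. The idea shown here is only the generic one: each universe element is kept or dropped by a hash of the element, so the decision is consistent across every set in which the element appears. The names `keep_element` and `subsample_sets`, the use of SHA-256 as the hash, and the `rate` parameter are assumptions made for illustration; the sampling rate and hash family used in the paper are not specified in this summary.

```python
import hashlib
from typing import List, Set

def keep_element(element: int, rate: float, seed: int = 0) -> bool:
    """Decide, via a hash of the element, whether it survives subsampling.

    The same element always gets the same decision for a fixed seed,
    so no per-element state needs to be stored.
    """
    digest = hashlib.sha256(f"{seed}:{element}".encode()).digest()
    h = int.from_bytes(digest[:8], "big")  # 64-bit integer hash value
    return h < rate * 2**64                # keep with probability about `rate`

def subsample_sets(sets: List[Set[int]], rate: float, seed: int = 0) -> List[Set[int]]:
    """Restrict every set to the surviving universe elements."""
    return [{e for e in s if keep_element(e, rate, seed)} for s in sets]
```

A coverage algorithm would then run on the output of `subsample_sets` instead of the original sets, which is smaller whenever `rate` is below 1.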

Abstract

The maximum coverage problem is to select $k$ sets from a collection of sets such that the cardinality of the union of the selected sets is maximized. We consider $(1-1/e-\varepsilon)$-approximation algorithms for this NP-hard problem in three standard data stream models.

1. Dynamic Model. The stream consists of a sequence of sets being inserted and deleted. Our multi-pass algorithm uses $\varepsilon^{-2} k \cdot \mathrm{polylog}(n,m)$ space. The best previous result (Assadi and Khanna, SODA 2018) used $(n + \varepsilon^{-4} k)\,\mathrm{polylog}(n,m)$ space. While both algorithms use $O(\varepsilon^{-1} \log n)$ passes, our analysis shows that when $\varepsilon$ is a constant, it is possible to reduce the number of passes by a $1/\log \log n$ factor without incurring additional space.

2. Random Order Model. In this model, there are no deletions and the sets forming the instance are uniformly randomly permuted to form the input stream. We show that a single pass and $k\,\mathrm{polylog}(n,m)$ space suffice for arbitrarily small constant $\varepsilon$. The best previous result, by Warneke et al. (ESA 2023), used $k^2\,\mathrm{polylog}(n,m)$ space.

3. Insert-Only Model. Lastly, our results, along with numerous previous results, use a sub-sampling technique introduced by McGregor and Vu (ICDT 2017) to sparsify the input instance. We explain how this technique and others used in the paper can be implemented such that the amortized update time of our algorithm is polylogarithmic. This also implies an improvement of the state-of-the-art insert-only algorithms in terms of the update time: $\mathrm{polylog}(m,n)$ update time suffices, whereas the best previous result by Jaud et al. (SEA 2023) required update time that was linear in $k$.
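For orientation, the problem defined in the abstract can be made concrete with the classical offline greedy algorithm, which repeatedly picks the set covering the most uncovered elements and is well known to give a $(1-1/e)$-approximation. This is a baseline sketch only, not one of the streaming algorithms from the paper; the function name `greedy_max_coverage` and the dictionary input format are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

def greedy_max_coverage(sets: Dict[str, Set[int]], k: int) -> Tuple[List[str], Set[int]]:
    """Offline greedy: pick up to k sets, each maximizing the marginal gain."""
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(sets)  # shallow copy so the caller's dict is untouched
    while remaining and len(chosen) < k:
        # Set with the largest number of not-yet-covered elements.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered

if __name__ == "__main__":
    example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
    print(greedy_max_coverage(example, 2))
```

This baseline needs the whole instance in memory and makes up to $k$ scans over it, which is the cost that the streaming algorithms summarized above aim to avoid.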

Paper Structure

This paper contains 24 sections, 9 theorems, 32 equations, and 2 algorithms.

Introduction
Background.
Data Stream Computation.
Submodular maximization.
Random order.
Our Results, Approach, and Related Work
Further Related Work
Random arrivals.
Coverage in streams.
Dynamic streams.
Lower bounds.
Preliminaries
Dynamic Streams
The Single Urn Process and its Analysis
Cascading Urns
...and 9 more sections

Key Result

Theorem 3.1

There is a one-pass algorithm that processes a stream of tokens $\langle x_1, \Delta_1\rangle, \langle x_2, \Delta_2\rangle, \ldots,$ where each $x_i\in \{1,\ldots, M\}$ and $\Delta_i\in \{-1,1\}$, using $O(\log^2(M) \log(1/\delta))$ bits of space, that, with probability $1-\delta$, returns an eleme
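To give a feel for the kind of token stream this theorem concerns, the following toy sketch processes tokens $\langle x, \Delta\rangle$ and tries to return some element whose net count is non-zero. It combines two standard building blocks, geometric subsampling of elements by a hash and 1-sparse recovery with a fingerprint check. It is an illustrative assumption-laden sketch: the class name `ToyL0Sampler`, the number of levels, and the hash choice are hypothetical, and it makes no claim to match the space bound, success probability, or output distribution of the theorem.

```python
import random
from typing import Optional

P = (1 << 61) - 1  # Mersenne prime used for hashing and fingerprints

class ToyL0Sampler:
    def __init__(self, levels: int = 32, seed: int = 0):
        rng = random.Random(seed)
        self.a = rng.randrange(1, P)   # hash coefficients
        self.b = rng.randrange(P)
        self.r = rng.randrange(2, P)   # fingerprint base
        self.levels = levels
        self.count = [0] * levels      # sum of deltas per level
        self.idsum = [0] * levels      # sum of delta * x per level
        self.fp = [0] * levels         # sum of delta * r^x mod P per level

    def _depth(self, x: int) -> int:
        """Deepest level x belongs to: trailing zero bits of its hash, capped."""
        h = (self.a * x + self.b) % P
        d = 0
        while d < self.levels - 1 and (h >> d) & 1 == 0:
            d += 1
        return d

    def update(self, x: int, delta: int) -> None:
        """Process one token <x, delta> with delta in {-1, +1}."""
        term = pow(self.r, x, P)
        for j in range(self._depth(x) + 1):
            self.count[j] += delta
            self.idsum[j] += delta * x
            self.fp[j] = (self.fp[j] + delta * term) % P

    def sample(self) -> Optional[int]:
        """Return an element from some level that looks 1-sparse, else None."""
        for j in reversed(range(self.levels)):
            c, s = self.count[j], self.idsum[j]
            if c != 0 and s % c == 0:
                x = s // c  # candidate: the only surviving element, if 1-sparse
                if x >= 1 and self.fp[j] == (c * pow(self.r, x, P)) % P:
                    return x
        return None
```

Deletions cancel insertions exactly in all three per-level counters, which is why such linear sketches suit the dynamic model: the state depends only on the net counts, not on the order of tokens.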

Theorems & Definitions (14)

Theorem 3.1: $\ell_0$ sampling (Jowhari, Sağlam, and Tardos, 2011)
Theorem 3.2: Quantized Greedy Algorithm (e.g., McGregor and Vu, 2019)
Theorem 3.3: Universe Subsampling
Lemma 4.1
Theorem 4.2
proof
Theorem 4.3
proof
Lemma 4.4
proof
...and 4 more
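The quantized greedy algorithm named in Theorem 3.2 is not stated in this summary. As a rough illustration of the general thresholding idea, the sketch below lowers a threshold $\tau$ by a factor of $(1+\varepsilon)$ after each pass and, within a pass, accepts any set whose marginal gain is at least $\tau$. The function name `threshold_greedy`, the starting threshold, and the stopping rule are assumptions for illustration and may differ from the formulation in the paper.

```python
from typing import List, Set, Tuple

def threshold_greedy(sets: List[Set[int]], k: int, eps: float) -> Tuple[List[int], Set[int]]:
    """Multi-pass thresholded greedy for maximum coverage (illustrative)."""
    covered: Set[int] = set()
    chosen: List[int] = []
    if not sets or k <= 0:
        return chosen, covered
    tau = float(max(len(s) for s in sets))  # start at the largest set size
    while tau >= 1 and len(chosen) < k:
        for i, s in enumerate(sets):        # one pass over the stream of sets
            if len(chosen) == k:
                break
            # Accept any set whose marginal gain meets the current threshold.
            # Already-chosen sets have gain 0 < tau, so they are never re-added.
            if len(s - covered) >= tau:
                chosen.append(i)
                covered |= s
        tau /= 1 + eps                      # quantized decrease of the threshold
    return chosen, covered
```

Compared with exact greedy, each pass only needs to compare a gain against $\tau$ rather than find the maximum, so the number of passes is governed by how many distinct thresholds are tried rather than by $k$.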
