Weighted Set Multi-Cover on Bounded Universe and Applications in Package Recommendation

Nima Shahbazi, Aryan Esmailpour, Stavros Sintos

Abstract

The weighted set multi-cover problem is a fundamental generalization of set cover that arises in data-driven applications where one must select a small, low-cost subset from a large collection of candidates under coverage constraints. In data management settings, such problems arise naturally either as expressive database queries or as post-processing steps over query results, for example, when selecting representative or diverse subsets from large relations returned by database queries for decision support, recommendation, fairness-aware data selection, or crowd-sourcing. While the general weighted set multi-cover problem is NP-complete, many practical workloads involve a bounded universe of items that must be covered, leading to the Weighted Set Multi-Cover with Bounded Universe (WSMC-BU) problem, where the universe size is constant. In this paper, we develop exact and approximation algorithms for WSMC-BU. We first discuss a dynamic programming algorithm that solves WSMC-BU exactly in $O(n^{\ell+1})$ time, where $n$ is the number of input sets and $\ell=O(1)$ is the universe size. We then present a $2$-approximation algorithm based on linear programming and rounding, running in $O(\mathcal{L}(n))$ time, where $\mathcal{L}(n)$ denotes the complexity of solving a linear program with $O(n)$ variables. To further improve efficiency for large datasets, we propose a faster $(2+\varepsilon)$-approximation algorithm with running time $O(n \log n + \mathcal{L}(\log W))$, where $W$ is the ratio of the total weight to the minimum weight, and $\varepsilon$ is an arbitrary constant specified by the user. Extensive experiments on real and synthetic datasets demonstrate that our methods consistently outperform greedy and standard LP-rounding baselines in both solution quality and runtime, making them suitable for data-intensive selection tasks over large query outputs.
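
For orientation, the textbook integer program for weighted set multi-cover can be written as follows. This is a sketch under assumed notation, not the paper's Definition 1 (which is not reproduced here): $S_1,\dots,S_n$ are the input sets with weights $w_i$, $U$ is the universe with $|U| = \ell$, $d_e$ is the number of times item $e$ must be covered, and each set may be chosen at most once.

$$
\begin{aligned}
\min \quad & \sum_{i=1}^{n} w_i x_i \\
\text{s.t.} \quad & \sum_{i \,:\, e \in S_i} x_i \ge d_e && \text{for every } e \in U, \\
& x_i \in \{0,1\} && \text{for every } i \in \{1,\dots,n\}.
\end{aligned}
$$

Relaxing $x_i \in \{0,1\}$ to $0 \le x_i \le 1$ gives the linear program that LP-rounding approaches typically start from.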

Paper Structure

This paper contains 41 sections, 10 theorems, 18 equations, 52 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

There exists an exact algorithm for the WSMC-BU problem that runs in $O(n^{|\mathcal{G}|+1})$ time, where $n$ is the number of input sets and $|\mathcal{G}| = \ell$ is the size of the universe.
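
One way to see where a bound of this shape can come from is a dynamic program over residual demand vectors. The sketch below is illustrative only and is not taken from the paper: it assumes each item `e` has an integer demand `demand[e]`, each set may be chosen at most once, and the function name `min_weight_multicover` is hypothetical. With demands of at most $n$ there are at most $(n+1)^{\ell}$ residual vectors, and each of the $n$ sets is processed once against them, which gives $O(n^{\ell+1})$ state updates for constant $\ell$.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def min_weight_multicover(
    sets: Sequence[Set[int]],
    weights: Sequence[float],
    demand: Sequence[int],
) -> Tuple[float, List[int]]:
    """Return (minimum total weight, chosen set indices), or (inf, []) if infeasible.

    Assumed model (not from the paper): item e must appear in at least
    demand[e] chosen sets, and every set can be chosen at most once.
    """
    start = tuple(demand)
    # best[r] = (cost, chosen indices) of the cheapest selection so far
    # that leaves residual demand vector r.
    best: Dict[Tuple[int, ...], Tuple[float, Tuple[int, ...]]] = {start: (0.0, ())}
    for i, (members, w) in enumerate(zip(sets, weights)):
        updated = dict(best)  # option 1: skip set i
        for residual, (cost, chosen) in best.items():
            # option 2: take set i, lowering the residual demand of its items
            nxt = tuple(
                max(0, r - 1) if e in members else r
                for e, r in enumerate(residual)
            )
            cand = cost + w
            if cand < updated.get(nxt, (math.inf, ()))[0]:
                updated[nxt] = (cand, chosen + (i,))
        best = updated
    cost, chosen = best.get(tuple(0 for _ in start), (math.inf, ()))
    return cost, list(chosen)


# Two items (0 and 1); item 0 must be covered twice, item 1 once.
example_sets = [{0}, {1}, {0, 1}, {0, 1}]
example_weights = [1.0, 1.0, 3.0, 1.5]
print(min_weight_multicover(example_sets, example_weights, (2, 1)))
```

Storing the chosen indices inside each state keeps the sketch short; a tighter implementation would keep back-pointers instead.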

Figures (52)

  • Figure 1: Overview of crowd-sourced data enrichment using WSMC-BU.
  • Figure 2: Sample of experts with the same skill set. In this example, $H = \{\text{Legal}, \text{Medical}\}$, and the rate is considered as the weight. The figure shows the piecewise linear function $\hat{f}_H(x)$, based on the shown table.
  • Figure 3: Approximation of the $\mathscr{f}(x) = x^2 + 1$ curve by a piecewise linear function $\mathscr{g}(\cdot)$, generated by the described algorithm, where $\epsilon = 3$. The function $\mathscr{g}(\cdot)$ approximates the values of the function $\mathscr{f}(\cdot)$ within a factor of $4$.
  • Figure 4: The function $\hat{f}_{H_1}(x)$ is a piecewise linear function consisting of three solid segments. For $\varepsilon = 7$, the function $\hat{g}_{H_1}(x)$ is a piecewise linear function with two segments: for $x \in [0,1]$, $\hat{g}_{H_1}(x) = \hat{f}_{H_1}(x)$, while for $x \in [1,3]$, $\hat{g}_{H_1}(x)$ is given by the dashed linear segment connecting $(1,1)$ and $(3,8)$.
  • Figure 5: Impact of varying the number of items $\ell$ on the solution's total weight (Census).
  • ...and 47 more figures
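
The captions of Figures 3 and 4 describe piecewise linear approximations that stay within a factor of $1+\varepsilon$ of the original function. The algorithm those captions refer to is not reproduced in this section, so the sketch below swaps in a common geometric-breakpoint heuristic as a stand-in: starting from the current breakpoint, it extends the segment to the furthest integer whose function value is still at most $(1+\varepsilon)$ times the value at the segment start. It assumes a positive, nondecreasing function on an integer range; the names `geometric_breakpoints` and `interpolate` are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, List


def geometric_breakpoints(
    f: Callable[[int], float], lo: int, hi: int, eps: float
) -> List[int]:
    """Integer breakpoints in [lo, hi] (lo < hi) for a positive nondecreasing f.

    Each segment [a, b] satisfies f(b) <= (1 + eps) * f(a), except when it
    spans a single unit step, where the chord is exact at both integers.
    """
    points = [lo]
    x = lo
    while x < hi:
        nxt = x + 1  # always advance by at least one step
        while nxt < hi and f(nxt + 1) <= (1 + eps) * f(x):
            nxt += 1
        points.append(nxt)
        x = nxt
    return points


def interpolate(f: Callable[[int], float], points: List[int], x: float) -> float:
    """Evaluate the piecewise linear function through (p, f(p)) for p in points."""
    j = min(bisect_right(points, x), len(points) - 1)
    a, b = points[j - 1], points[j]
    t = (x - a) / (b - a)
    return (1 - t) * f(a) + t * f(b)


def square_plus_one(x: int) -> int:
    # The curve named in the Figure 3 caption.
    return x * x + 1


breaks = geometric_breakpoints(square_plus_one, 0, 10, eps=3)
print(breaks)
print([interpolate(square_plus_one, breaks, x) for x in range(11)])
```

Because the example function is convex, each chord lies on or above the curve, so at every integer point the interpolated value is between $f(x)$ and $4 f(x)$ when $\varepsilon = 3$. The breakpoints chosen here need not match the ones drawn in the paper's figures.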

Theorems & Definitions (11)

  • Definition 1: Weighted Set Multi-Cover problem with Bounded Universe (WSMC-BU)
  • Theorem 1
  • Lemma 1
  • Theorem 2
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 3
  • Lemma 5
  • Lemma 6
  • ...and 1 more