Near-Optimal Sparsifiers for Stochastic Knapsack and Assignment Problems

Shaddin Dughmi, Yusuf Hakan Kalayci, Xinyu Liu

TL;DR

This work develops a polyhedral sparsification framework to address the data-access trade-off in stochastic packing problems, extending sparsification beyond matroid-like structures to knapsack-type constraints. The authors design near-optimal, non-adaptive sparsifiers for Knapsack, Multiple Knapsack, and Generalized Assignment Problems, achieving (1 - ε)-approximation with degree poly(1/ε, 1/p) that is independent of problem size. A key innovation is the reconstruction-based analysis that groups items into buckets and uses a charging argument to bound the impact of missing optimal items, even in cohort-dependent GAP settings with cross-knapsack interactions. Theoretical guarantees are complemented by empirical results in deterministic and synthetic contexts, showing substantial runtime speedups with minimal loss in objective value, and the work opens the question of extending sparsification to general ILPs with dimension-independent degree.

Abstract

When uncertainty meets costly information gathering, a fundamental question emerges: which data points should we probe to unlock near-optimal solutions? Sparsification of stochastic packing problems addresses this trade-off. The existing notions of sparsification measure the level of sparsity, called degree, as the ratio of queried items to the optimal solution size. While effective for matching and matroid-type problems with uniform structures, this cardinality-based approach fails for knapsack-type constraints where feasible sets exhibit dramatic structural variation. We introduce a polyhedral sparsification framework that measures the degree as the smallest scalar needed to embed the query set within a scaled feasibility polytope, naturally capturing redundancy without relying on cardinality. Our main contribution establishes that knapsack, multiple knapsack, and generalized assignment problems admit (1 - epsilon)-approximate sparsifiers with degree polynomial in 1/p and 1/epsilon -- where p denotes the independent activation probability of each element -- remarkably independent of problem dimensions. The key insight involves grouping items with similar weights and deploying a charging argument: when our query set misses an optimal item, we either substitute it with a queried item from the same group or leverage that group's excess contribution to compensate for the loss. This reveals an intriguing complexity-theoretic separation -- while the multiple knapsack problem lacks an FPTAS and generalized assignment is APX-hard, their sparsification counterparts admit efficient (1 - epsilon)-approximation algorithms that identify polynomial-degree query sets. Finally, we raise an open question: can such sparsification extend to general integer linear programs with degree independent of problem dimensions?
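To make the polyhedral notion of degree concrete, the sketch below works it out for a single knapsack. It assumes the feasibility polytope is the LP relaxation $P = \{x \in [0,1]^n : \sum_i w_i x_i \le C\}$; the paper's exact polytope may differ (for instance, the convex hull of integral solutions). Under that assumption, the indicator vector of a nonempty query set $Q$ lies in $d \cdot P$ exactly when $d \ge 1$ and $\sum_{i \in Q} w_i \le d \cdot C$. The function name `knapsack_degree` is hypothetical.

```python
def knapsack_degree(query_weights, capacity):
    """Smallest scalar d such that the indicator vector of the query set
    lies in d * P, where P = {x in [0,1]^n : sum_i w_i x_i <= capacity}.

    ASSUMPTION: P is the LP relaxation of a single knapsack constraint.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not query_weights:
        # The empty query set is the zero vector, which lies in 0 * P.
        return 0.0
    # Scaling P by d relaxes the box to [0, d]^n and the capacity to d * C,
    # so both d >= 1 and d >= w(Q) / C are needed.
    return max(1.0, sum(query_weights) / capacity)


if __name__ == "__main__":
    # Three queried items of total weight 10 against capacity 5.
    print(knapsack_degree([2, 3, 5], capacity=5))
```

Under this reading the degree depends on the total queried weight relative to capacity rather than on how many items are queried, which is the redundancy the abstract contrasts with cardinality-based measures.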

Paper Structure

This paper contains 77 sections, 13 theorems, 82 equations, 8 figures, 4 tables, 6 algorithms.

Key Result

Lemma 2

Let $S \subseteq E$ be a set of elements, each with weight $w_i$, such that where $\tau(\epsilon) := 1 + \ln(1/\epsilon) + \sqrt{ \ln^2(1/\epsilon) + 2 \ln(1/\epsilon) }.$ Then, if each item is active independently with probability $p$, the total weight of active items in $S$ is at least $C$ with probability at least $1 - \epsilon$.
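The threshold $\tau(\epsilon)$ can be evaluated directly, and the lemma's conclusion can be checked empirically. The sketch below is illustrative only. It assumes a hypothesis of the form $p \sum_{i \in S} w_i \ge \tau(\epsilon) \cdot C$ with every $w_i \le C$; this is an assumption, chosen because $\tau(\epsilon)$ is the larger root of $(\mu - 1)^2 = 2 \mu \ln(1/\epsilon)$, the equation that appears when a standard multiplicative Chernoff bound is set equal to $\epsilon$. The function names are hypothetical.

```python
import math
import random


def tau(eps: float) -> float:
    """tau(eps) = 1 + ln(1/eps) + sqrt(ln^2(1/eps) + 2 ln(1/eps))."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    log_term = math.log(1.0 / eps)
    return 1.0 + log_term + math.sqrt(log_term ** 2 + 2.0 * log_term)


def estimate_success_probability(weights, p, capacity, trials=2000, seed=0):
    """Monte Carlo estimate of Pr[total weight of active items >= capacity],
    where each item is active independently with probability p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        active_weight = sum(w for w in weights if rng.random() < p)
        if active_weight >= capacity:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    eps, p, capacity = 0.1, 0.5, 1.0
    # ASSUMPTION: enough identical small items that
    # p * sum(weights) >= tau(eps) * capacity, each weight <= capacity.
    item_weight = 0.05
    n_items = math.ceil(tau(eps) * capacity / (p * item_weight))
    weights = [item_weight] * n_items
    estimate = estimate_success_probability(weights, p, capacity)
    print(f"tau = {tau(eps):.3f}, empirical success rate = {estimate:.3f}")
```

If the assumed hypothesis matches the lemma, the empirical success rate should come out at or above $1 - \epsilon$, up to sampling noise.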

Figures (8)

  • Figure 1: This figure presents experimental results for $n \in \{1000, 2000, 5000, 10000\}$. The upper panels illustrate speedup ratios (runtime of method A divided by runtime of method B) plotted against realized redundancy, defined as the ratio of total items to items in the optimal solution. Each gray point corresponds to an individual experiment, while the blue line represents the rolling median computed with a window size of $501$. The lower panels present violin plots showing the distribution of efficiency ratios (performance of method B divided by performance of method C) across varying levels of realized redundancy. Experiments are aggregated into discrete redundancy intervals (in $\log$-scale), with red dots marking the mean value and orange dots marking the median value within each bin. Sample sizes for each interval are indicated at the top of the corresponding violin plot, and the violin shapes visualize the complete distribution of efficiency ratios within each redundancy bin.
  • Figure 2: Visualization of substitution in FillLargeBucket($\overline{\mathbf{OPT}}, \mathbf{ALG}, j, k$). Assume $\alpha = 5$ and $T = [\alpha - 1] = \{1, 2, 3, 4\}$. For each $t \in T$, let $(i'_t, j'_t) \in \mathbf{OPT}$ be the selected potential substitution from bucket $\overline{B}_{j,k,t}$. The upper subfigure visualizes the Value-Based Rejection case, and the lower subfigure the Value-Based Substitution case.
  • Figure 3: Visualization of substitution in FillSmallBucket($\overline{\mathbf{OPT}}, \mathbf{ALG}, j, 0$). Assume $\alpha = 5$. Each subfigure shows bucket $\overline{B}_{j,0}$, missed set $S^{\mathrm{missed}}_{j,0}$, and selected subset $S \subseteq \overline{B}_{j,0}$. Items are rectangles with width $w_{ij}$, height $v/w$, and area representing value. $\mathbf{ALG}$ substitutes $S$ for $S^{\mathrm{missed}}_{j,0}$ in knapsack $j$, recovering $\frac{4}{5} = 1 - \frac{1}{\alpha}$ of $\overline{\mathbf{OPT}}$'s value.
  • Figure 4: End-to-end speedup ($T_{\mathrm{full}}/T_{\mathrm{sparse}}$) versus realized redundancy $r=n/|\mathrm{OPT}_{\mathrm{full}}|$ for $n\in\{1000,2000,5000,10000\}$. Each column fixes $n$ (labeled at the top). Gray points are individual runs; the blue curve is a rolling median computed with window size $501$. The $x$-axis is logarithmic and truncated at $r\le 50$ for readability, and the vertical scale is shared across columns.
  • Figure 5: Equal-time value ratio (ETR $=\mathop{\mathrm{OPT}}\nolimits_{\text{sparse}}/\mathop{\mathrm{OPT}}\nolimits_{\text{full}}^{(\text{cut})}$) versus realized redundancy $r=n/|\mathop{\mathrm{OPT}}\nolimits_{\text{full}}|$ for $n=1000$. Columns are $m\in\{1,2,5\}$; rows 1–7 are per-$\rho$ violin plots with medians (red) and means (orange), and the bottom row is a $\rho$-colored scatter. The $x$-axis is logarithmic and truncated at $r\le 50$ for readability; vertical lines mark equal-width bins in $\log_{10}$, with per-bin sample sizes annotated above each panel. For $m=1$ the $y$-axis is fixed near $1$; for $m\in\{2,5\}$ we clip outliers at the $0.99$ quantile and unify $y$-limits within the figure to enable cross-panel comparison.
  • ...and 3 more figures

Theorems & Definitions (27)

  • Definition 1: Sparsifier
  • Lemma 2: Activation Weight Concentration
  • Theorem 3: Knapsack Sparsifier Performance
  • proof
  • Theorem 4: GAP Sparsifier
  • Corollary 5: Multiple Knapsack Sparsifier
  • Corollary 6
  • Lemma 7: Feasibility in Large Buckets
  • Lemma 8: Feasibility in Small Buckets
  • Lemma 9: Feasibility in Super Buckets
  • ...and 17 more