Near-Optimal Sparsifiers for Stochastic Knapsack and Assignment Problems
Shaddin Dughmi, Yusuf Hakan Kalayci, Xinyu Liu
TL;DR
This work develops a polyhedral sparsification framework to address the data-access trade-off in stochastic packing problems, extending sparsification beyond matroid-like structures to knapsack-type constraints. The authors design near-optimal, non-adaptive sparsifiers for the Knapsack, Multiple Knapsack, and Generalized Assignment (GAP) problems, achieving a (1 - ε)-approximation with degree poly(1/ε, 1/p), where p is the independent activation probability of each item; this degree is independent of problem size. A key innovation is a reconstruction-based analysis that groups items into buckets and uses a charging argument to bound the impact of missing optimal items, even in cohort-dependent GAP settings with cross-knapsack interactions. Theoretical guarantees are complemented by empirical results in deterministic and synthetic contexts, showing substantial runtime speedups with minimal loss in objective value. The work also poses an open question: whether sparsification extends to general ILPs with dimension-independent degree.
Abstract
When uncertainty meets costly information gathering, a fundamental question emerges: which data points should we probe to unlock near-optimal solutions? Sparsification of stochastic packing problems addresses this trade-off. Existing notions of sparsification measure the level of sparsity, called the degree, as the ratio of the number of queried items to the optimal solution size. While effective for matching and matroid-type problems with uniform structures, this cardinality-based approach fails for knapsack-type constraints, where feasible sets exhibit dramatic structural variation. We introduce a polyhedral sparsification framework that measures the degree as the smallest scalar needed to embed the query set within a scaled feasibility polytope, naturally capturing redundancy without relying on cardinality. Our main contribution establishes that knapsack, multiple knapsack, and generalized assignment problems admit (1 - ε)-approximate sparsifiers with degree polynomial in 1/p and 1/ε, where p denotes the independent activation probability of each element; remarkably, this degree is independent of problem dimensions. The key insight involves grouping items with similar weights and deploying a charging argument: when our query set misses an optimal item, we either substitute it with a queried item from the same group or leverage that group's excess contribution to compensate for the loss. This reveals an intriguing complexity-theoretic separation: while the multiple knapsack problem lacks an FPTAS and generalized assignment is APX-hard, their sparsification counterparts admit efficient (1 - ε)-approximation algorithms that identify polynomial-degree query sets. Finally, we raise an open question: can such sparsification extend to general integer linear programs with degree independent of problem dimensions?
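
To make the polyhedral notion of degree concrete, the following is a minimal sketch for a single knapsack. It assumes the feasibility polytope is the natural LP relaxation P = {x in [0,1]^n : sum_i w_i x_i <= C}; the paper's exact choice of polytope may differ. Under that assumption, the indicator vector of a nonempty query set Q lies in d·P exactly when d >= 1 and w(Q) <= d·C, so the degree is max(1, w(Q)/C). The function name `knapsack_degree` and its parameters are hypothetical and used only for illustration.

```python
from fractions import Fraction


def knapsack_degree(weights, capacity, query):
    """Smallest d such that the indicator vector of `query` lies in d * P.

    Assumption (not from the source): P is the LP relaxation of a single
    knapsack, P = {x in [0,1]^n : sum_i weights[i] * x[i] <= capacity}.
    Scaling gives d * P = {x in [0,d]^n : sum_i weights[i] * x[i] <= d * capacity}.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    query = set(query)
    if not query:
        # The zero vector lies in 0 * P, so an empty query set has degree 0.
        return Fraction(0)
    total_weight = sum(Fraction(weights[i]) for i in query)
    # d >= 1 is forced by the box constraint x_i <= d for a queried item,
    # and d >= w(Q) / C is forced by the scaled capacity constraint.
    return max(Fraction(1), total_weight / Fraction(capacity))


# Illustrative data: four items and a knapsack of capacity 6.
weights = [4, 3, 5, 2]
capacity = 6
degree_heavy = knapsack_degree(weights, capacity, {0, 1, 2})  # total weight 12
degree_light = knapsack_degree(weights, capacity, {3})        # total weight 2
```

In this sketch a query set whose total weight is twice the capacity has degree 2 regardless of how many items it contains, which illustrates how the measure captures redundancy without counting items.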
