Learning Decision-Sufficient Representations for Linear Optimization

Yuhan Ye, Saurabh Amin, Asuman Ozdaglar

Abstract

We study how to construct compressed datasets that suffice to recover optimal decisions in linear programs with an unknown cost vector $c$ lying in a prior set $\mathcal{C}$. Recent work by Bennouna et al. provides an exact geometric characterization of sufficient decision datasets (SDDs) via an intrinsic decision-relevant dimension $d^\star$. However, their algorithm for constructing minimum-size SDDs requires solving mixed-integer programs. In this paper, we establish hardness results showing that computing $d^\star$ is NP-hard and deciding whether a dataset is globally sufficient is coNP-hard, thereby resolving a recent open problem posed by Bennouna et al. To address this worst-case intractability, we introduce pointwise sufficiency, a relaxation that requires sufficiency only for an individual cost vector. Under nondegeneracy, we provide a polynomial-time cutting-plane algorithm for constructing pointwise-sufficient decision datasets. In a data-driven regime with i.i.d. costs, we further propose a cumulative algorithm that aggregates decision-relevant directions across samples, yielding a stable compression scheme of size at most $d^\star$. This leads to a distribution-free PAC guarantee: with high probability over the training sample, the pointwise sufficiency failure probability on a fresh draw is at most $\tilde{O}(d^\star/n)$, and this rate is tight up to logarithmic factors. Finally, we apply decision-sufficient representations to contextual linear optimization, obtaining compressed predictors with generalization bounds scaling as $\tilde{O}(\sqrt{d^\star/n})$ rather than $\tilde{O}(\sqrt{d/n})$, where $d$ is the ambient cost dimension.
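
The cumulative algorithm aggregates decision-relevant directions across samples into a learned subspace $\hat{W}$ whose dimension stays at most $d^\star$. The sketch below illustrates only that aggregation step, under stated assumptions: the per-sample directions are supplied by a hypothetical oracle (the paper's rule for extracting them is not reproduced here), and the names `SubspaceAccumulator` and `add` are illustrative, not the authors' code. Each new direction is orthogonalized against the current basis (Gram-Schmidt) and kept only if it leaves the current span.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


class SubspaceAccumulator:
    """Hypothetical helper: keeps an orthonormal basis for span(all directions seen)."""

    def __init__(self, tol: float = 1e-9) -> None:
        self.basis: List[List[float]] = []  # orthonormal vectors spanning W_hat
        self.tol = tol  # relative threshold for "already in the span"

    def add(self, direction: Sequence[float]) -> bool:
        """Add one direction; return True iff it enlarged the span."""
        r = [float(x) for x in direction]
        start_norm = math.sqrt(dot(r, r))
        if start_norm == 0.0:
            return False  # the zero vector carries no new direction
        # Modified Gram-Schmidt: subtract the projection onto each basis vector in turn.
        for q in self.basis:
            coef = dot(r, q)
            r = [ri - coef * qi for ri, qi in zip(r, q)]
        res_norm = math.sqrt(dot(r, r))
        if res_norm <= self.tol * start_norm:
            return False  # numerically inside the current span
        self.basis.append([ri / res_norm for ri in r])
        return True

    @property
    def dim(self) -> int:
        """Learned dimension t = dim(W_hat)."""
        return len(self.basis)


# Usage with made-up directions from three hypothetical samples.
samples = [
    [[1, -1, 0]],
    [[0, 1, -1], [1, 0, -1]],  # [1, 0, -1] is the sum of the two earlier directions
    [[2, -2, 0]],              # a rescaled repeat
]
acc = SubspaceAccumulator()
for directions in samples:
    for d in directions:
        acc.add(d)
print(acc.dim)  # prints 2
```

If every supplied direction lies in a fixed subspace of dimension $d^\star$, the basis can never grow beyond $d^\star$ vectors, which is the mechanism behind a size bound of this kind; the relative tolerance `tol` is an arbitrary numerical choice.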

Paper Structure

This paper contains 71 sections, 33 theorems, 164 equations, 2 figures, and 4 algorithms.

Key Result

Theorem 1

Let $\mathcal{C}$ be open and convex. A dataset $\mathcal{D}$ is an SDD for $(\mathcal{X},\mathcal{C})$ if and only if

Figures (2)

  • Figure 1: Stage I. Learned dimension $t=\dim(\hat{W})$ (mean over $10$ trials).
  • Figure 2: Stage II. SPO risk vs. number of labeled samples (mean $\pm$ $90\%$ CI over $10$ trials).
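
Figure 2 reports SPO risk. Assuming the paper follows the standard SPO loss of Elmachtoub and Grigas for a minimization problem, the loss of a predicted cost $\hat{c}$ against the true cost $c$ is $\max_{x \in \mathcal{X}^\star(\hat{c})} c^\top x - \min_{x \in \mathcal{X}} c^\top x$, and the SPO risk is its expectation, estimated below by a sample average. This is a minimal sketch: it assumes the feasible polytope is given by an explicit vertex list (an illustration-only simplification; practical code would call an LP solver), and the function names `spo_loss` and `spo_risk` are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def spo_loss(c_hat: Sequence[float], c: Sequence[float],
             vertices: List[Sequence[float]], tol: float = 1e-9) -> float:
    """SPO loss for min c^T x over conv(vertices), worst case over predicted ties."""
    pred_vals = [dot(c_hat, v) for v in vertices]
    best_pred = min(pred_vals)
    # Vertices optimal for the predicted cost (the max of a linear function over
    # the optimal face is attained at one of its vertices).
    pred_optimal = [v for v, val in zip(vertices, pred_vals) if val <= best_pred + tol]
    true_opt = min(dot(c, v) for v in vertices)
    return max(dot(c, v) for v in pred_optimal) - true_opt


def spo_risk(preds, costs, vertices) -> float:
    """Empirical SPO risk: average loss over paired (prediction, true cost) samples."""
    losses = [spo_loss(c_hat, c, vertices) for c_hat, c in zip(preds, costs)]
    return sum(losses) / len(losses)


# Toy "pick one of three items" LP: the vertices of the unit simplex in R^3.
V = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(spo_loss([1, 2, 3], [3, 1, 2], V))  # prints 2
print(spo_risk([[1, 2, 3], [5, 0, 1]], [[3, 1, 2], [3, 1, 2]], V))  # prints 1.0
```

Taking the worst case over ties in the predicted optimal set keeps the loss well defined when $\hat{c}$ has several optimal vertices.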

Theorems & Definitions (70)

  • Definition 1: Global sufficient decision dataset
  • Theorem 1: Bennouna et al. (2025), Theorem 1
  • Theorem 2: Bennouna et al. (2025), Theorem 2
  • Corollary 1: Subspace characterization (Bennouna et al., 2025, Corollary 1)
  • Theorem 3: Informal
  • Definition 2: Pointwise sufficient decision dataset
  • Remark 1
  • Proof: Proof of the basic pointwise-sufficiency property, part (i)
  • Proof: Proof of the basic pointwise-sufficiency property, part (ii)
  • Theorem 4: Informal
  • ...and 60 more
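
Definition 2 relaxes global sufficiency to a single cost vector. Its exact statement is not reproduced on this page, so the sketch below adopts one natural reading as an assumption: a dataset is pointwise sufficient at $c$ if every cost $c' \in \mathcal{C}$ that agrees with $c$ on the measured quantities keeps $x^\star(c)$ optimal. To stay self-contained it further assumes, hypothetically, that each query reveals one coordinate of $c$, that $\mathcal{C}$ is an open box, and that $\mathcal{X}$ is given by its vertices; under these assumptions the check reduces to a separable worst case per vertex. This is not the paper's cutting-plane algorithm, only a brute-force check of the assumed definition in a toy case, and `is_pointwise_sufficient` is an illustrative name.

```python
from typing import List, Sequence, Set


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def is_pointwise_sufficient(c: Sequence[float], observed: Set[int],
                            lower: Sequence[float], upper: Sequence[float],
                            vertices: List[Sequence[float]], tol: float = 1e-9) -> bool:
    """Toy check: coordinates in `observed` are revealed exactly; every other
    coordinate j may take any value in the open interval (lower[j], upper[j]).
    Assumes c itself lies in the box and x*(c) is unique (nondegenerate)."""
    x_star = min(vertices, key=lambda v: dot(c, v))  # optimal vertex for c
    for v in vertices:
        delta = [vi - xi for vi, xi in zip(v, x_star)]
        # Infimum of c'^T delta over all consistent c'; separable by coordinate.
        worst = 0.0
        for j, dj in enumerate(delta):
            if j in observed:
                worst += c[j] * dj
            elif dj > 0:
                worst += lower[j] * dj
            else:
                worst += upper[j] * dj
        if worst < -tol:
            return False  # some consistent cost strictly prefers vertex v
    return True


V = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
box_lo, box_hi = [0, 0, 0], [10, 10, 10]
print(is_pointwise_sufficient([1, 5, 6], {0}, box_lo, box_hi, V))        # prints False
print(is_pointwise_sufficient([1, 5, 6], {0, 1, 2}, box_lo, box_hi, V))  # prints True
# A tighter prior can make an empty dataset sufficient: item 0 always costs less.
print(is_pointwise_sufficient([0.5, 5, 6], set(), [0, 2, 2], [1, 10, 10], V))  # prints True
```

Under this box-and-coordinate assumption the worst case decouples across coordinates; for general linear measurements and a general convex $\mathcal{C}$, each vertex check would instead be a convex optimization problem.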