
What Data Enables Optimal Decisions? An Exact Characterization for Linear Optimization

Omar Bennouna, Amine Bennouna, Saurabh Amin, Asuman Ozdaglar

TL;DR

The paper addresses how informative a dataset is for solving a task-specific decision problem formulated as a linear program with cost uncertainty. It develops a sharp geometric characterization: a dataset is sufficient if and only if it spans the key directions $\Delta(\mathcal{X},\mathcal{C})$ or, equivalently, the subspace $\mathrm{dir}(\mathcal{X}^*(\mathcal{C}))$ spanned by differences between reachable optimal solutions. It also provides an algorithm to construct a minimal sufficient dataset. By linking these geometric objects to practical data collection, the authors propose an iterative MILP-based method that builds small datasets preserving task optimality, and they demonstrate it on a hiring-interviews scenario. The results offer a principled foundation for offline, task-aware data selection that can substantially reduce data-collection and computation costs while guaranteeing optimal decisions within the given uncertainty model.
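A short derivation shows why spanning these directions is enough. It covers only the sufficiency direction and rests on assumptions this summary does not spell out:

- the task is $\min_{x\in\mathcal{X}} c^\top x$;
- the true cost $c$ lies in $\mathcal{C}$;
- each dataset entry $q_i$ reveals the exact value $q_i^\top c$;
- $\mathrm{dir}(\mathcal{X}^*(\mathcal{C})) \subseteq \mathrm{span}(\mathcal{D})$.

Here $\hat{c}$ is any cost vector in $\mathcal{C}$ that agrees with the observations. The decision-maker solves the LP with $\hat{c}$ in place of $c$.

```latex
% Sketch assumptions: min-form LP, c \in \mathcal{C}, exact observations q_i^\top c,
% \hat c \in \mathcal{C} with q_i^\top \hat c = q_i^\top c for all i, and
% \mathrm{dir}(\mathcal{X}^*(\mathcal{C})) \subseteq \mathrm{span}\{q_1,\dots,q_N\}.
\begin{align*}
  x^\star &\in \arg\min_{x \in \mathcal{X}} c^\top x, \qquad
  \hat{x} \in \arg\min_{x \in \mathcal{X}} \hat{c}^\top x
  \quad\Longrightarrow\quad x^\star, \hat{x} \in \mathcal{X}^*(\mathcal{C}), \\
  \hat{x} - x^\star &= \textstyle\sum_{i=1}^{N} \alpha_i q_i
  \quad \text{for some } \alpha \in \mathbb{R}^N
  \quad (\text{since } \hat{x} - x^\star \in \mathrm{dir}(\mathcal{X}^*(\mathcal{C}))), \\
  c^\top(\hat{x} - x^\star) &= \textstyle\sum_{i=1}^{N} \alpha_i\, q_i^\top c
  = \textstyle\sum_{i=1}^{N} \alpha_i\, q_i^\top \hat{c}
  = \hat{c}^\top(\hat{x} - x^\star) \le 0 .
\end{align*}
```

Hence $c^\top \hat{x} \le c^\top x^\star$, so $\hat{x}$ is optimal for the true cost even though $c$ itself was never fully observed. The necessity direction is the substantive part of the paper's result, namely that every such direction must be spanned. This sketch does not derive it.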

Abstract

We study the fundamental question of how informative a dataset is for solving a given decision-making task. In our setting, the dataset provides partial information about unknown parameters that influence task outcomes. Focusing on linear programs, we characterize when a dataset is sufficient to recover an optimal decision, given an uncertainty set on the cost vector. Our main contribution is a sharp geometric characterization that identifies the directions of the cost vector that matter for optimality, relative to the task constraints and uncertainty set. We further develop a practical algorithm that, for a given task, constructs a minimal or least-costly sufficient dataset. Our results reveal that small, well-chosen datasets can often fully determine optimal decisions -- offering a principled foundation for task-aware data selection.
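The spanning condition can be checked numerically. When queries may be arbitrary vectors, a minimal sufficient dataset is simply a basis of $\mathrm{dir}(\mathcal{X}^*(\mathcal{C}))$. Below is a minimal Python/NumPy sketch, assuming:

1. each dataset entry is a vector $q_i$ whose observation reveals $q_i^\top c$;
2. sufficiency means $\mathrm{dir}(\mathcal{X}^*(\mathcal{C})) \subseteq \mathrm{span}(\mathcal{D})$;
3. the extreme points of $\mathcal{X}$ that are optimal for some $c \in \mathcal{C}$ are already available as an explicit array.

The helper names are hypothetical. This is not the authors' MILP-based algorithm: it ignores query costs and any restriction on which queries are allowed.

```python
import numpy as np


def difference_directions(solutions: np.ndarray) -> np.ndarray:
    """Rows x_k - x_0; their span equals the span of all pairwise differences.

    `solutions` (hypothetical input) lists, one per row, the extreme points of
    the feasible set that are optimal for some cost vector in the uncertainty set.
    """
    return solutions[1:] - solutions[0]


def is_sufficient(dataset: np.ndarray, solutions: np.ndarray, tol: float = 1e-9) -> bool:
    """True if every difference direction lies in the row span of `dataset`."""
    diffs = difference_directions(solutions)
    if diffs.shape[0] == 0:
        return True  # a single reachable solution needs no data
    if dataset.shape[0] == 0:
        return bool(np.all(np.abs(diffs) <= tol))
    rank_data = np.linalg.matrix_rank(dataset, tol=tol)
    rank_joint = np.linalg.matrix_rank(np.vstack([dataset, diffs]), tol=tol)
    # Appending the directions must not increase the rank.
    return bool(rank_joint == rank_data)


def minimal_dataset(solutions: np.ndarray, tol: float = 1e-9) -> np.ndarray:
    """Orthonormal basis of the difference span: dim(dir) queries, the fewest possible."""
    diffs = difference_directions(solutions)
    if diffs.shape[0] == 0:
        return np.zeros((0, solutions.shape[1]))
    _, s, vt = np.linalg.svd(diffs, full_matrices=False)
    rank = int(np.sum(s > tol))
    return vt[:rank]


if __name__ == "__main__":
    # Toy task: pick one of three items; only items 0 and 1 can ever be optimal.
    sols = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(is_sufficient(np.array([[1.0, -1.0, 0.0]]), sols))  # True: reveals c_0 - c_1
    print(is_sufficient(np.array([[1.0, 0.0, 0.0]]), sols))   # False: c_1 stays unknown
    print(minimal_dataset(sols).shape)                         # (1, 3)
```

The rank test is exact in exact arithmetic. With floating-point data, the tolerance `tol` decides what counts as "in the span", so it should be scaled to the magnitude of the inputs.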

Paper Structure

This paper contains 26 sections, 17 theorems, 42 equations, 2 figures, and 4 algorithms.

Key Result

Proposition 1

Let $\mathcal{C}$ be an open convex set and $\mathcal{D}:=\{q_1,\dots,q_N\}$ a dataset. The following are equivalent:

Figures (2)

  • Figure 1: Optimality cones relative to $\mathcal{X}$ (left), relative to the origin (middle) and examples of the uncertainty sets ($\mathcal{C}$ and $\mathcal{C}'$) relative to the optimality cones (right).
  • Figure 2: Candidates to be interviewed (in red) to make an optimal hiring decision. Number of candidates to interview, from left to right: $8, 24, 31, 52$ (top row) and $8, 28, 43, 70$ (bottom row).
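The hiring example in Figure 2 admits a particularly simple special case. Assume, for illustration only, that:

- interviewing candidate $j$ reveals exactly that candidate's cost coefficient $c_j$, so each query is a standard basis vector $e_j$;
- the reachable optimal hiring plans are available as an explicit list of 0/1 vectors.

Under the spanning characterization, the minimal set of interviews is then the set of candidates whose hiring status differs across those plans. Candidates who are always hired or never hired need not be interviewed. The Python sketch below computes that set; the function name is hypothetical. It is not the paper's iterative MILP-based procedure.

```python
import numpy as np


def candidates_to_interview(solutions: np.ndarray, tol: float = 1e-9) -> list[int]:
    """Smallest set of candidates whose interviews make the data sufficient.

    Assumes (hypothetically) that interviewing candidate j reveals c_j exactly,
    i.e. each query is a standard basis vector e_j, and that `solutions` lists
    the reachable optimal hiring vectors as 0/1 rows. The span of {e_j : j in S}
    contains every difference of reachable solutions iff each difference is zero
    outside S, so the smallest such S is the set of coordinates that vary across rows.
    """
    spread = solutions.max(axis=0) - solutions.min(axis=0)
    return [int(j) for j in np.flatnonzero(spread > tol)]


if __name__ == "__main__":
    # Hire 2 of 5 candidates; three hiring plans are reachable optima.
    sols = np.array([[1, 1, 0, 0, 0],
                     [1, 0, 1, 0, 0],
                     [0, 1, 1, 0, 0]], dtype=float)
    print(candidates_to_interview(sols))      # [0, 1, 2]
    print(candidates_to_interview(sols[:2]))  # [1, 2]: candidate 0 is always hired
```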

Theorems & Definitions (40)

  • Definition 1: Sufficient Decision Dataset
  • Proposition 1: One vs All Optimal Solutions
  • Proposition 2
  • Proposition 3: Noisy Observations
  • Proposition 4
  • Definition 2: Extreme Points
  • Proposition 5: Feasible and Extreme Directions
  • Proposition 6: Optimality Cones
  • Definition 3: Relevant Extreme Directions
  • Theorem 1
  • ...and 30 more
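Figure 1 and Proposition 6 concern optimality cones, whose statements are not reproduced in this summary. As a hedged illustration, consider the textbook KKT description for $\min_x c^\top x$ subject to $Ax \le b$. A vertex $v$ is optimal exactly when $-c$ is a nonnegative combination of the constraint rows active at $v$, so the optimality cone of $v$ is the set of such $c$.

The Python sketch below tests this condition. It assumes a nondegenerate vertex, meaning exactly $n$ linearly independent active constraints, so that the multipliers are unique. The function name and the min-form LP are assumptions, and the paper's own definitions may use a different convention.

```python
import numpy as np


def in_optimality_cone(A: np.ndarray, b: np.ndarray, v: np.ndarray,
                       c: np.ndarray, tol: float = 1e-9) -> bool:
    """Check whether vertex v of {x : A x <= b} minimizes c^T x.

    Hypothetical helper. Assumes v is a nondegenerate vertex: exactly
    n = len(v) constraints are active there and their rows are independent.
    """
    active = np.flatnonzero(np.abs(A @ v - b) <= tol)
    A_I = A[active]
    if A_I.shape[0] != v.shape[0]:
        raise ValueError("sketch only handles nondegenerate vertices")
    # KKT stationarity: c + A_I^T lam = 0, with lam >= 0 for optimality.
    lam = np.linalg.solve(A_I.T, -c)
    return bool(np.all(lam >= -tol))


if __name__ == "__main__":
    # Unit square [0, 1]^2 written as A x <= b.
    A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    b = np.array([1.0, 1.0, 0.0, 0.0])
    print(in_optimality_cone(A, b, np.array([0.0, 0.0]), np.array([1.0, 1.0])))   # True
    print(in_optimality_cone(A, b, np.array([0.0, 0.0]), np.array([-1.0, 1.0])))  # False
    print(in_optimality_cone(A, b, np.array([1.0, 0.0]), np.array([-1.0, 1.0])))  # True
```

At a degenerate vertex the multipliers are not unique. The membership test then becomes a small feasibility problem, such as a nonnegative least-squares or LP solve, rather than a single linear solve.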