Data Informativeness in Linear Optimization under Uncertainty

Omar Bennouna, Amine Bennouna, Saurabh Amin, Asuman Ozdaglar

TL;DR

This paper develops a principled framework for task-aware data collection in linear optimization under uncertainty. It introduces a decision-focused notion of data informativeness and provides a complete geometric characterization of when a data set suffices to recover the optimal decision, linking sufficiency to the cost directions that can alter the optimal solution. A tractable two-stage algorithm computes minimal sufficient data sets under general query constraints, with extensions to vector-space, convex, and more general uncertainty sets, including MIPs and noisy observations. Through applications to minimal-cost subway design and non-adaptive hiring interviews, it demonstrates that small, carefully chosen data sets can be highly informative for optimal decision-making. The framework offers insights into data collection design, illuminates computational trade-offs, and highlights natural extensions to broader problem classes and observation structures.

Abstract

We study the problem of determining what data is required to solve a decision-making task when only partial information about the state of the world is available. Focusing on linear programs, we introduce a decision-focused notion of data informativeness that formalizes when a data set is sufficient to recover the optimal decision. Our notion abstracts away the notion of estimators (how data is used): it depends solely on the structure of the optimization task and the uncertainty. Our main result provides a geometric characterization of data sufficiency: a data set is sufficient if and only if, together with prior knowledge, it captures all cost directions that can change the optimal solution, given the task structure and the uncertainty set. Building on our characterization, we develop a tractable algorithm to determine minimal sufficient data sets under general data collection constraints. Taken together, our work introduces a principled framework for task-aware data collection. We demonstrate the approach in two applications: selecting where to conduct field experiments to inform infrastructure design and choosing which candidates to interview in order to make an optimal hiring decision. Our results illustrate that small, carefully selected data sets often suffice to determine the optimal decisions.
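
As a toy illustration of the idea that a data set must "capture all cost directions that can change the optimal solution", the sketch below checks a simple sufficient condition in a deliberately simplified setting. It is not the paper's algorithm. It assumes that the extreme points of the feasible set can be enumerated, that the uncertainty set is unrestricted, and that each data point reveals the inner product of a query vector with the unknown cost vector. If every difference between two extreme points lies in the span of the queries, then every pairwise cost comparison is determined by the observations, and so is the optimal extreme point. The function names `in_row_space` and `queries_capture_all_directions` are hypothetical.

```python
import numpy as np


def in_row_space(Q, v, tol=1e-9):
    """Return True if vector v lies in the span of the rows of Q."""
    # Solve Q^T a = v in the least-squares sense and inspect the residual.
    coeffs, *_ = np.linalg.lstsq(Q.T, v, rcond=None)
    return np.linalg.norm(Q.T @ coeffs - v) <= tol


def queries_capture_all_directions(Q, extreme_points, tol=1e-9):
    """Toy sufficient condition (an assumption of this sketch):
    every difference of two extreme points is a combination of the queries."""
    pts = np.asarray(extreme_points, dtype=float)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if not in_row_space(Q, pts[i] - pts[j], tol):
                return False
    return True


# Choosing one of three candidates: extreme points of the simplex in R^3.
simplex = np.eye(3)

# Two pairwise-comparison queries (rows): c1 - c2 and c2 - c3.
Q_diff = np.array([[1.0, -1.0, 0.0],
                   [0.0, 1.0, -1.0]])

# A single query that only reveals c1.
Q_single = np.array([[1.0, 0.0, 0.0]])

enough = queries_capture_all_directions(Q_diff, simplex)
not_enough = queries_capture_all_directions(Q_single, simplex)
```

In this example two queries are enough even though the cost vector has three unknown entries, which mirrors the abstract's point that small, well-chosen data sets can determine the optimal decision. Restricting the uncertainty set, as the paper does, can only shrink the set of directions that matter.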

Paper Structure (58 sections, 28 theorems, 65 equations, 4 figures, 1 table, 8 algorithms)


Key Result

Proposition 1

Let $\mathcal{C}$ be an open convex set and $\mathcal{D}:=\{q_1,\dots,q_N\}$ a data set. The following are equivalent:

Figures (4)

  • Figure 1: Left: Optimality cones relative to $\mathcal{X}$: each extreme point has an associated cone of cost vectors making it optimal. Middle: The same cones at the origin, illustrating their dual relationship with feasible directions. Right: Uncertainty sets $\mathcal{C}$ and $\mathcal{C}'$ intersecting different cone collections. Data must distinguish between cones that overlap with the uncertainty set.
  • Figure 2: US neighborhood graph map with origin in green and destination in red (left). $c_0$ is taken as the edge lengths, and the corresponding shortest path is in orange (right).
  • Figure 3: Minimal edges to observe (in magenta) to find an optimal solution for values of $\epsilon=7\%, 30\%,\; 99\%$ (left to right).
  • Figure 4: Candidates to be interviewed (in red) to make an optimal hiring decision. Number of candidates to interview from left to right for top and bottom row respectively: $8,24,31,52$ and $8,28,43,70$.

Theorems & Definitions (65)

  • Definition 1
  • Proposition 1: One vs All Optimal Solutions
  • Remark 1: Extensions and Open Problems
  • Proposition 2
  • Proposition 3
  • Definition 2: Extreme Points
  • Proposition 4: Feasible and Extreme Directions
  • Proposition 5: Optimality Cones
  • Definition 3: Relevant Extreme Directions
  • Definition 4: Linear and Affine Hulls
  • ...and 55 more