Data Informativeness in Linear Optimization under Uncertainty
Omar Bennouna, Amine Bennouna, Saurabh Amin, Asuman Ozdaglar
TL;DR
This paper develops a principled framework for task-aware data collection in linear optimization under uncertainty. It introduces a decision-focused notion of data informativeness and provides a complete geometric characterization of when a data set suffices to recover the optimal decision, linking sufficiency to the cost directions that can alter the optimal solution. A tractable two-stage algorithm computes minimal sufficient data sets under general query constraints, with extensions to vector-space, convex, and more general uncertainty sets, as well as to mixed-integer programs (MIPs) and noisy observations. Through applications to minimal-cost subway design and non-adaptive hiring interviews, the paper demonstrates that small, carefully chosen data sets can be highly informative for optimal decision-making. The framework offers insights into data collection design, illuminates computational trade-offs, and highlights natural extensions to broader problem classes and observation structures.
Abstract
We study the problem of determining what data is required to solve a decision-making task when only partial information about the state of the world is available. Focusing on linear programs, we introduce a decision-focused notion of data informativeness that formalizes when a data set is sufficient to recover the optimal decision. Our definition abstracts away the choice of estimator (how the data is used): it depends solely on the structure of the optimization task and the uncertainty. Our main result provides a geometric characterization of data sufficiency: a data set is sufficient if and only if, together with prior knowledge, it captures all cost directions that can change the optimal solution, given the task structure and the uncertainty set. Building on this characterization, we develop a tractable algorithm to determine minimal sufficient data sets under general data collection constraints. Taken together, our work introduces a principled framework for task-aware data collection. We demonstrate the approach in two applications: selecting where to conduct field experiments to inform infrastructure design and choosing which candidates to interview in order to make an optimal hiring decision. Our results illustrate that small, carefully selected data sets often suffice to determine the optimal decisions.
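To make the idea of "capturing the cost directions that can change the optimal solution" concrete, the following is a minimal sketch of a simplified check, not the paper's characterization or algorithm. It assumes the feasible set is given by an explicit list of vertices, that each data point is a linear measurement q·c of the unknown cost vector c, and that nothing else is known about c. Under those assumptions, if every vertex difference x_i - x_j lies in the span of the query directions, the measurements determine every cost gap c·(x_i - x_j) and therefore the optimal vertex. This condition is only sufficient: the paper's result also exploits the uncertainty set and prior knowledge. The function names are hypothetical.

```python
import numpy as np


def in_row_space(Q, d, tol=1e-9):
    """Return True if vector d is a linear combination of the rows of Q."""
    # Least-squares fit of d by the rows of Q; a zero residual means d is in their span.
    coeffs, *_ = np.linalg.lstsq(Q.T, d, rcond=None)
    residual = np.linalg.norm(Q.T @ coeffs - d)
    return residual <= tol * max(1.0, np.linalg.norm(d))


def queries_determine_optimum(vertices, queries):
    """Simplified sufficient check (an assumption of this sketch, not the paper's test).

    vertices: list of vertices of the feasible polytope.
    queries:  list of directions q; observing q @ c for each q is the data set.
    Returns True if all pairwise vertex differences lie in the span of the queries,
    so that the observed values fix the cost gap between any two vertices.
    """
    V = np.asarray(vertices, dtype=float)
    Q = np.atleast_2d(np.asarray(queries, dtype=float))
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            if not in_row_space(Q, V[i] - V[j]):
                return False
    return True


# Unit square in R^2: both cost coordinates can change the optimal vertex.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(queries_determine_optimum(square, [(1, 0)]))          # False
print(queries_determine_optimum(square, [(1, 0), (0, 1)]))  # True

# Segment in R^3: only the direction (1, 1, 0) matters, so one measurement is enough.
segment = [(0, 0, 1), (1, 1, 1)]
print(queries_determine_optimum(segment, [(1, 1, 0)]))      # True
```

The segment example mirrors the abstract's closing point in miniature: the cost vector has three unknown entries, yet a single well-chosen measurement fixes the optimal decision.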
