What Data Enables Optimal Decisions? An Exact Characterization for Linear Optimization
Omar Bennouna, Amine Bennouna, Saurabh Amin, Asuman Ozdaglar
TL;DR
The paper asks how informative a dataset is for solving a task-specific decision problem formulated as a linear program with an uncertain cost vector. It gives a sharp geometric characterization: a dataset is sufficient exactly when it spans the key directions $\Delta(\mathcal{X},\mathcal{C})$ or, equivalently, the span of differences between reachable optimal solutions, $\mathrm{dir}(\mathcal{X}^*(\mathcal{C}))$. Building on this characterization, the authors propose an iterative MILP-based algorithm that constructs a minimal sufficient dataset, yielding small datasets that preserve task-optimality, and demonstrate the approach on a hiring-interviews scenario. The results offer a principled foundation for offline, task-aware data selection that can reduce data-collection and computation costs while still guaranteeing optimal decisions within the given uncertainty model.
Abstract
We study the fundamental question of how informative a dataset is for solving a given decision-making task. In our setting, the dataset provides partial information about unknown parameters that influence task outcomes. Focusing on linear programs, we characterize when a dataset is sufficient to recover an optimal decision, given an uncertainty set on the cost vector. Our main contribution is a sharp geometric characterization that identifies the directions of the cost vector that matter for optimality, relative to the task constraints and uncertainty set. We further develop a practical algorithm that, for a given task, constructs a minimal or least-costly sufficient dataset. Our results reveal that small, well-chosen datasets can often fully determine optimal decisions -- offering a principled foundation for task-aware data selection.
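To make the spanning condition concrete, the following is a minimal sketch on a toy instance, not the authors' algorithm. It assumes a dataset consists of linear measurements $q^\top c$ of the cost vector, approximates the set of reachable optimal solutions by sampling costs from a box-shaped uncertainty set rather than by the paper's MILP-based procedure, and uses all pairwise differences of reachable optimal vertices as a stand-in for $\mathrm{dir}(\mathcal{X}^*(\mathcal{C}))$. The polytope, the box bounds, and the names `vertices`, `spans`, `good_query`, and `bad_query` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy task: minimize c @ x over the unit simplex in R^3,
# whose extreme points are the three standard basis vectors.
vertices = np.eye(3)

# Hypothetical uncertainty set: a box in which the third cost is known to be large,
# so the third vertex can never be optimal.
low = np.array([0.0, 0.0, 5.0])
high = np.array([1.0, 1.0, 6.0])

# Assumption: sampling stands in for an exact description of reachable optimal solutions.
rng = np.random.default_rng(0)
costs = rng.uniform(low, high, size=(500, 3))

# For each sampled cost, record which vertex minimizes the objective.
best = np.argmin(costs @ vertices.T, axis=1)
reachable = vertices[np.unique(best)]

# Differences between reachable optimal vertices (relative to the first one).
diffs = reachable[1:] - reachable[0]
needed_rank = int(np.linalg.matrix_rank(diffs)) if len(diffs) else 0


def spans(queries: np.ndarray, directions: np.ndarray) -> bool:
    """Return True if every row of `directions` lies in the row span of `queries`."""
    if len(directions) == 0:
        return True
    stacked = np.vstack([queries, directions])
    return np.linalg.matrix_rank(stacked) == np.linalg.matrix_rank(queries)


# A single measurement of c_1 - c_2 covers the only direction that matters here.
good_query = np.array([[1.0, -1.0, 0.0]])
# Measuring c_3 alone says nothing about which of the first two vertices is better.
bad_query = np.array([[0.0, 0.0, 1.0]])

print("dimension of the relevant directions:", needed_rank)
print("c_1 - c_2 spans them:", spans(good_query, diffs))
print("c_3 alone spans them:", spans(bad_query, diffs))
```

In this toy instance only the first two vertices can be optimal, so the relevant directions form a one-dimensional subspace of a three-dimensional cost space, and a single well-chosen measurement passes the spanning check where a poorly chosen one does not.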
