Going from a Representative Agent to Counterfactuals in Combinatorial Choice
Yanqiu Ruan, Karthyek Murthy, Karthik Natarajan
TL;DR
This work addresses counterfactual prediction for decision data arising from combinatorial choices over binary polytopes. It introduces the Separable Representative Agent Model (SRAM), a nonparametric framework with separable convex perturbations. The key theoretical contribution is an exact, polynomial-time linear-programming characterization of SRAM representability over 0-1 polytopes, which enables a practical consistency check via a lifted constraint set. Building on this, the authors develop a robust prediction pipeline. When the data are SRAM-consistent, it computes worst- and best-case predictions for unseen polytopes. When they are not, it produces a best-fitting SRAM estimate by solving a compact mixed-integer program. Extensive synthetic experiments on longest-path, shortest-path, and assignment problems demonstrate strong predictive accuracy, robustness to misspecification, and efficient solvability, highlighting SRAM's ability to support counterfactual analysis in broad combinatorial environments.
Abstract
We study decision-making problems where data comprises points from a collection of binary polytopes, capturing aggregate information stemming from various combinatorial selection environments. We propose a nonparametric approach for counterfactual inference in this setting based on a representative agent model, where the available data is viewed as arising from maximizing separable concave utility functions over the respective binary polytopes. Our first contribution is to precisely characterize the selection probabilities representable under this model and show that verifying the consistency of any given aggregated selection dataset reduces to solving a polynomial-sized linear program. Building on this characterization, we develop a nonparametric method for counterfactual prediction. When data is inconsistent with the model, finding a best-fitting approximation for prediction reduces to solving a compact mixed-integer convex program. Numerical experiments based on synthetic data demonstrate the method's flexibility, predictive accuracy, and strong representational power even under model misspecification.
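To make the representative agent idea concrete, here is a minimal sketch of the simplest binary polytope, the unit simplex (the convex hull of the unit vectors e_1, ..., e_n). It assumes the entropy perturbation h(x) = -x log x. That choice of h and the function names are illustrative assumptions, not the authors' method. Under this assumption, the agent's optimal choice probabilities coincide with the classical multinomial logit (softmax) probabilities. Separability is what keeps the computation simple: the stationarity condition u_i + h'(x_i) = λ decouples across coordinates, which leaves only a one-dimensional search for the multiplier λ.

```python
import math


def representative_agent_simplex(u):
    """Maximize sum_i u[i]*x[i] + sum_i h(x[i]) over the unit simplex.

    Illustrative assumption: h(x) = -x*log(x) (entropy). Because h is
    separable, stationarity gives u[i] + h'(x[i]) = lam for every i, i.e.
    x[i] = exp(u[i] - lam - 1). We find lam by bisection so that sum(x) == 1.
    Nonnegativity never binds, since h'(x) -> +inf as x -> 0.
    """
    def total(lam):
        return sum(math.exp(ui - lam - 1.0) for ui in u)

    # total(lam) is strictly decreasing in lam. Bracket its root:
    # at lo the largest term equals 1, so total(lo) >= 1;
    # at hi every term is <= 1/n, so total(hi) <= 1.
    lo = max(u) - 1.0
    hi = max(u) - 1.0 + math.log(len(u))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [math.exp(ui - lam - 1.0) for ui in u]


def softmax(u):
    """Multinomial logit probabilities, shifted by max(u) for stability."""
    m = max(u)
    w = [math.exp(ui - m) for ui in u]
    s = sum(w)
    return [wi / s for wi in w]


if __name__ == "__main__":
    utilities = [1.0, 0.5, -0.2]  # hypothetical deterministic utilities
    x = representative_agent_simplex(utilities)
    p = softmax(utilities)
    print(x)
    print(max(abs(a - b) for a, b in zip(x, p)) < 1e-9)  # agrees with logit
```

Under the same assumptions (strictly concave h and an interior optimum), a different separable h would change only the per-coordinate inversion x_i = (h')^{-1}(λ - u_i). General binary polytopes, which are described by many linear constraints, would need a convex solver instead of this single-multiplier bisection.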
