Reduced-Rank Multi-objective Policy Learning and Optimization
Ezinne Nwankwo, Michael I. Jordan, Angela Zhou
TL;DR
The paper tackles multi-objective causal policy learning under noisy, high-dimensional outcomes. It introduces reduced rank regression (RRR) to learn latent outcomes $Z(t)=B_tX$ and denoise observed outcomes via $\,Y(t) \approx A_tZ(t)$, enabling more reliable policy evaluation and optimization. A suite of estimators (RR-DM, RR-IPW, RR-CV) with control variates is developed, with theoretical guarantees and finite-sample generalization bounds. Empirical results on simulated data and a real Sahel poverty dataset show substantial variance reduction and improved policy performance, demonstrating practical impact for social programs. The framework offers a principled path to handle heterogeneity and multiple outcomes in policy design while highlighting ethical and interpretability considerations for real-world deployment.
Abstract
Evaluating the causal impacts of possible interventions is crucial for informing decision-making, especially towards improving access to opportunity. However, if causal effects are heterogeneous and predictable from covariates, personalized treatment decisions can improve individual outcomes and contribute to both efficiency and equity. In practice, however, causal researchers do not have a single outcome in mind a priori and often collect multiple outcomes of interest that are noisy estimates of the true target of interest. For example, in government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty. The ultimate goal is to learn an optimal treatment policy that in some sense maximizes multiple outcomes simultaneously. To address such issues, we present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning with multiple objectives. We learn a low-dimensional representation of the true outcome from the observed outcomes using reduced rank regression. We develop a suite of estimates that use the model to denoise observed outcomes, including commonly-used index weightings. These methods improve estimation error in policy evaluation and optimization, including on a case study of real-world cash transfer and social intervention data. Reducing the variance of noisy social outcomes can improve the performance of algorithmic allocations.
