Reduced-Rank Multi-objective Policy Learning and Optimization

Ezinne Nwankwo; Michael I. Jordan; Angela Zhou

TL;DR

The paper tackles multi-objective causal policy learning under noisy, high-dimensional outcomes. It introduces reduced rank regression (RRR) to learn latent outcomes $Z(t) = B_t X$ and denoise observed outcomes via $Y(t) \approx A_t Z(t)$, enabling more reliable policy evaluation and optimization. A suite of estimators (RR-DM, RR-IPW, RR-CV) with control variates is developed, with theoretical guarantees and finite-sample generalization bounds. Empirical results on simulated data and a real Sahel poverty dataset show substantial variance reduction and improved policy performance, demonstrating practical impact for social programs. The framework offers a principled path to handle heterogeneity and multiple outcomes in policy design while highlighting ethical and interpretability considerations for real-world deployment.

Abstract

Evaluating the causal impacts of possible interventions is crucial for informing decision-making, especially towards improving access to opportunity. If causal effects are heterogeneous and predictable from covariates, personalized treatment decisions can improve individual outcomes and contribute to both efficiency and equity. In practice, however, causal researchers do not have a single outcome in mind a priori and often collect multiple outcomes of interest that are noisy estimates of the true target of interest. For example, in government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty. The ultimate goal is to learn an optimal treatment policy that in some sense maximizes multiple outcomes simultaneously. To address such issues, we present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning with multiple objectives. We learn a low-dimensional representation of the true outcome from the observed outcomes using reduced rank regression. We develop a suite of estimates that use the model to denoise observed outcomes, including commonly-used index weightings. These methods improve estimation error in policy evaluation and optimization, including on a case study of real-world cash transfer and social intervention data. Reducing the variance of noisy social outcomes can improve the performance of algorithmic allocations.
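To make the denoising step concrete, here is a minimal sketch of classical (identity-weighted) reduced rank regression for a single treatment arm, written with NumPy. It follows the TL;DR's notation, $Z(t) = B_t X$ and $Y(t) \approx A_t Z(t)$, but it is an illustration under assumptions rather than the authors' implementation: the function names `reduced_rank_regression` and `demo`, the simulated data, the fixed rank, and the choice to fit on one arm's data at a time are all hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced rank regression of Y (n x k) on X (n x d).

    Returns A (k x rank) and B (rank x d), so that the latent outcomes are
    Z = X @ B.T and the denoised outcomes are Z @ A.T.
    """
    # Full-rank least-squares coefficients, shape (d, k).
    C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # The top right singular vectors of the fitted values span the
    # best rank-`rank` outcome subspace.
    _, _, Vt = np.linalg.svd(X @ C_ols, full_matrices=False)
    A = Vt[:rank].T        # (k, rank), orthonormal columns
    B = (C_ols @ A).T      # (rank, d)
    return A, B

def demo(seed=0, n=500, d=5, k=20, rank=2, noise=1.0):
    """Simulate noisy outcomes driven by a low-rank signal (assumed setup)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    B_true = rng.normal(size=(rank, d))
    A_true = rng.normal(size=(k, rank))
    signal = X @ B_true.T @ A_true.T
    Y = signal + noise * rng.normal(size=(n, k))

    A, B = reduced_rank_regression(X, Y, rank)
    Y_denoised = (X @ B.T) @ A.T
    C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

    err_rrr = np.mean((Y_denoised - signal) ** 2)   # reduced-rank fit
    err_ols = np.mean((X @ C_ols - signal) ** 2)    # full-rank fit
    err_raw = np.mean((Y - signal) ** 2)            # raw noisy outcomes
    return err_rrr, err_ols, err_raw

if __name__ == "__main__":
    print(demo())
```

In this sketch the rank is supplied by hand; the listed lemmas cite a rank selection criterion from bunea2011rrroptimal, which is not reproduced here.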

Paper Structure (30 sections, 12 theorems, 50 equations, 12 figures, 8 tables, 1 algorithm)

Introduction
Problem Setup and Related Work
Estimation in causal inference
Reduced Rank Regression for Potential Outcomes
Multi-objective evaluation and learning
Methodology
Estimators
Deriving a control variate estimator
Analysis
Experiments
Simulated data
Latent Outcome Estimation Reduces Variance in Off-Policy Evaluation
Real world case study: "Sahel" dataset, poverty graduation program
Conclusion
Impact Statement
...and 15 more sections

Key Result

Lemma 1

Under the two-outcome model assumption (asn:two-outcome-model), the direct method estimator of the policy value (for both $\hat{Z}$ and $\hat{Y}$) is unbiased.
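The lemma's formal statement is not reproduced on this page. As a rough illustration of the kind of estimator it concerns, the sketch below computes a textbook direct-method policy value for a scalarized outcome $\rho^\top Y$. The function name `direct_method_value`, the array layout, and the toy numbers are assumptions for illustration; the outcome predictions passed in could come from a full-rank regression or from a reduced-rank (denoised) fit.

```python
import numpy as np

def direct_method_value(policy_probs, outcome_preds, rho):
    """Textbook direct-method estimate of a policy's scalarized value.

    policy_probs  : (n, T) array, pi(t | x_i); each row sums to 1.
    outcome_preds : (T, n, k) array, predicted outcome vector under each arm.
    rho           : (k,) array of weights that scalarize the k outcomes.
    """
    # Scalarize the predictions: entry [t, i] is rho^T mu_t(x_i).
    scalarized = outcome_preds @ rho                  # (T, n)
    # Average the policy-weighted predictions over units.
    per_unit = np.sum(policy_probs * scalarized.T, axis=1)
    return float(np.mean(per_unit))

if __name__ == "__main__":
    n, T, k = 4, 2, 3
    # Hypothetical predictions: every outcome is 1 under arm 0, 2 under arm 1.
    preds = np.stack([np.ones((n, k)), 2.0 * np.ones((n, k))])
    rho = np.array([0.5, 0.25, 0.25])
    always_treat = np.tile([0.0, 1.0], (n, 1))
    uniform = np.full((n, T), 0.5)
    print(direct_method_value(always_treat, preds, rho))  # 2.0
    print(direct_method_value(uniform, preds, rho))       # 1.5
```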

Figures (12)

Figure 1: Variance in ATE estimation. Left: Comparison of variances averaged over 100 datasets as the sample size of the dataset increases using the direct estimators for $\rho Y$. Right: Comparison of variances as the sample size increases for the IPW and control variate estimators. Lower is better.
Figure 2: Policy evaluation experiment: comparing variances of the policy value for each estimator averaged over 100 datasets. We compute each policy value by using the optimal policies learned under the true latent outcomes $\rho^\top A_tZ$. Lower is better.
Figure 3: Policy optimization experiment: left figures illustrate variance of out-of-sample policy value estimate (averaged over 50 datasets). Right figures compare the log MSE for policy value suboptimality. Lower is better. Top row compares $\rho^\top \hat{A} \hat{Z}$ (reduced rank DM) with full-rank DM. Middle row compares standard IPW with our denoised IPW, and bottom row compares standard doubly robust estimator ($\rho^\top Y$-DR) with our control variate estimator.
Figure 4: Left: Comparison of variances averaged over 100 datasets as the noise in $Y$ increases using the direct estimators for $\rho Y$. Right: Comparison of variances as the noise level in $Y$ increases for the IPW and control variate estimators. Lower is better.
Figure 5: First Left: Comparison of variances averaged over 100 datasets as the dimensions of $Y$ and $Z$ increase using the direct estimators for $\rho Y$. The ratio of dimensions is $d_Y/d_Z$ where the dimensions of $Y$ are always greater than $Z$. Second Left: Comparison of variances as the ratio of dimensions of $Y$ to $Z$ is increasing for the IPW and control variate estimators. First Right: Comparison of variances averaged over 100 datasets as the dimensions of $Z$ increase and the dimension of $Y$ remains fixed at $k=50$ for $\rho Y$. Second Right: Comparison of variances as the dimensions of $Z$ increase and the dimensions of $Y$ remain fixed for the IPW and control variate estimators. Lower is better.
...and 7 more figures

Theorems & Definitions (13)

Lemma 1: Unbiasedness of DM Estimator
Lemma 2: Unbiasedness of IPW Estimator
Definition 1: Outcome Control Variates
Lemma 3: Unbiasedness of CV Estimator
Proposition 1: Consistency in OLS with Noisy Outcomes
Theorem 1
Theorem 2
Lemma 4: Proposition 15 from bunea2011rrroptimal
Lemma 5: Error bounds for the RRR rank selection criterion estimator (bunea2011rrroptimal, Theorem 5)
Lemma 6: Bernstein's inequality
...and 3 more
