Matchings, Predictions and Counterfactual Harm in Refugee Resettlement Processes

Seungeon Lee, Nina Corvelo Benz, Suhas Thejaswi, Manuel Gomez-Rodriguez

TL;DR

This work tackles the problem that data-driven refugee placement using predicted employment probabilities can be counterfactually harmful relative to a historic default policy. It builds a structural causal model of resettlement, derives counterfactual guarantees for harm-free policy classes, and introduces a practical post-processing framework that minimally adjusts predicted probabilities via inverse matching; it further trains a Transformer to generalize these adjustments to unseen pools. The key contributions are a formal counterfactual-harm framework with guarantees, an inverse matching LP to compute minimally modified predictions, and a Transformer-based method to approximate these corrections for new data, all validated on synthetic data showing reduced harm and improved counterfactual utility. The approach offers a harm-aware path for deploying algorithmic matching in high-stakes settings and could extend to health care, donor allocation, or conference peer-review assignments.

Abstract

Resettlement agencies have started to adopt data-driven algorithmic matching to match refugees to locations using employment rate as a measure of utility. Given a pool of refugees, data-driven algorithmic matching uses a classifier to predict the probability that each refugee would find employment at any given location. Then, it uses the predicted probabilities to estimate the expected utility of all possible placement decisions. Finally, it finds the placement decisions that maximize the predicted utility by solving a maximum weight bipartite matching problem. In this work, we argue that, using existing solutions, there may be pools of refugees for which data-driven algorithmic matching is (counterfactually) harmful: had it been used, it would have achieved lower utility than a given default policy used in the past. Then, we develop a post-processing algorithm that, given placement decisions made by a default policy on a pool of refugees and their employment outcomes, solves an inverse matching problem to minimally modify the predictions made by a given classifier. Under these modified predictions, the optimal matching policy that maximizes predicted utility on the pool is guaranteed not to be harmful. Further, we introduce a Transformer model that, given placement decisions made by a default policy on multiple pools of refugees and their employment outcomes, learns to modify the predictions made by a classifier so that the optimal matching policy that maximizes predicted utility under the modified predictions on an unseen pool of refugees is less likely to be harmful than under the original predictions. Experiments on simulated resettlement processes using synthetic refugee data created from a variety of publicly available data suggest that our methodology may be effective in making algorithmic placement decisions that are less likely to be harmful than existing solutions.
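
As a minimal illustration of the matching step described in the abstract, the sketch below picks the placement that maximizes total predicted employment probability on a toy pool. It is not the authors' implementation: the names `best_placement`, `g` and `capacity` are hypothetical, the probabilities are made up, and an exhaustive search stands in for a proper maximum weight bipartite matching solver, so it is only practical for very small pools.

```python
from itertools import permutations


def best_placement(g, capacity):
    """Return (placement, predicted_utility) maximizing total predicted utility.

    g[i][l]     -- predicted employment probability of refugee i at location l
                   (hypothetical classifier output).
    capacity[l] -- number of available slots at location l (an assumption of
                   this sketch).
    """
    n = len(g)
    # Expand each location into as many slots as its capacity allows.
    slots = [l for l, c in enumerate(capacity) for _ in range(c)]
    if len(slots) < n:
        raise ValueError("not enough slots for the pool")

    best_value, best_assignment = float("-inf"), None
    # Exhaustive search over assignments of refugees to distinct slots.
    for perm in permutations(slots, n):
        value = sum(g[i][l] for i, l in enumerate(perm))
        if value > best_value:
            best_value, best_assignment = value, perm
    return list(best_assignment), best_value


# Toy pool: 3 refugees, 2 locations with capacities 1 and 2.
g = [
    [0.9, 0.8],
    [0.7, 0.2],
    [0.4, 0.3],
]
capacity = [1, 2]
placement, utility = best_placement(g, capacity)
print(placement, round(utility, 2))
```

In this toy pool, the single slot at location 0 goes to refugee 1 rather than to refugee 0, who has the highest individual probability there, because refugee 1 loses far more by being placed at location 1. The matching optimizes the pool-level sum, not each refugee's best option, which is also why errors in the predicted probabilities can propagate into the placement of every refugee in the pool.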

Paper Structure (23 sections, 6 theorems, 47 equations, 5 figures, 6 tables)

Key Result

Proposition 3.1

For any $\bm{x} \sim P^{\mathcal{M}}(\bm{X})$, if $g_{l}(x_i) = P^{\mathcal{M} \,;\, do(L_i = l)}(Y_i = 1 \,|\, X_i = x_i)$ for all $l \in \mathcal{L}$ and $i \in \mathcal{I}$, then, for any $\hat{\pi}(\bm{g}) \in \hat{\Pi}(\bm{g})$, it holds that $\hat{\pi}(\bm{g}) \in \Pi^{\ast}(\bm{g})$.

Figures (5)

  • Figure 1: Our structural causal model $\mathcal{M}$. Circles represent endogenous random variables and boxes represent exogenous random variables. The value of each endogenous variable is given by a function of the values of its ancestors in the structural causal model, as defined by Eq. \ref{eq:scm}. The value of each exogenous variable is sampled independently from a given distribution.
  • Figure 2: Per-pool expected counterfactual utility gain achieved by the proposed algorithmic policies $\hat{\pi}(\breve{\bm{g}})$ and $\hat{\pi}(h(\bm{g}))$ with respect to the policy $\hat{\pi}(\bm{g})$ in the test set for $\beta = 0.6$ under low, medium and high noise levels. The cross markers indicate the expected counterfactual utility gain across all pools in the test set. Pools above the (dashed) identity line (i.e., $y=x$) show an increase in counterfactual utility gain compared to policy $\hat{\pi}(\bm{g})$.
  • Figure 3: Expected counterfactual utility achieved by the algorithmic policies $\hat{\pi}(\bm{p})$, $\hat{\pi}(\bm{g})$ and $\hat{\pi}(h(\bm{g}))$ in comparison with the expected realized utility achieved by the default policy $\tilde{\pi}(\bm{x}, w)$ across all pools in the test set for different $\beta$ values under low, medium and high noise levels. For $\hat{\pi}(h(\bm{g}))$, the results are averaged over $5$ runs, where the error bands represent standard deviations.
  • Figure 4: Empirical distribution of employment probability of refugees for all states, computed using $500{,}000$ synthesized refugees. The distributions are plotted using exponential binning.
  • Figure 5: Per-refugee employment probability for two pairs of locations and $500{,}000$ synthesized refugees. The (dashed) identity line (i.e., $y=x$) indicates equal probabilities between the two states.

Theorems & Definitions (6)

  • Proposition 3.1
  • Proposition 4.1
  • Proposition 4.2
  • Theorem 5.1
  • Lemma A.1
  • Proposition A.2