Learning Potentials for Dynamic Matching and Application to Heart Transplantation

Itai Zilberstein; Ioannis Anagnostides; Zachary W. Sollie; Arman Kilic; Tuomas Sandholm

TL;DR

This work frames dynamic heart transplant allocation as online bipartite matching and introduces potential-based policies that balance immediate utility against long-term waitlist value. It replaces prior black-box optimization with an offline imitation-learning framework: expressive neural-network potentials are trained to mimic a hindsight omniscient oracle, with semi-synthetic data augmenting the training set. On real UNOS data, the approach substantially outperforms the US status quo, CAS, and myopic baselines, reaching around 95% of the omniscient upper bound and demonstrating the value of incorporating waitlist state into decision-making. The contributions offer a scalable path toward more effective organ allocation while highlighting fairness trade-offs and the need for equity-aware extensions in future work.

Abstract

Each year, thousands of patients in need of heart transplants face life-threatening wait times due to organ scarcity. While allocation policies aim to maximize population-level outcomes, current approaches often fail to account for the dynamic arrival of organs and the composition of waitlisted candidates, thereby hampering efficiency. The United States is transitioning from rigid, rule-based allocation to more flexible data-driven models. In this paper, we propose a novel framework for non-myopic policy optimization in general online matching relying on potentials, a concept originally introduced for kidney exchange. We develop scalable and accurate ways of learning potentials that are higher-dimensional and more expressive than prior approaches. Our approach is a form of self-supervised imitation learning: the potentials are trained to mimic an omniscient algorithm that has perfect foresight. We focus on the application of heart transplant allocation and demonstrate, using real historical data, that our policies significantly outperform prior approaches -- including the current US status quo policy and the proposed continuous distribution framework -- in optimizing for population-level outcomes. Our analysis and methods come at a pivotal moment in US policy, as the current heart transplant allocation system is under review. We propose a scalable and theoretically grounded path toward more effective organ allocation.
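To make the idea of a potential concrete, here is a minimal Python sketch. It assumes one common form of potential-based scoring, in which a patient's potential (an estimate of the future value of keeping them on the waitlist) is subtracted from the immediate utility of matching them to the arriving donor. The scoring rule, the `Patient` class, `linear_potential`, `choose_patient`, and all numbers are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Patient:
    """Hypothetical waitlisted patient with a small feature vector."""
    name: str
    features: Tuple[float, ...]


def linear_potential(theta: Sequence[float], patient: Patient) -> float:
    """Assumed linear potential: dot product of weights and patient features."""
    return sum(t * f for t, f in zip(theta, patient.features))


def choose_patient(
    waitlist: Sequence[Patient],
    utility: Callable[[Patient], float],
    potential: Callable[[Patient], float],
) -> Optional[Patient]:
    """Pick the patient maximizing immediate utility minus potential.

    Returns None when the waitlist is empty.
    """
    best, best_score = None, float("-inf")
    for p in waitlist:
        score = utility(p) - potential(p)
        if score > best_score:
            best, best_score = p, score
    return best


# Toy data (all values are made up for illustration).
waitlist = [Patient("A", (1.0,)), Patient("B", (0.2,))]
immediate = {"A": 10.0, "B": 8.0}  # utility of matching each patient to the arriving donor
theta = (6.0,)                     # assumed potential weights


def utility(p: Patient) -> float:
    return immediate[p.name]


# A myopic policy is the special case where every potential is zero.
myopic_choice = choose_patient(waitlist, utility, lambda p: 0.0)
potential_choice = choose_patient(
    waitlist, utility, lambda p: linear_potential(theta, p)
)
```

In this toy instance the zero-potential policy selects patient A, who has the highest immediate utility, while the potential-adjusted policy selects patient B because A's assumed potential is high enough to favor keeping A on the waitlist.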

Paper Structure

This paper contains 36 sections, 3 theorems, 21 equations, 6 figures, 9 tables, and 1 algorithm.

Introduction
Our contributions
Additional related work
Organ allocation
Imitation learning and learning-to-rank
Online matching
Our heart transplant allocation model
Potential-based matching
General policy framework
Linear potential functions
Neural network-based potential functions
Learning the potentials
Black-box optimization
Our imitation learning framework
Omniscient algorithm
...and 21 more sections

Key Result

Proposition 1

Let $D'$ and $P'$ be random variables denoting the number of donors and patients, respectively, in a semi-synthetic trajectory. The expected volumes $\mathbb{E}[D']$ and $\mathbb{E}[P']$ are the midpoints of the observed historical ranges.
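As a hedged illustration of the midpoint claim, suppose the observed historical donor counts span $[D_{\min}, D_{\max}]$ and the patient counts span $[P_{\min}, P_{\max}]$. These four symbols are assumptions introduced here, and the paper's own notation may differ. Under that reading, the statement would be written as:

$$
\mathbb{E}[D'] = \frac{D_{\min} + D_{\max}}{2}, \qquad \mathbb{E}[P'] = \frac{P_{\min} + P_{\max}}{2}.
$$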

Figures (6)

Figure 1: Schematic illustration of online heart transplant allocation.
Figure 2: Example motivating non-myopic decision making in heart transplant allocation. The edge weights denote the immediate utility of a patient-donor match. A myopic algorithm would obtain a suboptimal utility of 10 by matching Patient 1 with Donor 1. The optimal decisions are non-myopic: match Patient 2 with Donor 1 and Patient 1 with Donor 2.
Figure 3: Learning pipelines using a set of trajectories. (a) SMAC uses the simulator as a black box, optimizing the average utility across trajectories in a feedback loop. (b) Imitation learning approaches first compute the optimal patient-donor allocations in the training set, $X_1, \ldots, X_N$, using the omniscient algorithm for each trajectory, and then aggregate these decisions into a single dataset. The approaches then learn the potential function weights $\theta$ in a one-shot process.
Figure 4: Generation of a semi-synthetic trajectory from a single real trajectory.
Figure 5: Average waitlist mortality rates normalized by number of patients in each group (a) and cumulative years on the waiting list per group (b).
...and 1 more figure
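The scenario in the Figure 2 caption can be reproduced on a toy instance. The sketch below compares a myopic policy with a brute-force hindsight optimum, which stands in for an omniscient algorithm on this tiny example. Only the myopic total of 10 and the optimal pairing come from the caption. The other edge weights and the function names `myopic` and `omniscient` are assumptions chosen for illustration.

```python
from itertools import permutations

# Edge weights (patient, donor) -> immediate utility.
# Only the value 10 for (P1, D1) is taken from the caption; the rest are assumed.
WEIGHTS = {
    ("P1", "D1"): 10.0,
    ("P2", "D1"): 8.0,
    ("P1", "D2"): 9.0,
    ("P2", "D2"): 0.0,
}
PATIENTS = ["P1", "P2"]
DONORS = ["D1", "D2"]  # listed in arrival order


def myopic(patients, donors, weights):
    """Match each arriving donor to the waiting patient with the highest immediate utility."""
    waiting = list(patients)
    total, matches = 0.0, []
    for d in donors:
        if not waiting:
            break
        best = max(waiting, key=lambda p: weights[(p, d)])
        waiting.remove(best)
        total += weights[(best, d)]
        matches.append((best, d))
    return total, matches


def omniscient(patients, donors, weights):
    """Brute-force hindsight optimum; assumes len(donors) <= len(patients)."""
    best_total, best_matches = float("-inf"), []
    for perm in permutations(patients, len(donors)):
        matches = list(zip(perm, donors))
        total = sum(weights[m] for m in matches)
        if total > best_total:
            best_total, best_matches = total, matches
    return best_total, best_matches


myopic_total, myopic_matches = myopic(PATIENTS, DONORS, WEIGHTS)
optimal_total, optimal_matches = omniscient(PATIENTS, DONORS, WEIGHTS)
```

With these assumed weights, the myopic policy matches Patient 1 with Donor 1 and ends with a total utility of 10, while the hindsight optimum matches Patient 2 with Donor 1 and Patient 1 with Donor 2 for a total of 17. Brute-force enumeration is only practical for toy sizes; a realistic instance would need a proper assignment or optimization solver.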

Theorems & Definitions (3)

Proposition 1: Volume consistency
Proposition 2: Feature invariance
Proposition 3: Waitlist structure preservation
