
Let the Experts Speak: Improving Survival Prediction & Calibration via Mixture-of-Experts Heads

Todd Morrill, Aahlad Puli, Murad Megjhani, Soojin Park, Richard Zemel

TL;DR

The study addresses survival analysis under right-censoring, aiming to improve calibration and predictive accuracy while still clustering patients. It introduces three discrete-time deep mixture-of-experts heads (Fixed MoE, Adjustable MoE, and Personalized MoE), each trained with a multitask logistic regression loss over a discrete grid of $m=100$ time bins. Across Survival MNIST (synthetic) and the real-world SUPPORT2 and Sepsis datasets, the Personalized MoE achieves the best calibration with competitive discrimination and Brier scores, and its gains grow as the experts become more expressive. The approach surfaces clinically meaningful patient groups through routing analyses, is robust to the number of experts, and offers a practical framework for reasoning by analogy to similar patients in clinical decision support systems.
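The training objective can be made concrete with a short NumPy sketch of a multitask logistic regression (MTLR) negative log-likelihood over a grid of $m$ bins. It assumes the standard MTLR parameterization, in which the score of bin $j$ is the sum of the per-bin logits from $j$ to the last bin and one extra bin covers events after the grid. The function names and the censoring convention (a patient censored in bin $j$ contributes the probability of an event in bin $j$ or later) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log-sum-exp along `axis` (handles -inf entries)."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def mtlr_log_probs(logits):
    """Per-bin logits (n, m) -> log P(event in bin j | x), shape (n, m + 1).

    Assumed MTLR parameterization: the score of bin j is the sum of the logits
    from bin j to bin m, and an extra last bin (event after the grid) scores 0.
    """
    n = logits.shape[0]
    # Reverse cumulative sum: scores[:, j] = sum_{k >= j} logits[:, k]
    scores = np.flip(np.cumsum(np.flip(logits, axis=1), axis=1), axis=1)
    scores = np.concatenate([scores, np.zeros((n, 1))], axis=1)
    return scores - _logsumexp(scores, axis=1)[:, None]


def mtlr_nll(logits, bins, events):
    """Mean negative log-likelihood under right-censoring.

    bins:   int array (n,), bin index of the event or censoring time
    events: bool array (n,), True if the event was observed
    """
    log_p = mtlr_log_probs(logits)
    rows = np.arange(logits.shape[0])
    # Observed event: probability mass of exactly that bin.
    ll_event = log_p[rows, bins]
    # Censored in bin j (assumed convention): event in bin j or any later bin.
    later = np.arange(log_p.shape[1])[None, :] >= bins[:, None]
    ll_cens = _logsumexp(np.where(later, log_p, -np.inf), axis=1)
    return -np.mean(np.where(events, ll_event, ll_cens))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 100))        # m = 100 bins
    bins = np.array([3, 50, 99, 10])
    events = np.array([True, False, True, False])
    print(mtlr_nll(logits, bins, events))
```

A censored patient's likelihood is a tail sum of the same predicted distribution, so a patient censored in the very first bin contributes zero loss; this makes a quick sanity check.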

Abstract

Deep mixture-of-experts models have attracted considerable attention for survival analysis problems, particularly for their ability to cluster similar patients together. In practice, grouping often comes at the expense of key metrics such as calibration and predictive accuracy. This is due to the restrictive inductive bias that mixture-of-experts models impose: predictions for an individual patient must look like the predictions for the group to which that patient is assigned. Might we be able to discover patient group structure, where it exists, while improving calibration and predictive accuracy? In this work, we introduce several discrete-time deep mixture-of-experts (MoE) architectures for survival analysis, one of which achieves all three desiderata: clustering, calibration, and predictive accuracy. We show that a key differentiator among these MoEs is how expressive their experts are: more expressive experts that tailor predictions to each patient outperform experts that rely on fixed group prototypes.
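The inductive bias can be written out in notation of our own (not taken from the paper). Let $\pi_k(x)$ be the gate weight that patient $x$ assigns to expert $k$ of $K$, and let $p_k(j \mid x)$ be expert $k$'s probability of an event in bin $j$ of the $m+1$ bins, the last bin covering events after the grid:

```latex
\begin{aligned}
p(j \mid x) &= \sum_{k=1}^{K} \pi_k(x)\, p_k(j \mid x),
  \qquad \pi_k(x) \ge 0,\quad \sum_{k=1}^{K} \pi_k(x) = 1, \\
S(j \mid x) &= \sum_{l > j} p(l \mid x) = \sum_{k=1}^{K} \pi_k(x)\, S_k(j \mid x), \\
\text{Fixed MoE } \bigl(p_k(j \mid x) = p_k(j)\bigr):\quad
p(\cdot \mid x) &\in \operatorname{conv}\{p_1, \dots, p_K\}.
\end{aligned}
```

With fixed prototypes, every patient's predicted event distribution lies in the convex hull of $K$ shared distributions, a set of dimension at most $K-1$, whereas a distribution over $m+1$ bins has $m$ degrees of freedom; this is the sense in which individual predictions must look like group predictions. Letting $p_k$ depend on $x$, as the Adjustable and Personalized heads do, removes that constraint while keeping the gate weights $\pi_k(x)$ available for grouping patients.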

Paper Structure

This paper contains 65 sections, 42 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Illustration of our three proposed mixture-of-experts (MoE) architectures for survival analysis. Left: Fixed MoE, where patients A and B share the same expert distributions but have different expert weights. Middle: Adjustable MoE, where fixed expert distributions (faint color) are adjusted per patient (dark color). Right: Personalized MoE, where every expert produces a custom event distribution for each patient.
  • Figure 2: Event distributions for each digit in the Survival MNIST dataset. Each digit has a distinct event distribution, which allows us to evaluate the ability of our models to recover latent groups.
  • Figure 3: Expert sensitivity analysis: test set loss as a function of the number of experts, over 5 random seeds.
  • Figure 4: MNIST digit clustering by expert in (a) a Fixed MoE model and (b) a Personalized MoE model.
  • Figure 5: Routing of 911 unseen patients in the SUPPORT2 dataset to experts in a Personalized MoE model. (a) shows the group sizes (by Top-1 expert activation), (b) shows the survival curves for patients routed to each expert, and (c) shows the age distribution of patients in each cluster. The Personalized MoE model is able to discover clinically meaningful patient groups in real-world datasets by risk profile and patient attributes.
  • ...and 4 more figures
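The distinction drawn in Figure 1 can be sketched in NumPy. In this hypothetical head (class, function, and parameter names are ours), a softmax gate routes a patient embedding to $K$ experts, each expert emits a distribution over $m+1$ bins through a softmax, and the head returns the gate-weighted mixture. Several assumptions apply: the experts use a plain softmax rather than an MTLR parameterization, the Adjustable variant is modeled as an additive per-patient offset on fixed expert logits, and all weights are random rather than trained. The last lines compute survival curves and Top-1 group sizes in the spirit of Figure 5(a), on synthetic embeddings.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class MoESurvivalHeadSketch:
    """Hypothetical discrete-time MoE head with m bins plus one 'after the grid' bin.

    mode="fixed":        expert distributions are shared constants (prototypes)
    mode="adjustable":   prototype logits plus a per-patient offset (assumed additive)
    mode="personalized": each expert maps the embedding to its own logits
    Parameters are random; a real model would train them with a survival loss.
    """

    def __init__(self, d, n_experts, m, mode, seed=0):
        if mode not in ("fixed", "adjustable", "personalized"):
            raise ValueError(f"unknown mode: {mode}")
        rng = np.random.default_rng(seed)
        self.mode = mode
        n_bins = m + 1
        self.W_gate = rng.normal(scale=0.5, size=(d, n_experts))
        self.prototypes = rng.normal(size=(n_experts, n_bins))
        self.W_expert = rng.normal(scale=0.1, size=(n_experts, d, n_bins))

    def __call__(self, h):
        """h: (n, d) embeddings -> (mixture pmf (n, m+1), gate weights (n, K))."""
        gate = softmax(h @ self.W_gate, axis=1)                   # (n, K)
        offsets = np.einsum("nd,kdb->nkb", h, self.W_expert)      # (n, K, m+1)
        if self.mode == "fixed":
            logits = np.broadcast_to(self.prototypes, offsets.shape)
        elif self.mode == "adjustable":
            logits = self.prototypes[None, :, :] + offsets
        else:  # personalized
            logits = offsets
        expert_pmfs = softmax(logits, axis=2)                     # (n, K, m+1)
        mixture = np.einsum("nk,nkb->nb", gate, expert_pmfs)      # (n, m+1)
        return mixture, gate


def survival_curve(pmf):
    """S(j) = P(event after bin j) for the m grid bins, from an (n, m+1) pmf."""
    return 1.0 - np.cumsum(pmf, axis=1)[:, :-1]


if __name__ == "__main__":
    h = np.random.default_rng(1).normal(size=(32, 16))  # synthetic embeddings
    head = MoESurvivalHeadSketch(d=16, n_experts=4, m=100, mode="personalized")
    pmf, gate = head(h)
    surv = survival_curve(pmf)
    top1 = gate.argmax(axis=1)                 # Top-1 expert per patient
    print(np.bincount(top1, minlength=4))      # group sizes
```

Only the `logits` branch differs between the three modes, which mirrors the abstract's observation that a key differentiator between the variants is how expressive their experts are; the gate, and therefore the Top-1 grouping, has the same form in all of them.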