Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis

Hyeonjun Lee, Hyungseob Shin, Gunhee Nam, Hyeonsoo Lee

TL;DR

The paper tackles non-proportional hazards and patient heterogeneity in discrete-time survival analysis with a dual mixture-of-experts framework. It jointly learns a Mixture of Feature Encoders for subgroup-aware representations and a time-conditioned Mixture of Hazard Networks for flexible, time-varying risk. The model maximizes the discrete-time likelihood while encouraging balanced expert usage through load-balancing regularizers on both the encoder mixture and the hazard mixture. On the METABRIC and GBSG breast cancer datasets, it consistently improves both the overall and the time-dependent C-index, with additional gains when integrated into the ConSurv framework. The approach is compatible with existing deep-learning survival pipelines, and the authors point to multimodal extensions and finer-grained hazard trajectories as next steps in clinical settings.
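To make the training objective above concrete, here is a minimal sketch of a discrete-time negative log-likelihood computed from per-bin hazards, combined with a simple load-balancing penalty. This summary does not give the exact form of the paper's regularizer, so the squared-deviation-from-uniform penalty, the censoring convention, the function names, and the weight `lam` are all assumptions made for illustration.

```python
import math

EPS = 1e-12  # guards against log(0)


def discrete_nll(hazards, event_bin, observed):
    """Negative log-likelihood for one patient.

    hazards:   list of per-bin hazards h_k in (0, 1).
    event_bin: index of the bin containing the event or censoring time.
    observed:  True if the event occurred, False if censored.
    Assumed convention: a censored patient survives through event_bin.
    """
    # The patient survives every bin strictly before event_bin.
    ll = sum(math.log(1.0 - h + EPS) for h in hazards[:event_bin])
    if observed:
        ll += math.log(hazards[event_bin] + EPS)
    else:
        ll += math.log(1.0 - hazards[event_bin] + EPS)
    return -ll


def load_balance_penalty(routing_probs):
    """Squared deviation of mean expert usage from uniform.

    routing_probs: one probability vector per sample (rows sum to 1).
    This particular penalty is a hypothetical stand-in, not the paper's.
    """
    n = len(routing_probs)
    n_experts = len(routing_probs[0])
    mean_usage = [sum(p[e] for p in routing_probs) / n for e in range(n_experts)]
    return sum((u - 1.0 / n_experts) ** 2 for u in mean_usage)


def total_loss(nll_values, enc_probs, haz_probs, lam=0.01):
    """Mean NLL plus a weighted penalty for each of the two routers."""
    mean_nll = sum(nll_values) / len(nll_values)
    penalty = load_balance_penalty(enc_probs) + load_balance_penalty(haz_probs)
    return mean_nll + lam * penalty
```

The penalty is zero when every expert receives the same average routing mass and grows as routing collapses onto a few experts, which is the behavior a load-balancing term is meant to discourage.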

Abstract

Survival analysis models the time until an event of interest occurs and is widely used in clinical and biomedical research. A key challenge is to model patient heterogeneity while also adapting risk predictions to both individual characteristics and temporal dynamics. We propose a dual mixture-of-experts (MoE) framework for discrete-time survival analysis. Our approach combines a feature-encoder MoE for subgroup-aware representation learning with a hazard MoE that leverages patient features and time embeddings to capture temporal dynamics. This dual-MoE design integrates flexibly with existing deep-learning-based survival pipelines. On the METABRIC and GBSG breast cancer datasets, our method consistently improves performance, raising the time-dependent C-index by up to 0.04 on the test sets, and yields further gains when incorporated into the ConSurv framework.
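The two-stage design described in the abstract can be sketched as a forward pass: a router mixes several feature encoders into one patient representation, and a second router, fed the representation together with a time-bin embedding, mixes hazard experts that are shared across bins. The sketch below is only an illustration under stated assumptions: dense softmax routing, linear experts, a `tanh` encoder activation, concatenation of patient and time embeddings, random untrained weights, and the class name `DualMoE` are all hypothetical choices, not details confirmed by the paper.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear(w, x):
    """Matrix-vector product; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mix(weights, outputs):
    """Weighted sum of expert output vectors."""
    dim = len(outputs[0])
    return [sum(g * o[j] for g, o in zip(weights, outputs)) for j in range(dim)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class DualMoE:
    """Toy dual mixture-of-experts with random, untrained weights."""

    def __init__(self, d_in, d_hid, n_bins, n_enc=3, n_haz=3, seed=0):
        rng = random.Random(seed)
        self.n_bins = n_bins
        self.enc_router = rand_matrix(n_enc, d_in, rng)
        self.enc_experts = [rand_matrix(d_hid, d_in, rng) for _ in range(n_enc)]
        self.time_emb = rand_matrix(n_bins, d_hid, rng)  # one embedding per bin
        self.haz_router = rand_matrix(n_haz, 2 * d_hid, rng)
        self.haz_experts = [rand_matrix(1, 2 * d_hid, rng) for _ in range(n_haz)]

    def forward(self, x):
        # Stage 1: route the patient over feature-encoder experts.
        g_enc = softmax(linear(self.enc_router, x))
        encoded = [[math.tanh(v) for v in linear(w, x)] for w in self.enc_experts]
        z = mix(g_enc, encoded)
        # Stage 2: for each time bin, route over shared hazard experts.
        hazards = []
        for k in range(self.n_bins):
            zt = z + self.time_emb[k]  # list concatenation: [patient ; time]
            g_haz = softmax(linear(self.haz_router, zt))
            logit = mix(g_haz, [linear(w, zt) for w in self.haz_experts])[0]
            hazards.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
        return hazards, g_enc


def survival_curve(hazards):
    """S(k) = prod_{j <= k} (1 - h_j)."""
    s, out = 1.0, []
    for h in hazards:
        s *= 1.0 - h
        out.append(s)
    return out


model = DualMoE(d_in=4, d_hid=5, n_bins=6)
hazards, enc_weights = model.forward([0.1, -0.2, 0.3, 0.4])
curve = survival_curve(hazards)
```

Because the hazard router sees the time embedding, the same patient can be sent to different hazard experts in different bins, which is what allows the predicted hazard to vary over time instead of following a fixed proportional shape.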

Paper Structure

This paper contains 11 sections, 7 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overall architecture of the proposed framework. Unlike prior survival models that use a single encoder and a single hazard head, our framework employs dual mixtures of experts: one over feature encoders and another over hazard networks, where hazard experts are shared across time bins and dynamically routed by patient and time embeddings.
  • Figure 2: Average routing probabilities of feature-encoder experts across ER and HER2 subgroups.
  • Figure 3: Table 3. Ablation on each MoE architecture.
  • Figure 4: Ablation on input of hazard router.