Predictive Multiplicity in Survival Models: A Method for Quantifying Model Uncertainty in Predictive Maintenance Applications

Mustafa Cavus

TL;DR

This paper addresses predictive multiplicity in survival models for predictive maintenance. It introduces a Rashomon-based framework and three metrics (ambiguity, discrepancy, and obscurity) to quantify disagreement among near-optimal survival models. It applies the framework to CMAPSS data using Random Survival Forests, revealing substantial, dataset-dependent multiplicity that can affect individual risk estimates. The work emphasizes the importance of reporting multiplicity to support trustworthy, uncertainty-aware maintenance decisions and suggests broader applicability to high-stakes domains.

Abstract

In many applications, especially those involving prediction, models may yield near-optimal performance yet significantly disagree on individual-level outcomes. This phenomenon, known as predictive multiplicity, has been formally defined in binary, probabilistic, and multi-target classification, and undermines the reliability of predictive systems. However, its implications remain unexplored in the context of survival analysis, which involves estimating the time until a failure or similar event while properly handling censored data. We frame predictive multiplicity as a critical concern in survival-based models and introduce formal measures (ambiguity, discrepancy, and obscurity) to quantify it. This is particularly relevant for downstream tasks such as maintenance scheduling, where precise individual risk estimates are essential. Understanding and reporting predictive multiplicity helps build trust in models deployed in high-stakes environments. We apply our methodology to benchmark datasets from predictive maintenance, extending the notion of multiplicity to survival models. Our findings show that ambiguity steadily increases, reaching up to 40-45% of observations; discrepancy is lower but exhibits a similar trend; and obscurity remains mild and concentrated in a few models. These results demonstrate that multiple accurate survival models may yield conflicting estimates of failure risk and degradation progression for the same equipment. This highlights the need to explicitly measure and communicate predictive multiplicity to ensure reliable decision-making in process health management.

Paper Structure

This paper contains 18 sections, 10 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: A Survival Rashomon cube of size $m \times n$ comprises $m$ models and $n$ observations. Ambiguity is the ratio of the models including the conflicting predictions, discrepancy is the maximum conflict ratio between the models, and obscurity is the mean conflict ratio across the observations. A prediction is conflicting when a model's risk prediction deviates from that of the reference model $f_R$ by at least the threshold $\delta$.
  • Figure 2: The values of predictive multiplicity metrics across various Rashomon parameters $\epsilon$ and conflict thresholds $\delta$ for the four subsets of the CMAPSS dataset. The color gradient represents the severity of multiplicity. $\delta$ sets the threshold for how much a model's prediction must differ from the reference to be considered conflicting.
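
The three metrics described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the Rashomon set is given as an $m \times n$ array of risk predictions (one row per model, one column per observation) at a single evaluation point, that a conflict means an absolute deviation of at least $\delta$ from the reference model $f_R$, and that ambiguity is taken over observations, as the abstract's "40-45% of observations" suggests. The function name `multiplicity_metrics` and its arguments are hypothetical.

```python
import numpy as np


def multiplicity_metrics(risk, ref_risk, delta):
    """Sketch of ambiguity, discrepancy, and obscurity (assumed definitions).

    risk     : (m, n) array, risk predictions of m near-optimal models
               for n observations (hypothetical input layout).
    ref_risk : (n,) array, predictions of the reference model f_R.
    delta    : conflict threshold; |risk - ref_risk| >= delta counts as a
               conflict (the >= is an assumption, the source only says
               "deviates ... by a delta difference").
    """
    risk = np.asarray(risk, dtype=float)
    ref_risk = np.asarray(ref_risk, dtype=float)
    if risk.ndim != 2 or ref_risk.shape != (risk.shape[1],):
        raise ValueError("expected risk of shape (m, n) and ref_risk of shape (n,)")

    # Boolean conflict matrix: entry (j, i) is True when model j
    # disagrees with the reference model on observation i.
    conflicts = np.abs(risk - ref_risk) >= delta

    # Ambiguity: share of observations with at least one conflicting model.
    ambiguity = conflicts.any(axis=0).mean()
    # Discrepancy: largest per-model share of conflicting observations.
    discrepancy = conflicts.mean(axis=1).max()
    # Obscurity: per-observation share of conflicting models, averaged
    # over all observations.
    obscurity = conflicts.mean(axis=0).mean()

    return {
        "ambiguity": float(ambiguity),
        "discrepancy": float(discrepancy),
        "obscurity": float(obscurity),
    }


# Toy example with 3 models and 4 observations (made-up numbers).
ref = [0.2, 0.5, 0.7, 0.9]
preds = [
    [0.20, 0.5, 0.70, 0.9],  # identical to the reference
    [0.35, 0.5, 0.70, 0.9],  # conflicts on observation 0
    [0.40, 0.5, 0.55, 0.9],  # conflicts on observations 0 and 2
]
print(multiplicity_metrics(preds, ref, delta=0.1))
```

Under these assumed definitions, obscurity can never exceed discrepancy, and discrepancy can never exceed ambiguity, which matches the ordering of the metrics reported in the abstract.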