Hierarchical biomarker thresholding: a model-agnostic framework for stability
O. Debeaupuis
TL;DR
The paper introduces a model-agnostic framework for stable hierarchical biomarker thresholding that yields an external-risk certificate at the realized operating point $\hat{t}$. It decomposes the external risk $R_Q(\hat{t})$ into internal fit, patient-level generalization, operating-point shift, and instability, and links this decomposition to a bootstrap-based stability penalty for threshold selection. The approach enables quantile-scale ensembling and selection-honest evaluation with actionable diagnostics, and provides monotone-invariant aggregation across methods and sites. On CAMELYON pathology data and MIMIC-IV-ECG, it reduces external risk and produces fewer decision flips than baselines. The framework thus offers an interpretable, transport-aware methodology for threshold-based decisions in hierarchical, domain-shifted biomedical settings.
Abstract
Many biomarker pipelines require patient-level decisions aggregated from instance-level (cell/patch) scores. Thresholds tuned on pooled instances often fail across sites due to hierarchical dependence, prevalence shift, and score-scale mismatch. We present a selection-honest framework for hierarchical thresholding that makes patient-level decisions reproducible and more defensible. At its core is a risk decomposition theorem for selection-honest thresholds. The theorem separates contributions from (i) internal fit and patient-level generalization, (ii) operating-point shift reflecting prevalence and shape changes, and (iii) a stability term that penalizes sensitivity to threshold perturbations. The stability component is computable via patient-block bootstraps mapped through a monotone modulus of risk. This framework is model-agnostic, reconciles heterogeneous decision rules on a quantile scale, and yields monotone-invariant ensembles and reportable diagnostics (e.g. flip-rate, operating-point shift).
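To make the stability idea concrete, the following is a minimal sketch of a patient-block bootstrap, not the paper's actual procedure. It assumes max-pooling of instance scores to the patient level, a threshold chosen by minimizing empirical 0-1 risk, and a flip rate defined as the mean fraction of patient decisions that change when the threshold is reselected on a resample. All function names (`patient_scores`, `select_threshold`, `bootstrap_stability`) are hypothetical, and the mapping through a monotone modulus of risk is not reproduced here.

```python
import numpy as np


def patient_scores(instance_scores, patient_ids, agg=np.max):
    """Aggregate instance-level scores to one score per patient.

    Assumption: max-pooling by default; the paper may use another rule.
    """
    ids = np.unique(patient_ids)
    pooled = np.array([agg(instance_scores[patient_ids == p]) for p in ids])
    return ids, pooled


def select_threshold(scores, labels):
    """Hypothetical selection rule: minimize empirical 0-1 risk."""
    candidates = np.unique(scores)
    risks = [np.mean((scores >= t) != labels) for t in candidates]
    return candidates[int(np.argmin(risks))]


def bootstrap_stability(scores, labels, n_boot=200, seed=0):
    """Resample whole patients, reselect the threshold, and measure flips.

    Because each patient contributes a single pooled score, drawing
    patients with replacement keeps all of a patient's instances together,
    which is what makes this a block bootstrap at the patient level.
    """
    rng = np.random.default_rng(seed)
    n = len(scores)
    t_hat = select_threshold(scores, labels)
    base_decisions = scores >= t_hat

    thresholds = np.empty(n_boot)
    flips = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # patient indices, with replacement
        t_b = select_threshold(scores[idx], labels[idx])
        thresholds[b] = t_b
        # Fraction of original patients whose decision changes under t_b.
        flips[b] = np.mean((scores >= t_b) != base_decisions)
    return t_hat, thresholds, float(flips.mean())
```

The spread of the bootstrap thresholds and the mean flip rate are the kind of quantities a stability penalty or a reportable diagnostic could be built from. Pooling before resampling is a design choice that keeps the resampling unit equal to the decision unit, so within-patient dependence among cells or patches never leaks across bootstrap draws.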
