Stabilizing Machine Learning for Reproducible and Explainable Results: A Novel Validation Approach to Subject-Specific Insights

Gideon Vos, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi

TL;DR

Reproducibility and explainability in biomedical ML are undermined by sensitivity to random seeds and validation choices. The authors introduce a randomized-trials validation framework that uses a single general Random Forest model and up to 400 trials per subject to stabilize both model performance and feature importance at the subject and group levels. Across nine open datasets, the method yields stable feature importance that aligns with established clinical biomarkers while maintaining competitive predictive accuracy, and it reduces seed- and validation-induced variability. The approach thus offers a practical, explainable alternative to costly subject-specific models, with open-source code to enable replication and broader adoption in biomarker discovery and personalized medicine.

Abstract

Machine Learning is transforming medical research by improving diagnostic accuracy and personalizing treatments. General ML models trained on large datasets identify broad patterns across populations, but their effectiveness is often limited by the diversity of human biology. This has led to interest in subject-specific models that use individual data for more precise predictions. However, these models are costly and challenging to develop. To address this, we propose a novel validation approach that uses a general ML model to ensure reproducible performance and robust feature importance analysis at both group and subject-specific levels. We tested a single Random Forest (RF) model on nine datasets varying in domain, sample size, and demographics. Different validation techniques were applied to evaluate accuracy and feature importance consistency. To introduce variability, we performed up to 400 trials per subject, randomly seeding the ML algorithm for each trial. This generated up to 400 feature sets per subject, from which we identified top subject-specific features. A group-specific feature importance set was then derived from all subject-specific results. We compared our approach to conventional validation methods in terms of performance and feature importance consistency. Our repeated trials approach, with random seed variation, consistently identified key features at the subject level and improved group-level feature importance analysis using a single general model. Subject-specific models address biological variability but are resource-intensive. Our novel validation technique provides consistent feature importance and improved accuracy within a general ML model, offering a practical and explainable alternative for clinical research.
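The repeated-trials procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn's `RandomForestClassifier`, synthetic data from `make_classification`, a leave-one-subject-out split in every trial, impurity-based importances, and only 10 trials per subject (rather than up to 400) to keep it fast. The helper names `subject_top_features` and `group_top_features` are hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def subject_top_features(X, y, subjects, subject_id, n_trials=10, top_k=3, seed_rng=None):
    """Run repeated, randomly seeded trials for one held-out subject.

    ASSUMPTION: each trial trains on all other subjects and ranks features by
    impurity-based importance; the abstract does not specify either choice.
    """
    if seed_rng is None:
        seed_rng = np.random.default_rng()
    train = subjects != subject_id
    counts = Counter()
    accuracies = []
    for _ in range(n_trials):
        # A fresh random seed per trial is the source of variability.
        seed = int(seed_rng.integers(0, 2**31 - 1))
        model = RandomForestClassifier(n_estimators=25, random_state=seed)
        model.fit(X[train], y[train])
        accuracies.append(model.score(X[~train], y[~train]))
        ranked = np.argsort(model.feature_importances_)[::-1]
        counts.update(ranked[:top_k].tolist())
    # Keep the features that appear most often in the per-trial top-k lists.
    top = [feature for feature, _ in counts.most_common(top_k)]
    return top, float(np.mean(accuracies))


def group_top_features(per_subject_top, top_k=3):
    """Derive a group-level set by counting features across subject-level sets."""
    counts = Counter(f for top in per_subject_top.values() for f in top)
    return [feature for feature, _ in counts.most_common(top_k)]


# Synthetic stand-in data: 8 hypothetical subjects with 15 samples each.
X, y = make_classification(n_samples=120, n_features=8, n_informative=3, random_state=0)
subjects = np.repeat(np.arange(8), 15)
rng = np.random.default_rng(42)

per_subject_top = {}
per_subject_acc = {}
for sid in np.unique(subjects):
    top, acc = subject_top_features(X, y, subjects, sid, n_trials=10, top_k=3, seed_rng=rng)
    per_subject_top[int(sid)] = top
    per_subject_acc[int(sid)] = acc

group_top = group_top_features(per_subject_top)
print("Subject-level top features:", per_subject_top)
print("Group-level top features:", group_top)
```

Counting how often a feature lands in the per-trial top-k is only one possible aggregation rule; averaging importances across trials would be another reasonable choice, and the abstract does not say which one the authors use.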

Paper Structure

This paper contains 11 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Proposed randomized trial validation approach for subject- and group-specific feature importance and model performance stabilization.
  • Figure 2: Original published results from [Chekroud2024] vs. results reproduced with a different random seed using the published source code [Chekroudcode].
  • Figure 3: Experimental results on the Breast Cancer dataset [datasetcancer]. The figure shows how modifying the cross-validation technique and/or random seed can result in different feature importance sets, undermining model generalization, stability, and explainability. It also shows a feature importance set stabilized using our proposed random trial validation technique.
  • Figure 4: Experimental results on the Diabetes dataset [datasetdiabetes]. The figure shows how modifying the cross-validation technique and/or random seed can result in different feature importance sets, undermining model generalization, stability, and explainability. It additionally presents a stabilized feature importance set for all subjects in the dataset (third row, left column) and for a sample individual, subject 1 (right column), obtained with our proposed random trial validation method.
  • Figure 5: Experimental results on the Alzheimer's disease dataset [datasetalzheimers]. The figure shows how modifying the cross-validation technique, even when the random seed is kept the same, results in different feature importance sets, undermining model generalization, stability, and explainability.
  • ...and 2 more figures
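
The seed sensitivity described in the captions for Figures 3 to 5 can be checked with a short experiment. The sketch below is an illustration only, not the authors' code: it assumes scikit-learn's bundled Wisconsin breast cancer data as a stand-in (which may differ from the dataset used in the paper), impurity-based importances, and a Jaccard overlap of top-5 feature sets as the stability measure. The helper names `top_k_features` and `jaccard` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier


def top_k_features(X, y, seed, k=5):
    """Return the indices of the k most important features for one seed."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)
    return set(np.argsort(model.feature_importances_)[::-1][:k].tolist())


def jaccard(a, b):
    """Overlap between two feature sets: 1.0 means identical, 0.0 means disjoint."""
    return len(a & b) / len(a | b)


# ASSUMPTION: stand-in dataset shipped with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

seeds = [0, 1, 2, 3]
tops = {s: top_k_features(X, y, s) for s in seeds}

# Compare every other seed's top-5 set against the first seed's set.
reference = tops[seeds[0]]
overlaps = {s: jaccard(reference, tops[s]) for s in seeds[1:]}
for s, overlap in overlaps.items():
    print(f"seed {seeds[0]} vs seed {s}: top-5 Jaccard overlap = {overlap:.2f}")
```

An overlap below 1.0 for any pair would indicate that the reported top features depend on the seed alone, which is the kind of instability the captions describe.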