AutoPrognosis 2.0: Democratizing Diagnostic and Prognostic Modeling in Healthcare with Automated Machine Learning

Fergus Imrie, Bogdan Cebere, Eoin F. McKinney, Mihaela van der Schaar

TL;DR

AutoPrognosis 2.0 targets key barriers to clinical adoption of predictive modeling by automating ML pipeline construction for classification, regression, and time-to-event tasks, while embedding interpretability tools and web-based demonstrators. The framework automates missing-data imputation, feature processing, model selection, and hyperparameter tuning, and it produces explainable outputs using SHAP, exemplar-based reasoning, and symbolic risk equations, all within an open-source package. The paper demonstrates the approach with a UK Biobank diabetes risk study in a cohort of $n=502{,}467$, reporting higher discrimination (C-index around $0.888$) and higher net benefit than existing risk scores and Cox models, plus a web demonstrator for clinical use. By enabling non-ML experts to develop and share robust, personalized diagnostic and prognostic models, AutoPrognosis 2.0 aims to democratize ML in healthcare and beyond.
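
For readers unfamiliar with the C-index quoted above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not taken from the AutoPrognosis package; the function name `concordance_index`, its arguments, and the toy data are illustrative assumptions. The C-index is the fraction of comparable pairs of individuals in which the one who experienced the event earlier was also assigned the higher predicted risk, so $0.5$ corresponds to random ranking and $1.0$ to perfect ranking.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch, not the AutoPrognosis API).

    times  -- observed follow-up time for each individual
    events -- 1 if the event was observed, 0 if the individual was censored
    risks  -- predicted risk score (higher means the event is expected sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # earlier time is an observed event rather than a censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier event got the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four individuals, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

In the toy data, five pairs are comparable and four of them are ranked correctly, so `toy_c` is $0.8$. This pairwise loop is quadratic in the number of individuals; for a cohort the size of the UK Biobank one would use an optimized library implementation instead.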

Abstract

Diagnostic and prognostic models are increasingly important in medicine and inform many clinical decisions. Recently, machine learning approaches have shown improvement over conventional modeling techniques by better capturing complex interactions between patient covariates in a data-driven manner. However, the use of machine learning introduces a number of technical and practical challenges that have thus far restricted widespread adoption of such techniques in clinical settings. To address these challenges and empower healthcare professionals, we present a machine learning framework, AutoPrognosis 2.0, to develop diagnostic and prognostic models. AutoPrognosis leverages state-of-the-art advances in automated machine learning to develop optimized machine learning pipelines, incorporates model explainability tools, and enables deployment of clinical demonstrators, without requiring significant technical expertise. Our framework eliminates the major technical obstacles to predictive modeling with machine learning that currently impede clinical adoption. To demonstrate AutoPrognosis 2.0, we provide an illustrative application where we construct a prognostic risk score for diabetes using the UK Biobank, a prospective study of 502,467 individuals. The models produced by our automated framework achieve greater discrimination for diabetes than expert clinical risk scores. Our risk score has been implemented as a web-based decision support tool and can be publicly accessed by patients and clinicians worldwide. In addition, AutoPrognosis 2.0 is provided as an open-source Python package. By open-sourcing our framework as a tool for the community, clinicians and other medical practitioners will be able to readily develop new risk scores, personalized diagnostics, and prognostics using modern machine learning techniques.

Paper Structure (7 sections, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Overview of the AutoPrognosis 2.0 framework. AutoPrognosis takes either raw or curated medical datasets and provides an imputed dataset, a report detailing the optimized machine learning pipelines, a diagnostic or prognostic model, explanations, and a web-based interface for clinicians to interact with and use the derived model.
  • Figure 2: Decision curve analysis. AutoPrognosis exhibits higher net benefit at all decision thresholds compared to existing risk scores and baseline treatment plans.
  • Figure 3: Value of information. We evaluate AutoPrognosis using different numbers of features, corresponding to different effect size thresholds. The feature efficiency is compared to QDiabetes Model C, the best performing existing risk score.
  • Figure 4: SHAP values for the most important features.
  • Figure 5: Screenshot of an example clinical demonstrator produced by AutoPrognosis.
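
The net benefit plotted in a decision curve analysis such as Figure 2 is conventionally defined, at a decision threshold $p_t$, as $\text{NB}(p_t) = \frac{TP}{n} - \frac{FP}{n}\cdot\frac{p_t}{1-p_t}$, where $TP$ and $FP$ count the true and false positives among the $n$ individuals when everyone with predicted risk at or above $p_t$ is treated. The sketch below implements this standard definition; it is not the paper's evaluation code, and the function names and toy data are assumptions made for illustration. The "treat all" function is one common baseline strategy (the other, "treat none", has net benefit $0$ by definition); whether these match the baseline treatment plans in Figure 2 is assumed here.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold.

    Illustrative sketch of the standard decision-curve formula.
    y_true -- 1 if the individual developed the outcome, else 0
    y_prob -- predicted probability of the outcome
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the baseline strategy that treats every individual."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): five individuals, two with the outcome.
toy_outcomes = [1, 0, 1, 0, 0]
toy_probs = [0.9, 0.2, 0.6, 0.7, 0.1]
nb_model = net_benefit(toy_outcomes, toy_probs, threshold=0.5)
nb_all = net_benefit_treat_all(toy_outcomes, threshold=0.5)
```

At $p_t = 0.5$ the toy model treats three individuals (two true positives, one false positive), giving a net benefit of $0.2$, whereas treating everyone gives $-0.2$. A decision curve repeats this calculation over a range of thresholds and plots the results for each strategy.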