Cost-Aware Prediction (CAP): An LLM-Enhanced Machine Learning Pipeline and Decision Support System for Heart Failure Mortality Prediction

Yinan Yu, Falk Dippel, Christina E. Lundberg, Martin Lindgren, Annika Rosengren, Martin Adiels, Helen Sjöland

TL;DR

This work addresses the gap between predictive accuracy and downstream clinical value in heart failure mortality prediction by proposing the Cost-Aware Prediction (CAP) framework. CAP integrates an ML classifier with clinical impact projection (CIP) cost curves and a four-agent, large language model (LLM)-driven cost-benefit analysis to support decision-making. The best-performing model, a gradient-boosting classifier, achieves an AUROC of 0.804 and an AUPRC of 0.529, while the CIP curves show how different decision thresholds affect patient quality of life (QoL) and healthcare expenditures. The novel contribution lies in combining population-level cost visualization with patient-level, LLM-generated interpretations that elucidate the trade-offs and improve interpretability and trust. The study demonstrates that CAP's three-stage pipeline enables more transparent, cost-aware, and potentially policy-influencing decision support for home-care eligibility in heart failure, although the speculative outputs of the LLM agents need more robust handling.
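The CIP cost curves can be illustrated as a sweep over decision thresholds that, at each threshold, turns the confusion counts into per-patient treatment and error costs for each cost dimension. This is a minimal sketch under assumptions that are not taken from the paper: a patient is flagged for home care when the predicted 1-year mortality risk is at or above the threshold, every flagged patient incurs a treatment cost, and false positives and false negatives incur dimension-specific error costs. The names `CostDimension`, `confusion_counts`, and `cost_curve`, as well as all unit costs, are hypothetical placeholders rather than the authors' values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostDimension:
    """Hypothetical unit costs for one cost dimension (e.g. patient QoL or health care)."""
    treatment: float       # cost of offering home care to one flagged patient
    false_positive: float  # extra cost when a flagged patient survives the year
    false_negative: float  # cost when an unflagged patient dies within the year


def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, FN, TN when patients with risk >= threshold are flagged."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def cost_curve(y_true, y_prob, thresholds, dims):
    """Mean per-patient treatment and error cost per dimension at each threshold."""
    n = len(y_true)
    rows = []
    for t in thresholds:
        tp, fp, fn, _ = confusion_counts(y_true, y_prob, t)
        row = {"threshold": t}
        for name, d in dims.items():
            row[f"{name}_treatment"] = (tp + fp) * d.treatment / n
            row[f"{name}_error"] = (fp * d.false_positive + fn * d.false_negative) / n
        # Stacked total across all dimensions and cost types.
        row["total"] = sum(v for k, v in row.items() if k != "threshold")
        rows.append(row)
    return rows


# Toy example with made-up labels, risks, and unit costs.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
dims = {
    "qol": CostDimension(treatment=0.0, false_positive=0.5, false_negative=2.0),
    "hc": CostDimension(treatment=1.0, false_positive=0.0, false_negative=3.0),
}
rows = cost_curve(y_true, y_prob, [0.3, 0.5], dims)
for row in rows:
    print(row)
```

Plotting the `*_treatment` and `*_error` columns as stacked areas against the threshold gives a population-level view of the kind described for the CIP curves; in practice, the unit costs would have to come from clinical and health-economic estimates.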

Abstract

Objective: Machine learning (ML) predictive models are often developed without considering downstream value trade-offs and clinical interpretability. This paper introduces a cost-aware prediction (CAP) framework that combines cost-benefit analysis assisted by large language model (LLM) agents to communicate the trade-offs involved in applying ML predictions.

Materials and Methods: We developed an ML model predicting 1-year mortality in patients with heart failure (N = 30,021, 22% mortality) to identify those eligible for home care. We then introduced clinical impact projection (CIP) curves to visualize important cost dimensions (quality of life and healthcare provider expenses), further divided into treatment and error costs, to assess the clinical consequences of predictions. Finally, we used four LLM agents to generate patient-specific descriptions. The system was evaluated by clinicians for its decision support value.

Results: The eXtreme gradient boosting (XGB) model achieved the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.804 (95% confidence interval (CI) 0.792-0.816), an area under the precision-recall curve (AUPRC) of 0.529 (95% CI 0.502-0.558), and a Brier score of 0.135 (95% CI 0.130-0.140).

Discussion: The CIP cost curves provided a population-level overview of cost composition across decision thresholds, whereas the LLM-generated cost-benefit analyses operated at the individual patient level. The system was well received in the clinicians' evaluation. However, the feedback emphasized the need to improve technical accuracy on speculative tasks.

Conclusion: CAP uses LLM agents to integrate ML classifier outcomes and cost-benefit analysis for more transparent and interpretable decision support.
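The abstract reports each metric with a 95% CI. One common way to obtain such intervals is a percentile bootstrap over the test set; the paper's exact procedure is not described in this section, so the sketch below is an assumption rather than the authors' method. It computes AUROC with the rank-sum (Mann-Whitney) formulation, AUPRC as step-wise average precision, and the Brier score as the mean squared error of the predicted probabilities, using only the standard library. The function names and the synthetic data are hypothetical.

```python
import random


def auroc(y_true, y_prob):
    """AUROC via the rank-sum (Mann-Whitney U) formula; tied scores get average ranks."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    ranks = [0.0] * len(y_prob)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_prob[order[j + 1]] == y_prob[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, y_prob):
    """AUPRC as step-wise average precision: sum of recall increments times precision."""
    pairs = sorted(zip(y_prob, y_true), reverse=True)
    n_pos = sum(y_true)
    tp = fp = 0
    ap = 0.0
    i = 0
    while i < len(pairs):
        score, prev_tp = pairs[i][0], tp
        while i < len(pairs) and pairs[i][0] == score:  # one step per distinct score
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        ap += (tp - prev_tp) / n_pos * tp / (tp + fp)
    return ap


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(metric, y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples that contain a single class
            stats.append(metric(yt, [y_prob[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return metric(y_true, y_prob), lo, hi


# Synthetic data (about 22% positives) purely for illustration.
rng = random.Random(42)
y = [1 if rng.random() < 0.22 else 0 for _ in range(500)]
p = [min(1.0, max(0.0, 0.22 + 0.3 * (yi - 0.22) + rng.gauss(0, 0.15))) for yi in y]
for name, fn in [("AUROC", auroc), ("AUPRC", average_precision), ("Brier", brier_score)]:
    est, lo, hi = bootstrap_ci(fn, y, p, n_boot=200)
    print(f"{name}: {est:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Resamples containing only one class are skipped because AUROC and AUPRC are undefined without both outcomes present.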

Paper Structure

This paper contains 27 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Cohort flow chart indicating selection process
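  • Figure 2: Visual comparison of the discriminative and calibration performance of the model candidates. The receiver operating characteristic curve (A), precision-recall curve (B), and calibration curve (C) on the test set show the highest predictive performance for the gradient boosting machines. AUPRC = area under the precision-recall curve. AUROC = area under the receiver operating characteristic curve. BS = Brier score. LGB = light gradient boosting machine. LR = logistic regression. RF = random forest. XGB = eXtreme gradient boosting.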
  • Figure 3: CIP cost curves visualise the different cost contributions. The CIP cost curves combine the cost curves for prediction error and treatment costs at varying decision thresholds, taking different cost dimensions, namely the patient's and the healthcare system's, into account. At the population level, the stacked cost contributions highlight the clinical impact of potentially competing factors. At the patient level, the curves visualise the cost-benefit within a risk band (yellow shade) around the patient-specific risk prediction in relation to the decision threshold. CIP = clinical impact projection. HC = health care. QoL = quality of life.
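The patient-level view places a risk band around one patient's prediction relative to the decision threshold. A minimal sketch, assuming a simple cost model that is not taken from the paper: offering home care costs `c_treat`, an unnecessary referral (the patient survives) adds `c_fp`, and a missed patient who dies costs `c_fn`. Offering care is then cheaper in expectation when c_treat + (1 - p) c_fp < p c_fn, which rearranges to p > (c_treat + c_fp) / (c_fp + c_fn). A band that straddles the threshold signals that the recommendation depends on prediction uncertainty. `RiskBand`, `expected_costs`, `break_even_threshold`, `band_position`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskBand:
    """Hypothetical uncertainty band around one patient's predicted 1-year mortality risk."""
    low: float
    high: float


def expected_costs(p, c_treat, c_fp, c_fn):
    """Expected cost of offering vs withholding home care for a patient with risk p."""
    offer = c_treat + (1 - p) * c_fp  # treatment cost, plus waste if the patient survives
    withhold = p * c_fn               # cost of a missed patient who dies
    return offer, withhold


def break_even_threshold(c_treat, c_fp, c_fn):
    """Risk at which offering and withholding have equal expected cost."""
    return (c_treat + c_fp) / (c_fp + c_fn)


def band_position(band, threshold):
    """Where the risk band lies relative to the decision threshold."""
    if band.low >= threshold:
        return "above"      # flagged for home care across the whole band
    if band.high < threshold:
        return "below"      # not flagged anywhere in the band
    return "straddles"      # decision is sensitive to prediction uncertainty


# Toy costs and band, purely illustrative.
c_treat, c_fp, c_fn = 1.0, 0.5, 4.0
t = break_even_threshold(c_treat, c_fp, c_fn)
band = RiskBand(low=0.28, high=0.41)
print(f"threshold={t:.3f}, band {band_position(band, t)}")
for p in (band.low, band.high):
    offer, withhold = expected_costs(p, c_treat, c_fp, c_fn)
    print(f"risk={p:.2f}: offer={offer:.3f}, withhold={withhold:.3f}")
```

With these toy costs, withholding care is cheaper at the lower edge of the band and offering care is cheaper at the upper edge, which is exactly the ambiguity a straddling band is meant to surface.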