Generative LLM Powered Conversational AI Application for Personalized Risk Assessment: A Case Study in COVID-19

Mohammad Amin Roshani, Xiangyu Zhou, Yao Qiang, Srinivasan Suresh, Steve Hicks, Usha Sethuraman, Dongxiao Zhu

TL;DR

This work demonstrates a new LLM-powered disease risk assessment approach via streaming human-AI conversation, eliminating the need for programming required by traditional machine learning approaches.

Abstract

Large language models (LLMs) have shown remarkable capabilities in various natural language tasks and are increasingly being applied in healthcare domains. This work demonstrates a new LLM-powered disease risk assessment approach via streaming human-AI conversation, eliminating the need for programming required by traditional machine learning approaches. In a COVID-19 severity risk assessment case study, we fine-tune pre-trained generative LLMs (e.g., Llama2-7b and Flan-t5-xl) using a few shots of natural language examples, comparing their performance with traditional classifiers (i.e., Logistic Regression, XGBoost, Random Forest) that are trained de novo using tabular data across various experimental settings. We develop a mobile application that uses these fine-tuned LLMs as its generative AI (GenAI) core to facilitate real-time interaction between clinicians and patients, providing no-code risk assessment through conversational interfaces. This integration not only allows for the use of streaming Questions and Answers (QA) as inputs but also offers personalized feature importance analysis derived from the LLM's attention layers, enhancing the interpretability of risk assessments. By achieving high Area Under the Curve (AUC) scores with a limited number of fine-tuning samples, our results demonstrate the potential of generative LLMs to outperform discriminative classification methods in low-data regimes, highlighting their real-world adaptability and effectiveness. This work aims to fill the existing gap in leveraging generative LLMs for interactive no-code risk assessment and to encourage further research in this emerging field.

Paper Structure

This paper contains 20 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: A comparison between LLM-based conversational AI (Conv-AI) and traditional machine learning methods for disease risk assessment. Conv-AI leverages pre-trained models that require only very-few-shot fine-tuning, handle unstructured textual data, report real-time feature importance alongside each risk assessment, and transfer to new risk assessment tasks with zero to very few shots. In contrast, traditional machine learning methods require large datasets for de novo training, process structured data, rely on extra computational steps for instance-specific post-hoc feature importance (e.g., SHAP), and need retraining for each new task.
  • Figure 2: Workflow for few-shot COVID-19 severity risk assessment using generative LLMs with different serialization techniques. The top section, labeled Backend - System Developer, shows the fine-tuning phase, in which a few-shot sample of patient data, serialized via List and Text Templates, is used to fine-tune the LLMs. This backend process includes the creation of prompts and corresponding labels for model fine-tuning. The bottom section, labeled Frontend - User, illustrates how a conversational chatbot interacts with users through our application to gather responses via streaming QAs. The fine-tuned LLM analyzes these responses in real time, providing risk assessments and highlighting the top attributing features that explain each assessment.
  • Figure 3: Normalized attention scores from LLaMA2-7b in the 32-shot setting, showing feature importance for two test cases, one positive (yes) and one negative (no), simultaneously with the risk assessment.
  • Figure 4: Overview of our mobile application design, showcasing patient data collection, real-time risk assessment using LLMs, and clinician review interface.
  • Figure 5: Average AUC in 2-shot setting over five different seeds. The left panel shows results using the List Serialization (-L) approach, while the right panel shows results using the Text Serialization (-T) approach.
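
The List and Text serialization templates named in Figures 2 and 5 turn a tabular patient record into a natural-language prompt. The page does not give the exact template wording, so the following is only a minimal sketch under assumptions: the feature names, the sentence pattern, the question text, and the helper names `serialize_list`, `serialize_text`, and `build_prompt` are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def serialize_list(record: Dict[str, object]) -> str:
    """List-style template (assumed form): one '- feature: value' line per feature."""
    return "\n".join(f"- {name}: {value}" for name, value in record.items())


def serialize_text(record: Dict[str, object]) -> str:
    """Text-style template (assumed form): one short sentence per feature."""
    return " ".join(f"The {name} is {value}." for name, value in record.items())


def build_prompt(serialized: str, question: str) -> str:
    """Append the task question so a generative model can answer yes or no."""
    return f"{serialized}\n\n{question}\nAnswer:"


# Hypothetical patient record; real features would come from the study's data.
record = {"age": 7, "fever": "yes", "cough": "no"}

# Hypothetical question wording, not the paper's prompt.
QUESTION = "Is this patient at risk of severe COVID-19? Answer yes or no."

list_prompt = build_prompt(serialize_list(record), QUESTION)
text_prompt = build_prompt(serialize_text(record), QUESTION)

print(list_prompt)
print()
print(text_prompt)
```

In a few-shot setup of the kind Figure 2 describes, each such prompt would be paired with its yes/no label to form one fine-tuning example. The same serializer could be applied at inference time to answers collected through the streaming QA interface.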