Table of Contents
Fetching ...

A Machine Learning Approach for Emergency Detection in Medical Scenarios Using Large Language Models

Ferit Akaybicen, Aaron Cummings, Lota Iwuagwu, Xinyue Zhang, Modupe Adewuyi

TL;DR

This study tackles rapid emergency detection from text in telemedicine using large language models and prompt engineering. By evaluating LLaMA variants 1B, 3B, and 7B with system prompts and in-prompt training on an on-premises setup, the authors achieve 99.7% accuracy with the 7B model and 99.6% with the 3B model, identifying 10 in-prompt examples as optimum. The approach emphasizes safety-critical performance by minimizing false negatives, while also detailing processing speeds across hardware and ensuring privacy per HIPAA in a no-data-retention framework. The findings show that moderate-sized LLMs, when guided by carefully crafted prompts, can deliver high-accuracy emergency classification suitable for telemedicine and remote triage, with practical implications for deployment and future enhancements such as multilingual capabilities and system integration.

Abstract

The rapid identification of medical emergencies through digital communication channels remains a critical challenge in modern healthcare delivery, particularly with the increasing prevalence of telemedicine. This paper presents a novel approach leveraging large language models (LLMs) and prompt engineering techniques for automated emergency detection in medical communications. We developed and evaluated a comprehensive system using multiple LLaMA model variants (1B, 3B, and 7B parameters) to classify medical scenarios as emergency or non-emergency situations. Our methodology incorporated both system prompts and in-prompt training approaches, evaluated across different hardware configurations. The results demonstrate exceptional performance, with the LLaMA 2 (7B) model achieving 99.7% accuracy and the LLaMA 3.2 (3B) model reaching 99.6% accuracy with optimal prompt engineering. Through systematic testing of training examples within the prompts, we identified that including 10 example scenarios in the model prompts yielded optimal classification performance. Processing speeds varied significantly between platforms, ranging from 0.05 to 2.2 seconds per request. The system showed particular strength in minimizing high-risk false negatives in emergency scenarios, which is crucial for patient safety. The code implementation and evaluation framework are publicly available on GitHub, facilitating further research and development in this crucial area of healthcare technology.

A Machine Learning Approach for Emergency Detection in Medical Scenarios Using Large Language Models

TL;DR

This study tackles rapid emergency detection from text in telemedicine using large language models and prompt engineering. By evaluating LLaMA variants 1B, 3B, and 7B with system prompts and in-prompt training on an on-premises setup, the authors achieve 99.7% accuracy with the 7B model and 99.6% with the 3B model, identifying 10 in-prompt examples as optimum. The approach emphasizes safety-critical performance by minimizing false negatives, while also detailing processing speeds across hardware and ensuring privacy per HIPAA in a no-data-retention framework. The findings show that moderate-sized LLMs, when guided by carefully crafted prompts, can deliver high-accuracy emergency classification suitable for telemedicine and remote triage, with practical implications for deployment and future enhancements such as multilingual capabilities and system integration.

Abstract

The rapid identification of medical emergencies through digital communication channels remains a critical challenge in modern healthcare delivery, particularly with the increasing prevalence of telemedicine. This paper presents a novel approach leveraging large language models (LLMs) and prompt engineering techniques for automated emergency detection in medical communications. We developed and evaluated a comprehensive system using multiple LLaMA model variants (1B, 3B, and 7B parameters) to classify medical scenarios as emergency or non-emergency situations. Our methodology incorporated both system prompts and in-prompt training approaches, evaluated across different hardware configurations. The results demonstrate exceptional performance, with the LLaMA 2 (7B) model achieving 99.7% accuracy and the LLaMA 3.2 (3B) model reaching 99.6% accuracy with optimal prompt engineering. Through systematic testing of training examples within the prompts, we identified that including 10 example scenarios in the model prompts yielded optimal classification performance. Processing speeds varied significantly between platforms, ranging from 0.05 to 2.2 seconds per request. The system showed particular strength in minimizing high-risk false negatives in emergency scenarios, which is crucial for patient safety. The code implementation and evaluation framework are publicly available on GitHub, facilitating further research and development in this crucial area of healthcare technology.

Paper Structure

This paper contains 18 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: System Architecture Overview
  • Figure 2: Performance Comparison Across LLaMA Model Variants
  • Figure 3: Impact of Tuning Messages on Model Accuracy