Table of Contents
Fetching ...

Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction

Jung Hoon Lim, Sunjae Kwon, Zonghai Yao, John P. Lalor, Hong Yu

TL;DR

The paper investigates whether role-playing with ChatGPT can personalize medical term extraction to individuals' socio-demographic backgrounds to improve patient comprehension of EHR notes. It utilizes 270 Turkers to ground-truth term extraction on 20 sentences and compares SciSpacy, MedJEx, and ChatGPT across 14 demographic groups, with and without in-context learning. Key findings show role-playing boosts macro-F1 in the majority of cases, and GPT-4 with role-playing plus ICL achieves the best macro-F1 of 51.28, surpassing the previous state-of-the-art. The work demonstrates the potential of role-playing LLMs to tailor biomedical information and informs future research on patient-facing NLP tools such as EHR personalization and medical concept linking.

Abstract

Previous studies reveal that Electronic Health Records (EHR), which have been widely adopted in the U.S. to allow patients to access their personal medical information, do not have high readability to patients due to the prevalence of medical jargon. Tailoring medical notes to individual comprehension by identifying jargon that is difficult for each person will enhance the utility of generative models. We present the first quantitative analysis to measure the impact of role-playing in LLM in medical term extraction. By comparing the results of Mechanical Turk workers over 20 sentences, our study demonstrates that LLM role-playing improves F1 scores in 95% of cases across 14 different socio-demographic backgrounds. Furthermore, applying role-playing with in-context learning outperformed the previous state-of-the-art models. Our research showed that ChatGPT can improve traditional medical term extraction systems by utilizing role-play to deliver personalized patient education, a potential that previous models had not achieved.

Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction

TL;DR

The paper investigates whether role-playing with ChatGPT can personalize medical term extraction to individuals' socio-demographic backgrounds to improve patient comprehension of EHR notes. It utilizes 270 Turkers to ground-truth term extraction on 20 sentences and compares SciSpacy, MedJEx, and ChatGPT across 14 demographic groups, with and without in-context learning. Key findings show role-playing boosts macro-F1 in the majority of cases, and GPT-4 with role-playing plus ICL achieves the best macro-F1 of 51.28, surpassing the previous state-of-the-art. The work demonstrates the potential of role-playing LLMs to tailor biomedical information and informs future research on patient-facing NLP tools such as EHR personalization and medical concept linking.

Abstract

Previous studies reveal that Electronic Health Records (EHR), which have been widely adopted in the U.S. to allow patients to access their personal medical information, do not have high readability to patients due to the prevalence of medical jargon. Tailoring medical notes to individual comprehension by identifying jargon that is difficult for each person will enhance the utility of generative models. We present the first quantitative analysis to measure the impact of role-playing in LLM in medical term extraction. By comparing the results of Mechanical Turk workers over 20 sentences, our study demonstrates that LLM role-playing improves F1 scores in 95% of cases across 14 different socio-demographic backgrounds. Furthermore, applying role-playing with in-context learning outperformed the previous state-of-the-art models. Our research showed that ChatGPT can improve traditional medical term extraction systems by utilizing role-play to deliver personalized patient education, a potential that previous models had not achieved.
Paper Structure (9 sections, 3 equations, 3 figures, 3 tables)

This paper contains 9 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Message interactions with ChatGPT. A system message is sent to set the ChatGPT assistant's role. Then, a user message that contains the query is inputted such that ChatGPT could give the response in the requested format (Python list).
  • Figure 2: Macro F1 scores using gpt-3.5-turbo and gpt-4 with the temperature of 0.0.
  • Figure 3: Macro F1 scores with and without role-play for each socio-demographic factor with the temperature of 0.0.