A Study of Generative Large Language Model for Medical Research and Healthcare

Cheng Peng, Xi Yang, Aokun Chen, Kaleb E Smith, Nima PourNejatian, Anthony B Costa, Cheryl Martin, Mona G Flores, Ying Zhang, Tanja Magoc, Gloria Lipori, Duane A Mitchell, Naykky S Ospina, Mustafa M Ahmed, William R Hogan, Elizabeth A Shenkman, Yi Guo, Jiang Bian, Yonghui Wu

TL;DR

This study provides insight into the opportunities and challenges of LLMs for medical research and healthcare, and shows that NLP models trained on synthetic text generated by GatorTronGPT outperform models trained on real-world clinical text.

Abstract

There is enormous enthusiasm, along with concern, about using large language models (LLMs) in healthcare, yet current assumptions are all based on general-purpose LLMs such as ChatGPT. This study develops a clinical generative LLM, GatorTronGPT, using 277 billion words of mixed clinical and English text with a GPT-3 architecture of 20 billion parameters. GatorTronGPT improves biomedical natural language processing for medical research. Synthetic NLP models trained using GatorTronGPT-generated text outperform NLP models trained using real-world clinical text. A physicians' Turing test using a 1 (worst) to 9 (best) scale shows no significant difference in linguistic readability (p = 0.22; 6.57 for GatorTronGPT compared with 6.93 for human) or clinical relevance (p = 0.91; 7.0 for GatorTronGPT compared with 6.97 for human), and that physicians cannot differentiate them (p < 0.001). This study provides insights into the opportunities and challenges of LLMs for medical research and healthcare.

Paper Structure

This paper contains 23 sections, 1 equation, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Develop a clinical generative large language model, GatorTronGPT, for biomedical natural language processing, clinical text generation, and healthcare text evaluation. a, Train GatorTronGPT from scratch using the GPT-3 architecture with up to 20 billion parameters. b, Solve biomedical relation extraction and question answering using a unified P-tuning-based text generation architecture. c, Apply GatorTronGPT to generate 20 billion words of synthetic clinical text, which was used to train a synthetic natural language processing model, GatorTronS. d, Turing evaluation of 30 paragraphs of text written by GatorTronGPT mixed with 30 real-world paragraphs written by UF Health physicians. TrM: transformer unit; B: billion.
  • Figure 2: Training loss and validation loss for GatorTronGPT 5 billion and 20 billion models.