SoftTiger: A Clinical Foundation Model for Healthcare Workflows
Ye Chen, Igor Couto, Wei Cai, Cong Fu, Bruno Dorneles
TL;DR
SoftTiger is a family of clinical foundation models that structure free-text clinical notes into interoperable IPS/FHIR data, addressing long context windows and workflow integration in healthcare. The models are fine-tuned from open-source bases (Llama-2, TigerBot) at 13B and 70B parameters on 134M tokens, with a data mix that includes MIMIC-IV notes and Asclepius tasks. In evaluations, SoftTiger outperforms popular open-source LLMs and GPT-3.5 and is competitive with Gemini-pro, which suggests it could support clinical workflows and the democratization of digital health. The work releases models, data, and evaluation tools to accelerate adoption, acknowledges limitations such as hallucination, and proposes retrieval-augmented and RL-based improvements as future work.
Abstract
We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare workflows. The narrative and unstructured nature of clinical notes is a major obstacle to bringing machine intelligence into healthcare. We address the critical problem of structuring clinical notes into clinical data that conforms to international interoperability standards. We collect and annotate data for three subtasks: international patient summary, clinical impression, and medical encounter. We then perform supervised fine-tuning of a state-of-the-art LLM using public and credentialed clinical data. The training is orchestrated so that the target model first learns to support basic clinical tasks, such as abbreviation expansion and temporal information extraction, and then learns to perform more complex downstream clinical tasks. We also address several modeling challenges specific to the healthcare context, e.g., the need for an extra-long context window. Our blind pairwise evaluation shows that SoftTiger outperforms other popular open-source models and GPT-3.5, is comparable to Gemini-pro, and trails GPT-4 by a mild gap. We believe that LLMs may become a stepping stone towards healthcare digitalization and democratization. We therefore publicly release SoftTiger models at scales of 13 billion and 70 billion parameters, as well as datasets and code for our innovative scalable evaluation, in the hope of making a significant contribution to the healthcare industry.
