HealthcareNLP: where are we and what is next?
Lifeng Han, Paul Rayson, Suzan Verberne, Andrew Moore, Goran Nenadic
TL;DR
This paper presents a tutorial on HealthcareNLP that addresses gaps in existing reviews: synthetic data generation for privacy, explainable clinical NLP, retrieval-augmented generation, and the neural-symbolic integration of LLMs and knowledge graphs (KGs). It proposes a three-layer framework (data/resource, NLP-Eval, and patient-facing) and covers tasks ranging from named entity recognition (NER) and relation extraction to machine translation, text simplification, and shared decision-making (SDM) support, along with governance and ethical considerations. The tutorial combines historical context with state-of-the-art methods and includes a hands-on session demonstrating HealthcareNLP applications. It targets NLP practitioners, healthcare researchers, and students; no prior knowledge is required. By situating HealthcareNLP within the BioNLP lineage and current workshops, the work aims to consolidate methodologies, encourage responsible deployment, and foster cross-disciplinary collaboration for real-world impact in healthcare settings.
Abstract
This proposed tutorial focuses on healthcare domain applications of NLP: what has been achieved in HealthcareNLP so far and the challenges that lie ahead. Existing reviews in this domain either overlook important tasks, such as synthetic data generation for addressing privacy concerns and explainable clinical NLP for improved integration and implementation, or fail to mention important methodologies, including retrieval-augmented generation and the neural-symbolic integration of large language models (LLMs) and knowledge graphs (KGs). In light of this, the goal of this tutorial is to provide an introductory overview of the most important sub-areas of a patient- and resource-oriented HealthcareNLP, organised into three layers: (1) the data/resource layer, covering annotation guidelines, ethical approvals, governance, and synthetic data; (2) the NLP-Eval layer, covering NLP tasks such as named entity recognition (NER), relation extraction (RE), sentiment analysis, and linking/coding with categorised methods, leading to explainable HealthAI; and (3) the patients layer, covering Patient and Public Involvement and Engagement (PPIE), health literacy, translation, simplification, and summarisation (which are also NLP tasks), and shared decision-making support. The tutorial will include a hands-on session in which the audience can use HealthcareNLP applications. The target audience includes NLP practitioners in the healthcare application domain, NLP researchers who are interested in domain applications, healthcare researchers, and students from NLP fields. The type of tutorial is "Introductory to CL/NLP topics (HealthcareNLP)", and no prior knowledge is needed to attend. Tutorial materials: https://github.com/4dpicture/HealthNLP
