The Joint Entity-Relation Extraction Model Based on Span and Interactive Fusion Representation for Chinese Medical Texts with Complex Semantics
Danni Feng, Runzhi Li, Jing Wang, Siyu Yan, Lihong Ma, Yunli Xing
TL;DR
This paper tackles joint entity–relation extraction in Chinese medical texts, where complex semantics pose challenges for standard pipelines. It introduces ISER, a span-based joint extraction framework that combines a Semantic Enhancement Attention (SEA) module with an interactive fusion representation built on Cross Attention and BiLSTM to enable bidirectional information exchange between entity recognition and relation extraction. The model uses span-based features and local and global contextual cues to identify entities and their relations, and it is evaluated on CH-DDI, a Chinese drug-drug interaction dataset introduced in the paper, and on the CoNLL04 benchmark, where it achieves state-of-the-art or competitive performance. The CH-DDI results further demonstrate the approach's applicability to domain-specific Chinese medical text. Together, these contributions advance high-precision knowledge extraction for medical knowledge graphs and related downstream tasks in Chinese NLP.
Abstract
Joint entity-relation extraction is a critical task for transforming unstructured or semi-structured text into triplets, which facilitates the construction of large-scale knowledge graphs and supports various downstream applications. Despite its importance, research on Chinese text, particularly text with complex semantics in specialized domains such as medicine, remains limited. To address this gap, we introduce CH-DDI, a Chinese drug-drug interaction dataset designed to capture the intricacies of medical text. Leveraging the strength of attention mechanisms in capturing long-range dependencies, we propose the Semantic Enhancement Attention (SEA) module, which enhances the extraction of complex contextual semantic information and thereby improves entity recognition and relation extraction. Additionally, because existing methods exchange information inefficiently between entity recognition and relation extraction, we present an interactive fusion representation module. This module employs Cross Attention for bidirectional information exchange between the two tasks and further refines feature extraction through a BiLSTM. Experimental results on both our CH-DDI dataset and the public CoNLL04 dataset demonstrate that our model generalizes well. On the CH-DDI dataset, our model achieves an F1-score of 96.73% for entity recognition and 78.43% for relation extraction. On the CoNLL04 dataset, it attains an entity recognition precision of 89.54% and a relation extraction accuracy of 71.64%.
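As a rough illustration of the bidirectional exchange described above, the following is a minimal sketch of cross attention between two feature sets, not the paper's implementation. It assumes plain scaled dot-product attention without learned projections, combines each feature with its attended context by a residual sum (an assumption made here for illustration), and omits the BiLSTM refinement step. The function names cross_attention and interactive_fusion are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where the queries come from one task
    and the keys/values come from the other.

    Assumes all vectors share the same dimension and that no learned
    projection matrices are applied (a simplification).
    """
    dim = len(keys_values[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors (values == keys here).
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values))
             for j in range(dim)]
        )
    return attended


def interactive_fusion(entity_feats, relation_feats):
    """Exchange information in both directions between the two tasks.

    Entity features attend over relation features and vice versa. The
    residual sum used to merge each feature with its context is an
    illustrative assumption, not taken from the paper.
    """
    ent_ctx = cross_attention(entity_feats, relation_feats)
    rel_ctx = cross_attention(relation_feats, entity_feats)
    fused_ent = [[x + c for x, c in zip(feat, ctx)]
                 for feat, ctx in zip(entity_feats, ent_ctx)]
    fused_rel = [[x + c for x, c in zip(feat, ctx)]
                 for feat, ctx in zip(relation_feats, rel_ctx)]
    return fused_ent, fused_rel


# Toy usage: two entity-side vectors and one relation-side vector.
entity_feats = [[1.0, 0.0], [0.0, 1.0]]
relation_feats = [[0.5, 0.5]]
fused_ent, fused_rel = interactive_fusion(entity_feats, relation_feats)
```

In a full model, the fused sequences would typically pass through learned layers (here, the BiLSTM mentioned in the abstract) before the entity and relation classifiers; that stage is left out to keep the sketch dependency-free.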
