Leveraging Semantic Type Dependencies for Clinical Named Entity Recognition
Linh Le, Guido Zuccon, Gianluca Demartini, Genghong Zhao, Xia Zhang
TL;DR
This work tackles clinical named entity recognition by injecting domain-specific semantic type dependencies into neural models. It introduces short- and long-distance dependency representations derived from UMLS concepts and their semantic types, encoded as relation embeddings that connect tokens within a sentence. The approach is instantiated in BiLSTM-CRF and BiLSTM-GCN-CRF architectures and evaluated on ShARe/CLEF 2013 and i2b2/VA 2010 with embeddings including UmlsBERT, BioBERT, and BERT, showing improved precision and overall F1 in several settings. The study demonstrates that multi-type dependency information can reduce false positives and improve NER performance in clinical notes, offering a cost-effective alternative to multi-task learning and paving the way for richer domain knowledge integration in clinical NLP.
Abstract
Previous work on clinical relation extraction from free-text sentences leveraged information about semantic types from clinical knowledge bases as part of entity representations. In this paper, we exploit additional evidence by also making use of domain-specific semantic type dependencies. We encode the relation between a span of tokens matching a Unified Medical Language System (UMLS) concept and the other tokens in the sentence. We implement our method and compare it across different named entity recognition (NER) architectures (i.e., BiLSTM-CRF and BiLSTM-GCN-CRF) using different pre-trained embeddings (i.e., BERT, BioBERT, UmlsBERT). Our experimental results on clinical datasets show that in some cases NER effectiveness can be significantly improved by making use of domain-specific semantic type dependencies. Our work is also the first study to generate a matrix encoding that makes use of more than three dependencies in one pass for the NER task.
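To make the idea of a matrix encoding concrete, the following Python sketch builds a token-by-token relation matrix from UMLS concept spans. It is only an illustration under stated assumptions and not the paper's implementation: the function name `build_relation_matrix`, the label scheme (`short:<type>` for token pairs inside one matched span, `long:<type>` for a span token paired with any other token in the sentence), and the hand-annotated example span are all hypothetical. In a model, the integer ids would typically index a learned relation embedding table.

```python
from typing import Dict, List, Tuple

# (start, end_exclusive, semantic type name) for a span matching a UMLS concept.
Span = Tuple[int, int, str]


def build_relation_matrix(
    n_tokens: int, spans: List[Span]
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Return an n_tokens x n_tokens matrix of relation ids and the label vocabulary.

    Assumed label scheme (not taken from the paper):
      "short:<type>"  both tokens lie inside the same concept span
      "long:<type>"   one token is in the span, the other is outside it
    Id 0 means "no relation". Later spans overwrite earlier ones on overlap.
    """
    label_ids: Dict[str, int] = {"none": 0}
    matrix = [[0] * n_tokens for _ in range(n_tokens)]

    for start, end, sem_type in spans:
        for i in range(start, end):
            for j in range(n_tokens):
                if i == j:
                    continue  # no self-relation
                kind = "short" if start <= j < end else "long"
                # Assign the next free id the first time a label is seen.
                rel = label_ids.setdefault(f"{kind}:{sem_type}", len(label_ids))
                matrix[i][j] = rel
                matrix[j][i] = rel  # keep the matrix symmetric

    return matrix, label_ids


# Hand-annotated toy sentence; a real pipeline would obtain spans from a UMLS matcher.
tokens = ["patient", "denies", "chest", "pain"]
spans = [(2, 4, "Sign or Symptom")]
matrix, labels = build_relation_matrix(len(tokens), spans)
for token, row in zip(tokens, matrix):
    print(f"{token:>8} {row}")
```

Because every semantic type contributes its own pair of labels, a single matrix can carry many dependency types at once, which is the property the abstract highlights; how the paper actually defines short and long distance may differ from the assumption made here.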
