Cross-domain Named Entity Recognition via Graph Matching

Junhao Zheng, Haibin Chen, Qianli Ma

TL;DR

This work tackles cross-domain NER under data scarcity by introducing Label Structure Transfer for cross-domain NER (LST-NER), which builds label graphs for both the source and target label spaces from model predictions and aligns them through graph matching based on the Gromov-Wasserstein distance. It enhances contextual representations by fusing label-graph semantics into BERT embeddings through a label-guided attention mechanism and a GCN, coupled with an auxiliary multi-label task. The method improves over transfer-learning, multi-task, and few-shot baselines across eight domains in both rich- and low-resource regimes, and benefits further when combined with domain-adaptive pre-training. These results indicate that explicitly modeling and transferring label structure can bridge domain gaps and may generalize to other cross-domain prediction tasks.
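
For readers unfamiliar with the Gromov-Wasserstein distance mentioned above, its standard discrete form from the optimal transport literature is given here as background. This summary does not state the paper's exact objective, so every symbol is an assumption introduced for illustration: $C^{s}$ and $C^{t}$ are pairwise distance matrices over the $n$ source and $m$ target label nodes, $p$ and $q$ are node weight vectors, $T$ is a soft matching between nodes, and the squared difference is one common choice of loss.

```latex
\mathrm{GW}(C^{s}, C^{t}, p, q)
  = \min_{T \in \Pi(p, q)}
    \sum_{i,k=1}^{n} \sum_{j,l=1}^{m}
    \left( C^{s}_{ik} - C^{t}_{jl} \right)^{2} T_{ij} \, T_{kl},
\qquad
\Pi(p, q) = \left\{ T \in \mathbb{R}_{+}^{n \times m} :
  T \mathbf{1}_{m} = p,\; T^{\top} \mathbf{1}_{n} = q \right\}
```

The objective compares edges rather than nodes: it is small when pairs of nodes matched by $T$ are separated by similar distances in both graphs, which is why it can align label spaces that have different sizes and no shared label names.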

Abstract

Cross-domain NER is a practical yet challenging problem because of data scarcity in real-world scenarios. A common practice is to first train an NER model in a resource-rich general domain and then adapt it to specific domains. Because entity types are mismatched across domains, the broad knowledge in the general domain cannot be transferred effectively to the target-domain NER model. To this end, we model the label relationship as a probability distribution and construct label graphs in both the source and target label spaces. To enhance the contextual representation with label structures, we fuse the label graph into the word embeddings output by BERT. By representing label relationships as graphs, we formulate cross-domain NER as a graph matching problem. Furthermore, the proposed method is compatible with pre-training methods and is potentially applicable to other cross-domain prediction tasks. Empirical results on four datasets show that our method outperforms a series of transfer learning, multi-task learning, and few-shot learning methods.
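
One way to picture "modeling the label relationship as a probability distribution" is the minimal sketch below. It is not the authors' implementation. It assumes that each label node is the average temperature-softened prediction over the tokens carrying that label, that edge lengths are Euclidean distances between nodes, and that an edge is kept when its length is below a threshold. The names `build_label_graph`, `temperature`, and `delta` are hypothetical and only loosely mirror the $T$ and $\delta$ named in the figure captions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one token's class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def build_label_graph(token_logits, gold_labels, num_labels,
                      temperature=1.0, delta=math.inf):
    """Build an illustrative label graph from per-token predictions.

    Assumption (not taken from the paper): node i is the mean softened
    prediction over tokens whose gold label is i, and an undirected edge
    (i, j) exists when the Euclidean distance between nodes is below delta.
    """
    num_classes = len(token_logits[0])
    sums = [[0.0] * num_classes for _ in range(num_labels)]
    counts = [0] * num_labels
    for logits, label in zip(token_logits, gold_labels):
        probs = softmax(logits, temperature)
        counts[label] += 1
        for c, p in enumerate(probs):
            sums[label][c] += p
    # Labels that never occur keep an all-zero row.
    nodes = [
        [s / counts[i] for s in row] if counts[i] else row
        for i, row in enumerate(sums)
    ]
    dist = [[math.dist(a, b) for b in nodes] for a in nodes]
    edges = {
        (i, j)
        for i in range(num_labels)
        for j in range(i + 1, num_labels)
        if dist[i][j] < delta
    }
    return nodes, dist, edges


# Toy usage: four tokens, three labels, made-up logits.
token_logits = [
    [4.0, 0.0, 0.0],
    [3.0, 1.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 4.0],
]
nodes, dist, edges = build_label_graph(
    token_logits, [0, 0, 1, 2], num_labels=3, temperature=2.0, delta=10.0
)
```

Under these assumptions, a higher temperature flattens each node's distribution and so exposes more of the relationship between similar labels, while a smaller `delta` prunes edges between dissimilar labels. A graph built this way on each side supplies the distance matrices that a graph matching step would compare.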

Paper Structure (13 sections, 12 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: A demonstration of graph matching. In both cases, our model learns graph structures from the source label space and makes correct predictions. In the two label spaces, each node is a target label, and the matching nodes and edges are opaque.
  • Figure 2: A demonstration of the proposed model. First, the label graph from source label space is incorporated into the contextual representation by GCN. Then, the target model transfers graph structures from the source model via graph matching. Finally, the target model makes correct predictions with the learned label structures.
  • Figure 3: Comparisons when utilizing different amounts of training data in the "Restaurant Reviews" domain.
  • Figure 4: The impact of temperature $T$ and edge threshold $\delta$ on performance in the "Restaurant Reviews" domain.
  • Figure 5: The impact of weight parameters $\lambda_1$ and $\lambda_2$ on performance in the "Restaurant Reviews" domain.
  • ...and 1 more figures