Methods for Recognizing Nested Terms
Igor Rozhkov, Natalia Loukachevitch
TL;DR
The paper addresses the recognition of nested terms and the transfer of learning from flat to nested annotation. It applies the Binder model with ruRoberta-large, casting nested term extraction as Nested Named Entity Recognition, and achieves the best results in all three tracks of the RuTermEval competition. It also introduces and evaluates nested term recognition from flat supervision via pseudo-labeling techniques (notably inclusions and damaged cross-prediction), showing that nested terms can be recovered effectively without nested labeling. These findings support domain-specific term extraction and may ease transfer to new domains by reducing the labeling burden.
Abstract
In this paper, we describe our participation in the RuTermEval competition, which is devoted to extracting nested terms. To extract nested terms, we apply the Binder model, which has previously been applied successfully to the recognition of nested named entities. We obtained the best term recognition results in all three tracks of the RuTermEval competition. In addition, we study a new task: recognizing nested terms from flat training data, that is, data annotated with terms but without nesting. We conclude that several of the approaches proposed in this work can retrieve nested terms effectively without any nested labeling.
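To make the difference between nested and flat annotation concrete, here is a minimal Python sketch, assuming terms are stored as token-level spans of the form (start, end exclusive, label). The `Span` type, the `to_flat` helper, and the three-token example are hypothetical illustrations, not the authors' data format or code. `to_flat` keeps only the outermost spans, which is one way a flat annotation of the same text could look; it does not implement Binder or any of the pseudo-labeling techniques.

```python
from typing import List, Tuple

# Hypothetical span format: (start token index, end token index exclusive, label)
Span = Tuple[int, int, str]


def to_flat(spans: List[Span]) -> List[Span]:
    """Keep only spans that are not contained in any other span (outermost terms)."""
    flat = []
    for s in spans:
        contained = any(
            o != s and o[0] <= s[0] and s[1] <= o[1]
            for o in spans
        )
        if not contained:
            flat.append(s)
    return sorted(flat)


# Hypothetical example: "named entity recognition" with nested terms inside it.
tokens = ["named", "entity", "recognition"]
nested = [(0, 3, "TERM"), (0, 2, "TERM"), (1, 2, "TERM")]

for start, end, label in nested:
    print(label, " ".join(tokens[start:end]))

print(to_flat(nested))  # [(0, 3, 'TERM')]
```

A nested annotation lists all three terms, while the flat version keeps only "named entity recognition"; recovering the two inner terms from such flat data is the setting the abstract describes.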
