Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations
Peichao Lai, Jiaxin Gan, Feiyang Ye, Yilei Wang, Bin Cui
TL;DR
The paper tackles sequence labeling in low-resource, domain-specific settings, especially for Chinese, by combining an LLM-driven knowledge enhancement workflow with KnowFREE, a span-based model that handles nested entities and extension labels without needing external knowledge at inference. The workflow uses prompts to generate extension tags (entity, segmentation, POS) and enriched contextual explanations, then fuses these with synthetic data to train a robust span-based extractor. KnowFREE uses a Biaffine decoder and a local multi-head attention mechanism to integrate multi-label features; its loss balances target and extension labels, and inference-time masking restricts predictions to target labels only. Experiments on Chinese and English datasets in many-shot and few-shot settings show state-of-the-art performance, with notable gains from enriched explanation synthesis in low-resource regimes and demonstrated cross-lingual adaptability to Japanese and Korean. The work advances practical domain-specific sequence labeling by using LLMs for knowledge augmentation while keeping inference simple and robust in low-resource contexts.
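To make the inference masking idea concrete, here is a minimal sketch of how a span-based decoder could restrict its predictions to target labels. It is not the authors' implementation: the function name `decode_target_spans`, the label names, the scores, and the score threshold are all hypothetical, and the per-span scores are simply assumed to come from something like a Biaffine decoder.

```python
from typing import Dict, List, Set, Tuple

NEG_INF = float("-inf")

def decode_target_spans(
    span_scores: Dict[Tuple[int, int], List[float]],
    labels: List[str],
    target_labels: Set[str],
    threshold: float = 0.0,  # assumed cutoff; the paper may decide "no entity" differently
) -> List[Tuple[int, int, str]]:
    """Pick the best target label per span, ignoring extension labels."""
    results = []
    for (start, end), scores in sorted(span_scores.items()):
        if start > end:
            continue  # skip invalid spans
        # Inference masking: extension labels get -inf so they can never win.
        masked = [
            score if label in target_labels else NEG_INF
            for label, score in zip(labels, scores)
        ]
        best = max(range(len(labels)), key=lambda k: masked[k])
        if masked[best] > threshold:
            results.append((start, end, labels[best]))
    return results

# Hypothetical label inventory: two target labels, two extension labels.
labels = ["PER", "LOC", "POS:NOUN", "SEG:WORD"]
target_labels = {"PER", "LOC"}

# Hypothetical scores keyed by (start, end) token indices, one score per label.
span_scores = {
    (0, 1): [2.1, 0.3, 3.5, 1.0],    # extension label scores highest, PER wins after masking
    (0, 3): [0.2, 1.7, 0.1, 4.0],    # encloses (0, 1), so the two predictions are nested
    (2, 3): [-1.0, -0.5, 2.2, 0.9],  # no target label clears the threshold
}

predicted = decode_target_spans(span_scores, labels, target_labels)
print(predicted)  # [(0, 1, 'PER'), (0, 3, 'LOC')]
```

Because each span is scored independently, the nested pair (0, 1) and (0, 3) can both be kept, which is the usual reason span-based decoders are used for nested entities. The extension labels influence training only; at decoding time they are masked out, so no external knowledge is needed.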
Abstract
Sequence labeling remains a significant challenge in low-resource, domain-specific scenarios, particularly for character-dense languages like Chinese. Existing methods primarily focus on enhancing model comprehension and improving data diversity to boost performance. However, these approaches still struggle with inadequate model applicability and semantic distribution biases in domain-specific contexts. To overcome these limitations, we propose a novel framework that combines an LLM-based knowledge enhancement workflow with a span-based Knowledge Fusion for Rich and Efficient Extraction (KnowFREE) model. Our workflow employs explanation prompts to generate precise contextual interpretations of target entities, effectively mitigating semantic biases and enriching the model's contextual understanding. The KnowFREE model further integrates extension label features, enabling efficient nested entity extraction without relying on external knowledge during inference. Experiments on multiple Chinese domain-specific sequence labeling datasets demonstrate that our approach achieves state-of-the-art performance, effectively addressing the challenges posed by low-resource settings.
