Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations

Peichao Lai, Jiaxin Gan, Feiyang Ye, Yilei Wang, Bin Cui

TL;DR

The paper tackles low-resource domain sequence labeling, especially for Chinese, by uniting an LLM-driven knowledge enhancement workflow with a span-based KnowFREE model that supports nested and extension labels without external knowledge at inference. Central ideas include generating extension tags (entity, segmentation, POS) and enriched contextual explanations via prompts, and fusing these with synthetic data to train a robust span-based extractor. KnowFREE employs a Biaffine decoder and a local multi-head attention mechanism to integrate multi-label features, with a loss design that balances target and extension labels and with inference masking to predict only target labels. Experimental results across Chinese and English datasets in many-shot and few-shot settings show state-of-the-art performance, with notable gains from enriched explanation synthesis in low-resource regimes and demonstrated cross-lingual adaptability to Japanese and Korean. The work advances practical domain-specific sequence labeling by efficiently leveraging LLMs for knowledge augmentation while maintaining inference-time simplicity and robustness in low-resource contexts.

Abstract

Sequence labeling remains a significant challenge in low-resource, domain-specific scenarios, particularly for character-dense languages like Chinese. Existing methods primarily focus on enhancing model comprehension and improving data diversity to boost performance. However, these approaches still struggle with inadequate model applicability and semantic distribution biases in domain-specific contexts. To overcome these limitations, we propose a novel framework that combines an LLM-based knowledge enhancement workflow with a span-based Knowledge Fusion for Rich and Efficient Extraction (KnowFREE) model. Our workflow employs explanation prompts to generate precise contextual interpretations of target entities, effectively mitigating semantic biases and enriching the model's contextual understanding. The KnowFREE model further integrates extension label features, enabling efficient nested entity extraction without relying on external knowledge during inference. Experiments on multiple Chinese domain-specific sequence labeling datasets demonstrate that our approach achieves state-of-the-art performance, effectively addressing the challenges posed by low-resource settings.
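As an illustration of the span-based decoding summarized above, the following is a minimal sketch, not the authors' implementation. It assumes a standard biaffine span score of the form h_start^T U h_end + w · [h_start; h_end] + b, and it assumes that inference masking amounts to suppressing the logits of extension labels before taking the best label per span. The label names (`DISEASE`, `DRUG`, `EXT_NOUN`, `EXT_WORD`), the helper names, and the toy numbers are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical label inventory. Target labels are what we want to predict;
# extension labels (e.g. POS or segmentation tags) are assumed to be used
# only as auxiliary supervision during training.
LABELS: List[str] = ["NONE", "DISEASE", "DRUG", "EXT_NOUN", "EXT_WORD"]
TARGET = {"DISEASE", "DRUG"}


def biaffine_score(h_start: List[float], h_end: List[float],
                   U: List[List[float]], w: List[float], b: float) -> float:
    """Score one (start, end) pair for one label (assumed biaffine form)."""
    bilinear = sum(h_start[p] * U[p][q] * h_end[q]
                   for p in range(len(h_start))
                   for q in range(len(h_end)))
    # Linear term over the concatenation [h_start; h_end].
    linear = sum(wi * xi for wi, xi in zip(w, h_start + h_end))
    return bilinear + linear + b


def decode_targets(
    span_logits: Dict[Tuple[int, int], List[float]]
) -> List[Tuple[int, int, str]]:
    """Pick the best label per span after masking out extension labels."""
    keep = [lab == "NONE" or lab in TARGET for lab in LABELS]
    spans = []
    for (i, j), logits in sorted(span_logits.items()):
        # Extension-label logits are ignored at inference time.
        masked = [x if k else float("-inf") for x, k in zip(logits, keep)]
        best = max(range(len(LABELS)), key=masked.__getitem__)
        if LABELS[best] != "NONE":
            spans.append((i, j, LABELS[best]))
    return spans


# Toy logits for three candidate spans, one value per entry in LABELS.
toy_logits = {
    (0, 1): [0.1, 2.0, 0.3, 5.0, 0.0],  # extension logit is highest here
    (0, 3): [0.2, 0.1, 1.5, 0.0, 0.0],  # encloses (0, 1): a nested span
    (2, 3): [1.0, 0.2, 0.1, 4.0, 0.0],  # falls back to NONE after masking
}
predicted = decode_targets(toy_logits)
print(predicted)
```

Because each span is scored independently, the nested spans (0, 1) and (0, 3) can both receive target labels, and a span whose strongest logit belongs to an extension label is either relabeled with a target label or dropped.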

Paper Structure

This paper contains 19 sections, 13 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: Distinctions between our method and existing methods from model-centric and data-centric perspectives.
  • Figure 2: The workflow (top) and detailed pipeline structure (bottom) of our knowledge enhancement framework. Pipeline 1 generates extension entities to enhance the performance of KnowFREE, while Pipeline 2 synthesizes additional training samples and entities. We then use a frozen KnowFREE model to annotate target entities within these synthetic samples.
  • Figure 3: The architecture of the KnowFREE model. The span logits corresponding to the extension entity labels are ignored during inference. Matrices like (a), (b), (c), and (d) represent the span logits for each label type.
  • Figure 4: Performance comparison with and without enriched explanation synthesis under k-shot sampling.
  • Figure 5: t-SNE visualization of the training, test, and enriched explanation samples under different sampling sizes. The synthetic enriched explanation samples are generated by ChatGLM3-6B and are labeled "Synthetic" in the legend.
  • ...and 7 more figures