
Small Language Models as Effective Guides for Large Language Models in Chinese Relation Extraction

Xuemei Tang, Jun Wang

TL;DR

This work tackles the long-tail problem in Chinese relation extraction by coupling small pre-trained language models (SLMs) with large language models (LLMs) through a Training-Guide-Predict workflow called SLCoLM. An SLM is trained to learn task-specific knowledge and produce initial predictions, which guide the LLM as demonstrations in carefully constructed prompts. The LLM then uses chain-of-thought reasoning and domain knowledge to refine these predictions, and its outputs are merged with those of the SLM. The approach introduces candidate relation type selection to keep prompts compact and a four-mode merge strategy to fuse SLM and LLM predictions; merge mode 3 often yields the strongest gains, especially for tail relations. Experiments on the ancient Chinese dataset ChisRE show that SLCoLM substantially improves over pure LLM and pure SLM baselines in zero-shot and in-context learning (ICL) settings, indicating effective cross-model knowledge transfer and better handling of long-tail relations in a low-resource domain.
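Candidate relation type selection keeps prompts compact by listing only a few relation types instead of the full inventory. The exact selection rule is not given in this summary, so the following is a minimal sketch under stated assumptions: the SLM is assumed to output a confidence score per relation type for each entity pair, the top-k types and their definitions (cf. "Definition" in Figure 1) go into the prompt, and the SLM's top guess is offered as a hint. `SLMPrediction`, `select_candidates`, `build_prompt`, the value of `k`, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SLMPrediction:
    """Hypothetical SLM output for one entity pair in one sentence."""
    head: str
    tail: str
    scores: dict[str, float]  # relation type -> SLM confidence (assumed format)


def select_candidates(pred: SLMPrediction, k: int = 3) -> list[str]:
    """Keep the k highest-scoring relation types (hypothetical selection rule)."""
    ranked = sorted(pred.scores.items(), key=lambda item: item[1], reverse=True)
    return [rel for rel, _ in ranked[:k]]


def build_prompt(sentence: str, pred: SLMPrediction,
                 definitions: dict[str, str], k: int = 3) -> str:
    """Build a compact prompt listing only candidate types plus the SLM's top guess."""
    candidates = select_candidates(pred, k)
    lines = ["Candidate relation types and definitions:"]
    for rel in candidates:
        lines.append(f"- {rel}: {definitions.get(rel, '(no definition)')}")
    lines.append(f"Sentence: {sentence}")
    lines.append(f"Entity pair: ({pred.head}, {pred.tail})")
    # The SLM's top-ranked type is offered as a hint, not as the final answer.
    hint = candidates[0] if candidates else "none"
    lines.append(f"Small-model suggestion: {hint}")
    lines.append("Reason step by step, then answer with one type from the list above.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical relation labels and scores, for illustration only.
    pred = SLMPrediction("A", "B", {"father_of": 0.7, "sibling_of": 0.2,
                                    "minister_of": 0.1})
    print(build_prompt("<ancient Chinese sentence>", pred,
                       {"father_of": "A is the father of B"}, k=2))
```

A smaller `k` shortens the prompt but risks dropping the correct type when the SLM ranks it low; the trade-off shown here is illustrative only.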

Abstract

Recently, large language models (LLMs) have been successful in relation extraction (RE) tasks, especially in few-shot learning. Long-tailed data is an important problem in RE, yet it has received little attention in LLM-based approaches. In this paper, we therefore propose SLCoLM, a model collaboration framework that mitigates the long-tail problem. The framework uses a "Training-Guide-Predict" strategy to combine the strengths of small pre-trained language models (SLMs) and LLMs: a task-specific SLM acts as a guide, transferring task knowledge to the LLM and guiding it in performing RE. Experiments on an ancient Chinese RE dataset rich in relation types show that the approach improves RE for long-tail relation types.
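The Training-Guide-Predict strategy can be pictured as a small pipeline: a fine-tuned SLM (Figure 2 uses Spert) predicts relation triples, those predictions are passed to the LLM as guidance, and the two prediction sets are fused. The paper's four merge modes are not defined in this section, so the three merge rules below (union, intersection, and an SLM-first rule) are generic illustrations, not the authors' modes. The function names, the `Triple` format, and the stub models are assumptions.

```python
from typing import Callable, Iterable

# A relation triple: (head entity, relation type, tail entity).
Triple = tuple[str, str, str]
Merge = Callable[[set[Triple], set[Triple]], set[Triple]]


def merge_union(slm: set[Triple], llm: set[Triple]) -> set[Triple]:
    """Keep every triple proposed by either model (favours recall)."""
    return slm | llm


def merge_intersection(slm: set[Triple], llm: set[Triple]) -> set[Triple]:
    """Keep only triples both models agree on (favours precision)."""
    return slm & llm


def merge_slm_first(slm: set[Triple], llm: set[Triple]) -> set[Triple]:
    """Trust the SLM; add LLM triples only for entity pairs the SLM left unlabeled."""
    covered = {(head, tail) for head, _, tail in slm}
    return slm | {(h, r, t) for h, r, t in llm if (h, t) not in covered}


def train_guide_predict(
    sentences: Iterable[str],
    slm_predict: Callable[[str], set[Triple]],
    llm_predict: Callable[[str, set[Triple]], set[Triple]],
    merge: Merge,
) -> list[set[Triple]]:
    """Guide the LLM with SLM output per sentence, then fuse both prediction sets."""
    results = []
    for sentence in sentences:
        slm_triples = slm_predict(sentence)               # SLM is assumed already trained
        llm_triples = llm_predict(sentence, slm_triples)  # SLM output passed as guidance
        results.append(merge(slm_triples, llm_triples))
    return results


if __name__ == "__main__":
    # Stubs standing in for a trained SLM and a prompted LLM.
    def stub_slm(sentence: str) -> set[Triple]:
        return {("A", "father_of", "B")}

    def stub_llm(sentence: str, hint: set[Triple]) -> set[Triple]:
        return hint | {("A", "minister_of", "C")}

    print(train_guide_predict(["s1"], stub_slm, stub_llm, merge_intersection))
    # prints [{('A', 'father_of', 'B')}]
```

Union generally trades precision for recall and intersection does the reverse; which fusion works best for tail relations is what the paper's merge-mode comparison (Figure 2 c, d) measures.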

Paper Structure (17 sections, 3 figures, 9 tables, 1 algorithm)

Figures (3)

  • Figure 1: Illustration of the model collaboration mechanism. "Definition" denotes the definitions of relation types.
  • Figure 2: Comparison of experimental results between Spert and the LLMs in the SLCoLM framework (a, b), and comparison of experimental results after fusing Spert's results with those of the LLMs in the SLCoLM framework (c, d). "GPT(SLCoLM, Zero-shot) with Merge mode 3" denotes the fusion of results from GPT-3.5 and Spert in the SLCoLM framework using merge mode 3.
  • Figure 3: Percentage of each relation type. "<100" denotes relation types with fewer than 100 samples.
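Figure 3 groups relation types with fewer than 100 samples into a "<100" bucket. A minimal sketch of how such a long-tail breakdown could be computed, assuming one relation label per annotated instance; the function name `relation_type_shares` and the `tail_threshold` parameter are hypothetical, and the example labels are made up rather than ChisRE statistics.

```python
from collections import Counter


def relation_type_shares(
    labels: list[str], tail_threshold: int = 100
) -> tuple[dict[str, float], list[str], float]:
    """Return per-type shares (%), the tail types, and the tail types' combined share (%)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}, [], 0.0
    shares = {rel: 100.0 * n / total for rel, n in counts.most_common()}
    # Types with fewer than `tail_threshold` samples form the "<100" bucket.
    tail_types = sorted(rel for rel, n in counts.items() if n < tail_threshold)
    tail_share = 100.0 * sum(counts[rel] for rel in tail_types) / total
    return shares, tail_types, tail_share


if __name__ == "__main__":
    # Made-up label counts: 300, 150, 40 and 10 instances.
    labels = ["a"] * 300 + ["b"] * 150 + ["c"] * 40 + ["d"] * 10
    shares, tail, tail_share = relation_type_shares(labels)
    print(tail, tail_share)  # prints ['c', 'd'] 10.0
```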