CRE-LLM: A Domain-Specific Chinese Relation Extraction Framework with Fine-tuned Large Language Model

Zhengpeng Shi, Haoran Luo

TL;DR

This work tackles domain-specific Chinese relation extraction (DSCRE) where data are scarce and fine-tuning large models is costly. It introduces CRE-LLM, an end-to-end framework that fine-tunes open-source LLMs using instruction supervision and parameter-efficient fine-tuning (PEFT, e.g., LoRA) to generate relation triplets directly from text. Experiments on FinRE and SanWen show state-of-the-art performance on FinRE and robust results on SanWen, with substantial efficiency advantages over full-model fine-tuning. The approach offers a practical, scalable path for deploying DSCRE with open-source LLMs and PEFT, enabling stronger domain-specific semantic understanding with reduced resource requirements.

Abstract

Domain-Specific Chinese Relation Extraction (DSCRE) aims to extract relations between entities from domain-specific Chinese text. Despite the rapid development of pre-trained language models (PLMs) in recent years, especially large language models (LLMs), DSCRE still faces three core challenges: complex network structure design, poor awareness, and the high cost of fine-tuning. Given the impressive performance of LLMs in natural language processing, we propose a new framework called CRE-LLM. This framework is based on fine-tuning open-source LLMs such as Llama-2, ChatGLM2, and Baichuan2. CRE-LLM enhances the logic-awareness and generative capabilities of the model by constructing an appropriate prompt and applying instruction-supervised fine-tuning to open-source LLMs. It then directly extracts the relations of the given entities from the input text, thereby improving the CRE approach. To demonstrate the effectiveness of the proposed framework, we conducted extensive experiments on two domain-specific CRE datasets, FinRE and SanWen. The experimental results show that CRE-LLM is significantly superior and robust, achieving state-of-the-art (SOTA) performance on the FinRE dataset. By combining LLMs with triples, this paper introduces a novel approach to semantically more complex domain-specific Chinese relation extraction (DSCRE) tasks. Our code is publicly available.

Paper Structure (22 sections, 3 equations, 3 figures, 5 tables)

This paper contains 22 sections, 3 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: An example of Domain-specific CRE Task.
  • Figure 2: Illustration of four paradigms for solving the CRE task. In Figure 2a, entities and texts from the RE dataset are fed separately into the PLM, which is combined with the Relation Set and outputs the relation with the highest probability as the result. In Figure 2b, prompts are constructed from the texts and the Relation Set of the RE dataset and fed into the LLM to generate the relation. In Figure 2c, prompts constructed from the RE dataset are fed into the LLM to generate preliminary results, which are then matched against the Relation Set by retrieval to obtain the relation extraction results. In Figure 2d, our method uses a fine-tuning dataset constructed from the RE dataset to fine-tune the LLM, which then directly generates accurate relation extraction results.
  • Figure 3: Overview of CRE-LLM, a method for domain-specific Chinese relation extraction that applies supervised fine-tuning to LLMs using Parameter-Efficient Fine-Tuning (PEFT) techniques (e.g., LoRA).
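
To give a rough sense of why PEFT methods such as LoRA reduce fine-tuning cost, the sketch below counts trainable parameters for a single weight matrix. LoRA freezes the original d_out × d_in weight and trains two low-rank factors of shapes d_out × r and r × d_in, so it adds r · (d_out + d_in) trainable parameters. The hidden size 4096 and rank 8 are illustrative assumptions, not values reported in the paper.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters trained by LoRA for one weight: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out, d_in):
    """Parameters trained when the whole d_out x d_in weight is fine-tuned."""
    return d_out * d_in


d = 4096  # assumed hidden size, for illustration only
r = 8     # assumed LoRA rank, for illustration only

lora = lora_trainable_params(d, d, r)
full = full_params(d, d)
print(f"LoRA: {lora}, full: {full}, ratio: {lora / full:.4%}")
```

Under these assumed sizes, LoRA trains well under 1% of the parameters of the matrix it adapts; the actual savings depend on the model, the rank, and which layers are adapted.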