SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models

Xiang Liu, Zhaoxiang Liu, Peng Wang, Kohou Wang, Huan Hu, Kai Wang, Shiguo Lian

TL;DR

The paper addresses the inefficiency of domain-specific fine-tuning when the SFT data largely overlaps with an LLM's existing knowledge. It introduces SLearnLLM, a self-learning framework in which the model answers the SFT questions, grades its own responses through a self-check CoT process, retains only the QA pairs it answered incorrectly, and is fine-tuned on that subset to target its weaknesses. In experiments in agriculture and medicine using the Qwen1.5 family with LoRA fine-tuning, the approach achieves accuracy improvements comparable to full-dataset fine-tuning with substantially less training time, and the savings grow with model size. The method offers a practical path to cost-efficient domain adaptation of LLMs by learning from knowledge gaps rather than from the whole dataset.

Abstract

When using supervised fine-tuning (SFT) to adapt large language models (LLMs) to specific domains, a significant question arises: should we use the entire SFT dataset for fine-tuning? Common practice is to fine-tune directly on the entire dataset, because little is known about the LLM's past training data. However, if the SFT dataset largely overlaps with the model's existing knowledge, the performance gains are minimal and computational resources are wasted. Identifying the unknown knowledge within the SFT dataset and using it to fine-tune the model could substantially improve training efficiency. To address this challenge, we propose a self-learning framework for LLMs inspired by a human learning pattern. The framework takes an SFT dataset in a specific domain as input. First, the LLM answers the questions in the SFT dataset. The LLM then objectively grades its responses and retains only the incorrectly answered QA pairs. Finally, we fine-tune the LLM on this filtered QA set. Experimental results in the fields of agriculture and medicine demonstrate that our method substantially reduces training time while achieving improvements comparable to those attained with full-dataset fine-tuning. By concentrating on the unknown knowledge within the SFT dataset, our approach enhances the efficiency of fine-tuning LLMs.
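The answer, grade, and filter steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `select_unknown_pairs`, `toy_answer`, and `toy_grader` are hypothetical names, the toy functions stand in for real LLM calls, and the exact-match grader is a placeholder for the model's own grading.

```python
from typing import Callable, Dict, List

QAPair = Dict[str, str]  # hypothetical record: {"question": ..., "answer": ...}


def select_unknown_pairs(
    dataset: List[QAPair],
    answer_fn: Callable[[str], str],
    is_incorrect_fn: Callable[[str, str, str], bool],
) -> List[QAPair]:
    """Return the QA pairs that the model answered incorrectly."""
    unknown = []
    for pair in dataset:
        # Step 1: the model answers the question without seeing the reference.
        response = answer_fn(pair["question"])
        # Step 2: the response is graded against the reference answer.
        if is_incorrect_fn(pair["question"], pair["answer"], response):
            # Step 3: keep only the pairs the model got wrong.
            unknown.append(pair)
    return unknown


# Toy stand-ins for the LLM calls (assumptions, for illustration only).
def toy_answer(question: str) -> str:
    known = {"What is 2 + 2?": "4"}
    return known.get(question, "I am not sure.")


def toy_grader(question: str, reference: str, response: str) -> bool:
    # Placeholder: exact match instead of LLM-based grading.
    return response.strip().lower() != reference.strip().lower()


dataset = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]

subset = select_unknown_pairs(dataset, toy_answer, toy_grader)
print(f"{len(subset)} of {len(dataset)} pairs kept for fine-tuning")
```

The returned subset would then be passed to whatever fine-tuning routine is in use (LoRA in the paper's experiments); that step is omitted here because it depends on the training stack.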

Paper Structure

This paper contains 12 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: An efficient human learning pattern that focuses on learning unknown knowledge. This pattern facilitates knowledge acquisition and consolidation through a continuous learning cycle. In this cycle, students go through several key steps: doing exercises, checking answers, filtering out errors, and learning from errors. Through this iterative process, students gradually master the knowledge in the exercise set.
  • Figure 2: The framework of SLearnLLM. The framework comprises four steps: answering questions, scoring responses against the QA sets using the target LLM with robust logical reasoning, filtering out incorrectly answered questions, and fine-tuning the model on the incorrectly answered QA set to enhance performance in specific domains.
  • Figure 3: The workflow and example of the self-check process using a Chain of Thought (CoT) prompt. The input comprises a triplet: a question (highlighted in red), the correct answer (also highlighted in red), and the target LLM’s answer (highlighted in green, represented as "student" in the figure). Guided by the CoT prompt, the target LLM evaluates its own response in two stages: (1) Scoring, where the model assesses its response for consistency and accuracy, assigns a score, and provides justification; and (2) Filtering, where responses identified as incorrect are flagged with a "Yes".
  • Figure 4: Prompt for generating QA pairs for specific domains using GPT-4o. The prompt strategy involves three steps: (1) question generation, (2) question correction, and (3) answer generation. This strategy ensures high-quality, consistent QA pairs aligned with the original corpus content.
  • Figure 5: Examples of responses from Qwen1.5-7B-Chat and Qwen1.5-7B-Chat-SL to the same question in the agricultural and medical domains. The responses from Qwen1.5-7B-Chat-SL are better in both domains; the content highlighted in green indicates errors in the responses from Qwen1.5-7B-Chat in the medical domain.
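
The two-stage self-check described for Figure 3 (scoring, then a "Yes" flag for incorrect responses) could be wired up along the following lines. This is only a sketch: the prompt wording, the `Incorrect: Yes/No` output convention, and the function names are assumptions made for illustration and are not the paper's actual CoT prompt.

```python
import re


def build_self_check_prompt(question: str, reference: str, response: str) -> str:
    """Assemble a grading prompt from the (question, correct answer, student answer) triplet."""
    return (
        "You are grading a student's answer.\n"
        f"Question: {question}\n"
        f"Correct answer: {reference}\n"
        f"Student answer: {response}\n"
        "Step 1 (Scoring): compare the student answer with the correct answer "
        "for consistency and accuracy, then give a score and a justification.\n"
        "Step 2 (Filtering): on the last line write 'Incorrect: Yes' or 'Incorrect: No'.\n"
    )


def parse_incorrect_flag(grader_output: str) -> bool:
    """Return True if the grader flagged the response as incorrect."""
    verdicts = re.findall(r"Incorrect:\s*(Yes|No)\b", grader_output, flags=re.IGNORECASE)
    if not verdicts:
        # Assumed policy: with no parseable verdict, keep the pair for fine-tuning.
        return True
    # Use the last verdict, since the reasoning text may mention the phrase earlier.
    return verdicts[-1].lower() == "yes"


prompt = build_self_check_prompt("What is 2 + 2?", "4", "5")
print(prompt)
print(parse_incorrect_flag("Score: 0/10. The answers disagree.\nIncorrect: Yes"))
```

Treating an unparseable verdict as "incorrect" errs toward training on a pair the model may already know; the opposite default would shrink the training set further at the risk of skipping a genuine gap.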