SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models
Xiang Liu, Zhaoxiang Liu, Peng Wang, Kohou Wang, Huan Hu, Kai Wang, Shiguo Lian
TL;DR
The paper addresses the inefficiency of domain-specific fine-tuning when the SFT data largely overlaps with an LLM's existing knowledge. It introduces SLearnLLM, a self-learning framework in which the model answers the SFT questions, scores its own responses through a self-check CoT process, keeps only the QA pairs it answered incorrectly, and is then fine-tuned on that subset to target its weaknesses. In experiments in agriculture and medicine using the Qwen-1.5 family with LoRA fine-tuning, the approach achieves accuracy improvements comparable to full-dataset fine-tuning with substantially less training time, and the savings grow with model size. The method offers a practical path to cost-efficient domain adaptation of LLMs by learning from knowledge gaps rather than from the whole dataset.
Abstract
When using supervised fine-tuning (SFT) to adapt large language models (LLMs) to specific domains, a key question arises: should we use the entire SFT dataset for fine-tuning? Common practice is to fine-tune directly on the entire dataset, because little is known about the LLM's past training data. However, if the SFT dataset largely overlaps with the model's existing knowledge, the performance gains are minimal and computational resources are wasted. Identifying the unknown knowledge within the SFT dataset and fine-tuning the model only on it could substantially improve training efficiency. To address this challenge, we propose a self-learning framework for LLMs inspired by human learning patterns. The framework takes an SFT dataset in a specific domain as input. First, the LLM answers the questions in the SFT dataset. The LLM then objectively grades its responses and retains only the incorrectly answered QA pairs. Finally, we fine-tune the LLM on this filtered QA set. Experimental results in the fields of agriculture and medicine demonstrate that our method substantially reduces training time while achieving improvements comparable to those attained with full-dataset fine-tuning. By concentrating on the unknown knowledge within the SFT dataset, our approach enhances the efficiency of fine-tuning LLMs.
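
The answer, grade, and filter steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_unknown_qa`, `answer_fn`, and `grade_fn` are hypothetical names, and the toy stubs below stand in for real LLM calls (the stub grader uses a simple string match in place of the model's own grading). The placeholder strings `q1`, `a1`, and so on are not real data.

```python
from typing import Callable, Iterable, List, Tuple

QAPair = Tuple[str, str]  # (question, reference answer)


def select_unknown_qa(
    qa_pairs: Iterable[QAPair],
    answer_fn: Callable[[str], str],
    grade_fn: Callable[[str, str, str], bool],
) -> List[QAPair]:
    """Return the QA pairs whose model response was graded incorrect.

    answer_fn(question) -> response          (assumed: wraps an LLM call)
    grade_fn(question, reference, response)  (assumed: True if judged correct)
    """
    unknown: List[QAPair] = []
    for question, reference in qa_pairs:
        response = answer_fn(question)  # step 1: the model answers
        if not grade_fn(question, reference, response):  # step 2: grading
            unknown.append((question, reference))  # step 3: keep failures
    return unknown


# Toy stand-ins for the model, for illustration only.
_TOY_MODEL_OUTPUT = {"q1": "a1", "q2": "wrong"}


def toy_answer(question: str) -> str:
    # Hypothetical "model": looks up a canned response, empty if unseen.
    return _TOY_MODEL_OUTPUT.get(question, "")


def toy_grade(question: str, reference: str, response: str) -> bool:
    # Hypothetical grader: case-insensitive exact match with the reference.
    return response.strip().lower() == reference.strip().lower()


data = [("q1", "a1"), ("q2", "a2"), ("q3", "a3")]
to_fine_tune = select_unknown_qa(data, toy_answer, toy_grade)
```

Here `to_fine_tune` holds only the pairs for `q2` and `q3`, which the toy model got wrong; in the framework, a subset of this kind is what the final fine-tuning step would consume instead of the full dataset.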
