Beyond QA Pairs: Assessing Parameter-Efficient Fine-Tuning for Fact Embedding in LLMs

Shivam Ratnakar, Abhiroop Talasila, Raghav Chamadiya, Nikhil Agarwal, Vinayak K Doifode

TL;DR

This work assesses Parameter-Efficient Fine-Tuning (PEFT) for embedding domain-specific facts in LLMs. A BERT classifier separates QA pairs into Conceptual and Factual classes, and two Llama-2 7B models are LoRA-tuned on these splits and scored by LLM evaluators (GPT-3.5 Turbo, Gemini, Prometheus). The model trained on Conceptual QA data performs better than the one trained on Factual data, and synthetic data generated via the D-Naive approach outperforms D-RAG. The study also finds that PEFT may not be optimal for embedding factual knowledge but is highly effective for instruction-based tasks, as evidenced by a 1,000-sample data-center product-recommendation case where fine-tuned Llama-2 7B outperforms the baseline. These results highlight the importance of QA-pair quality, conceptual categorization, and synthetic data generation in domain-focused LLM fine-tuning, and indicate when to apply PEFT and how to prepare data for domain adaptation.
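To give a sense of why LoRA counts as parameter-efficient, the sketch below works through the standard count of trainable weights: a rank-r adapter on a d_in x d_out matrix adds r * (d_in + d_out) parameters. The hidden size (4096) and layer count (32) are those of Llama-2 7B. The rank of 8 and the choice of two adapted projections per layer are assumptions for illustration only; this summary does not state the configuration the authors used.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 4096        # Llama-2 7B hidden dimension
NUM_LAYERS = 32           # Llama-2 7B transformer layers
RANK = 8                  # ASSUMPTION: illustrative rank, not taken from the paper
TARGETS_PER_LAYER = 2     # ASSUMPTION: two square attention projections adapted per layer
BASE_PARAMS = 6.74e9      # approximate parameter count of the base model

per_matrix = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total_trainable = per_matrix * TARGETS_PER_LAYER * NUM_LAYERS
fraction = total_trainable / BASE_PARAMS

print(f"trainable per adapted matrix: {per_matrix:,}")
print(f"total trainable parameters:   {total_trainable:,}")
print(f"share of base model:          {fraction:.4%}")
```

Under these assumed settings the adapters train about 4.2 million weights, well under 0.1% of the base model.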

Abstract

This paper presents an extensive examination of Parameter-Efficient Fine-Tuning (PEFT) for embedding domain-specific facts into Large Language Models (LLMs), focusing on improving the fine-tuning process by categorizing question-answer (QA) pairs into Factual and Conceptual classes using a BERT-based classifier. Two distinct Llama-2 models are fine-tuned based on these classifications and evaluated using larger models like GPT-3.5 Turbo and Gemini. Our results indicate that models trained on conceptual datasets outperform those trained on factual datasets. Additionally, we compare two techniques for generating synthetic fine-tuning datasets, D-RAG and D-Naive, with D-Naive demonstrating superior performance. Although PEFT has shown effectiveness, our research indicates that it may not be the optimal method for embedding facts into LLMs. It does, however, perform exceptionally well on instruction-based tasks. This finding is reinforced by a 1,000-sample dataset in the data center domain, where the fine-tuned Llama-2 7B model significantly outperforms the baseline model in generating product recommendations. Our study highlights the importance of QA pair categorization and synthetic dataset generation techniques in enhancing the performance of LLMs in specific domains.
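The categorization step described above can be sketched as a routing function that sends each QA pair to a Conceptual or Factual training split. This is a minimal illustration, not the authors' pipeline: `toy_classifier` is a hypothetical keyword-and-digit heuristic standing in for the BERT-based classifier, and `split_qa_pairs` and the sample pairs are invented for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

QAPair = Tuple[str, str]  # (question, answer)


def toy_classifier(question: str, answer: str) -> str:
    """HYPOTHETICAL stand-in for the paper's BERT classifier.

    Labels a pair "Factual" when the answer carries a number or the question
    asks for a discrete fact; everything else is labeled "Conceptual".
    """
    asks_for_fact = question.lower().startswith(("how many", "how much", "when", "who"))
    has_number = any(ch.isdigit() for ch in answer)
    return "Factual" if (asks_for_fact or has_number) else "Conceptual"


def split_qa_pairs(
    pairs: Iterable[QAPair],
    classify: Callable[[str, str], str] = toy_classifier,
) -> Dict[str, List[QAPair]]:
    """Route each QA pair into the split named by the classifier's label."""
    splits: Dict[str, List[QAPair]] = {"Conceptual": [], "Factual": []}
    for question, answer in pairs:
        splits[classify(question, answer)].append((question, answer))
    return splits


# Invented sample data for illustration only.
sample_pairs = [
    ("How many racks fit in the hall?", "The hall holds 40 racks."),
    ("Why is hot-aisle containment used?",
     "It keeps exhaust air from mixing with the cold supply air."),
]
splits = split_qa_pairs(sample_pairs)
for label, items in splits.items():
    print(label, len(items))
```

Each split would then feed a separate fine-tuning run. Passing the classifier as an argument keeps the routing logic unchanged if the heuristic is swapped for a trained model.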

Paper Structure

This paper contains 15 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Pipeline to generate D-RAG and D-Naive
  • Figure 2: Comparison of score distribution of different evaluators. Refer to Table 1 for empirical results
  • Figure 3: Train and Eval loss - D-RAG vs D-Naive
  • Figure 4: Train and Eval loss - Conceptual vs Factual
  • Figure 5: Train loss - Call Transcript
  • ...and 2 more figures