KnowsLM: A framework for evaluation of small language models for knowledge augmentation and humanised conversations

Chitranshu Harbola, Anupam Purwar

TL;DR

The paper tackles the challenge of enabling knowledge-rich, human-like dialogue in small- and mid-sized language models. It introduces KnowSLM, a framework that separates knowledge generation, augmentation (LoRA fine-tuning and RAG), and evaluation using LLM judges. Through experiments on LLaMA 3.3 70B with synthetic Delhi-food and other knowledge sources, it shows that RAG improves factual accuracy on unseen prompts, while LoRA-based fine-tuning better preserves tone and conciseness, though gains from small datasets are limited and costly to scale. The study further analyzes the impact of LoRA rank $r \in \{4,6,8,32\}$ and dataset size on performance, highlighting important trade-offs between knowledge integration, stylistic alignment, and resource use. Overall, KnowSLM provides a practical blueprint for deploying knowledge-augmented small/medium LMs in resource-constrained settings.

Abstract

In the evolving landscape of conversational AI, generating concise, context-aware, and human-like dialogue using small and medium-sized language models (LMs) remains a complex challenge. This study investigates the influence of LoRA rank, dataset scale, and prompt prefix design on both knowledge retention and stylistic alignment. While fine-tuning improves fluency and enables stylistic customization, its ability to integrate unseen knowledge is constrained, particularly with smaller datasets. Conversely, RAG-augmented models, equipped to incorporate external documents at inference, demonstrated superior factual accuracy on out-of-distribution prompts, though they lacked the stylistic consistency achieved by fine-tuning. Evaluations by LLM-based judges across knowledge accuracy, conversational quality, and conciseness suggest that fine-tuning is best suited for tone adaptation, whereas RAG excels at real-time knowledge augmentation.
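
To give a feel for why LoRA rank matters for resource use, the sketch below counts the trainable parameters a single LoRA adapter adds at the ranks listed in the TL;DR (4, 6, 8, 32). It relies only on the standard LoRA formulation, in which a frozen weight matrix of shape d_out x d_in gets a low-rank update B·A with A of shape r x d_in and B of shape d_out x r. The function name `lora_trainable_params` and the hidden size of 8192 are illustrative assumptions, not values taken from the paper.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter.

    A has shape (rank, d_in) and B has shape (d_out, rank),
    so the adapter adds rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


# Assumed, illustrative width of one square projection matrix.
HIDDEN = 8192
dense_params = HIDDEN * HIDDEN

for r in (4, 6, 8, 32):
    added = lora_trainable_params(HIDDEN, HIDDEN, r)
    share = added / dense_params
    print(f"r={r:>2}: {added:,} adapter params ({share:.4%} of the dense matrix)")
```

The adapter size grows linearly with rank, so moving from r = 4 to r = 32 multiplies the trainable parameters per adapted matrix by eight. Whether that extra capacity improves knowledge infusion is an empirical question that the paper's rank comparison addresses.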

Paper Structure

This paper contains 10 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: KnowSLM Framework: proposed methodology consisting of knowledge generation, knowledge augmentation (fine tuning and RAG) and augmented LM evaluation.
  • Figure 2: Synthetic dialogue generation pipeline
  • Figure 3: Win Chart: Impact of Synthetic Dialogue Fine-Tuning on Unseen Data Knowledge Recall in LLaMA
  • Figure 4: Win Chart: This figure illustrates the comparative performance of a fine-tuned LLaMA 3.3 70B model and a RAG-augmented variant on unseen knowledge-intensive queries.
  • Figure 5: Win Chart: Impact of LoRA Rank on Knowledge Infusion and Response Quality in LLaMA 3.3 70B Fine-Tuning
  • ...and 3 more figures