Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?
Xue-Yong Fu, Md Tahmid Rahman Laskar, Elena Khasanova, Cheng Chen, Shashi Bhushan TN
TL;DR
The paper investigates whether compact, instruction-following LLMs can match the performance of larger LLMs for real-world meeting summarization while reducing deployment costs. It compares zero-shot large models (e.g., GPT-3.5, PaLM-2, LLaMA-2, Mixtral) with fine-tuned compact models (notably FLAN-T5-Large, 780M parameters). Most small models underperform the larger zero-shot LLMs even after fine-tuning; the notable exception is FLAN-T5-Large, which often achieves comparable or superior results. Using two datasets, an in-domain business meeting corpus and a GPT-4-referenced QMSUM-I, the authors show that FLAN-T5-Large provides a favorable balance of accuracy and efficiency, particularly on shorter transcripts, and that context length and instruction type significantly influence performance. The work also analyzes real-world deployment aspects, including inference latency and GPU requirements, and concludes that FLAN-T5-Large offers a cost-effective path for production use, while longer transcripts may benefit from alternative strategies such as chapterization. Overall, the paper contributes a cost-aware evaluation of compact LLMs for meeting summarization and offers practical guidelines for deploying efficient NLP systems in industry.
Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities to solve a wide range of tasks without being explicitly fine-tuned on task-specific datasets. However, deploying LLMs in the real world is not trivial, as it requires substantial computing resources. In this paper, we investigate whether smaller, compact LLMs are a good alternative to the comparatively larger LLMs to address the significant costs associated with utilizing LLMs in the real world. In this regard, we study the meeting summarization task in a real-world industrial environment and conduct extensive experiments by comparing the performance of fine-tuned compact LLMs (e.g., FLAN-T5, TinyLLaMA, LiteLLaMA) with zero-shot larger LLMs (e.g., LLaMA-2, GPT-3.5, PaLM-2). We observe that most smaller LLMs, even after fine-tuning, fail to outperform larger zero-shot LLMs on meeting summarization datasets. However, a notable exception is FLAN-T5 (780M parameters), which performs on par with or even better than many zero-shot larger LLMs (from 7B to above 70B parameters), while being significantly smaller. This makes compact LLMs like FLAN-T5 a suitable cost-efficient solution for real-world industrial deployment.
