Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation

Lasal Jayawardena; Prasan Yapa

TL;DR

Paraphrase generation with large language models offers high quality but imposes impractical resource demands. The authors implement a data-centric, sequence-level knowledge distillation pipeline that transfers paraphrasing capability from a teacher LLM (ChatGPT) to three compact models via LoRA-based parameter-efficient fine-tuning. The distilled models are roughly $10^3$ times smaller than the teacher, run inference substantially faster, and produce diverse paraphrases of comparable quality. Quantitative and qualitative evaluations show that they retain strong semantic similarity and syntactic and lexical diversity, with only a small performance drop relative to the teacher (about 4%), as confirmed by human and GPT-4 assessments. The approach enables cost-effective, scalable paraphrase generation and shows that compact, diverse paraphrase models are viable in production settings.
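To make the parameter-efficiency idea concrete: LoRA freezes a weight matrix of shape d_out x d_in and trains only two low-rank factors, of shapes d_out x r and r x d_in. The sketch below counts trainable parameters under that scheme. The layer size and rank are illustrative assumptions, not values reported by the authors.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters when only the low-rank factors are updated.

    Factor B has shape (d_out, rank) and factor A has shape (rank, d_in);
    the original matrix stays frozen.
    """
    return rank * (d_in + d_out)


# Illustrative numbers only (assumed, not taken from the paper).
d_in = d_out = 1024
rank = 8

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
ratio = full / lora  # how many times fewer parameters are trained
```

With these assumed sizes, the adapter trains 64 times fewer parameters than full fine-tuning of the same matrix. This reduction in trainable parameters is separate from the size gap between student and teacher.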

Abstract

Over the past year, the field of Natural Language Generation (NLG) has experienced an exponential surge, largely due to the introduction of Large Language Models (LLMs). These models have exhibited the most effective performance across a range of tasks in Natural Language Processing and Generation. However, their application to domain-specific tasks, such as paraphrasing, presents significant challenges. Their large number of parameters makes them difficult to operate on commercial hardware, and they require substantial time for inference, leading to high costs in a production setting. In this study, we tackle these obstacles by employing LLMs to develop three distinct models for paraphrasing, applying a method referred to as sequence-level knowledge distillation. These distilled models maintain the quality of paraphrases generated by the LLM. They demonstrate faster inference times and the ability to generate diverse paraphrases of comparable quality. A notable characteristic of these models is their ability to exhibit syntactic diversity while also preserving lexical diversity, features previously uncommon because of data quality issues in existing datasets and not typically observed in neural-based approaches. Human evaluation of our models shows only a 4% drop in performance compared to the LLM teacher model used in the distillation process, despite the models being 1000 times smaller. This research provides a significant contribution to the NLG field, offering a more efficient and cost-effective solution for paraphrasing tasks.
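Sequence-level knowledge distillation trains the student on sequences generated by the teacher rather than on the teacher's token-level probabilities. A minimal sketch of the dataset-building step might look as follows. Here `toy_teacher` is a hypothetical stand-in for a call to the teacher LLM, and the record field names and filtering rules are assumptions, not the authors' actual format.

```python
from typing import Callable, Dict, List


def build_distillation_pairs(
    sources: List[str],
    teacher: Callable[[str, int], List[str]],
    n_paraphrases: int = 3,
) -> List[Dict[str, str]]:
    """Collect (source, teacher paraphrase) records for student fine-tuning."""
    records: List[Dict[str, str]] = []
    for src in sources:
        seen = set()
        for para in teacher(src, n_paraphrases):
            cleaned = para.strip()
            # Skip empty outputs, verbatim copies of the input, and duplicates.
            if not cleaned or cleaned.lower() == src.strip().lower() or cleaned in seen:
                continue
            seen.add(cleaned)
            records.append({"input": src, "target": cleaned})
    return records


def toy_teacher(sentence: str, n: int) -> List[str]:
    """Hypothetical stand-in for the teacher LLM; returns canned rewrites."""
    canned = [
        f"In other words, {sentence}",
        sentence,  # a verbatim copy, which the builder filters out
        f"Put differently, {sentence}",
    ]
    return canned[:n]


pairs = build_distillation_pairs(["the model is small."], toy_teacher)
```

The student is then fine-tuned on `pairs` with an ordinary sequence-to-sequence objective, so no access to the teacher's internal probabilities is needed.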

Paper Structure (22 sections, 3 figures, 5 tables)

Introduction
Related Work
Paraphrase Datasets
Paraphrase Generation
Knowledge Distillation
Methodology
Dataset Creation
Model Training
Model Inference
Evaluation
Quantitative Analysis
Semantic Similarity
Syntactic Diversity
Lexical Diversity
Qualitative Analysis
...and 7 more sections

Figures (3)

Figure 1: High-Level Architecture Diagram for Training and Inference Phases.
Figure A1: Prompt fed to the GPT-4 model for evaluation.
Figure A2: Instructions given to Human Evaluators.
