QUILL: Quotation Generation Enhancement of Large Language Models

Jin Xiao; Bowei Zhang; Qianyu He; Jiaqing Liang; Feng Wei; Jinglei Chen; Zujie Liang; Deqing Yang; Yanghua Xiao

TL;DR

This work tackles quotation generation in large language models, identifying pervasive problems of quotation hallucination, contextual misalignment, and limited novelty. It proposes QUILL, a framework that combines a five-criterion automatic evaluation system, a bilingual knowledge base of up to 32,022 quotes, and a quotation-specific reranking metric to improve retrieval-augmented quotation generation. The main contributions are a holistic evaluation system, a curated bilingual quotation corpus, and a fine-grained reranking metric that correlates strongly with human preferences and improves performance across open- and closed-source models. The approach reduces quotation hallucination and strengthens the authenticity and credibility of inserted quotes. The data and code are publicly available to support further research and practical deployment of quotation generation systems.

Abstract

While large language models (LLMs) have become excellent writing assistants, they still struggle with quotation generation. This is because they either hallucinate when providing factual quotations or fail to provide quotes that exceed human expectations. To bridge the gap, we systematically study how to evaluate and improve LLMs' performance in quotation generation tasks. We first establish a holistic and automatic evaluation system for the quotation generation task, which consists of five criteria, each with a corresponding automatic metric. To improve LLMs' quotation generation abilities, we construct a bilingual knowledge base that is broad in scope and rich in dimensions, containing up to 32,022 quotes. Moreover, guided by our criteria, we further design a quotation-specific metric to rerank the quotations retrieved from the knowledge base. Extensive experiments show that our metrics strongly correlate with human preferences. Existing LLMs struggle to generate desired quotes, but our quotation knowledge base and reranking metric help narrow this gap. Our dataset and code are publicly available at https://github.com/GraceXiaoo/QUILL.
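
The retrieve-then-rerank idea described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Quote` type, the token-overlap retriever, the placeholder quotes and authors, and the caller-supplied `score` function are hypothetical, and the scorer stands in for the paper's quotation-specific metric, which the abstract does not define. The actual implementation is in the repository linked above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Quote:
    """One knowledge-base entry (hypothetical schema)."""
    text: str
    author: str


def tokens(s: str) -> set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'").lower() for w in s.split()} - {""}


def overlap(context: str, quote: Quote) -> float:
    """Jaccard overlap between context and quote; a toy retrieval score."""
    a, b = tokens(context), tokens(quote.text)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(context: str, kb: list[Quote], k: int = 3) -> list[Quote]:
    """Return the k quotes with the highest toy retrieval score."""
    return sorted(kb, key=lambda q: overlap(context, q), reverse=True)[:k]


def rerank(
    context: str,
    candidates: list[Quote],
    score: Callable[[str, Quote], float],
) -> list[Quote]:
    """Reorder retrieved candidates by a second, quotation-specific score.

    `score` is a placeholder for whatever metric is used; it is not the
    metric from the paper.
    """
    return sorted(candidates, key=lambda q: score(context, q), reverse=True)


# Placeholder quotes and authors, invented for this example only.
kb = [
    Quote("patience is a quiet form of courage", "Author A"),
    Quote("courage grows when shared", "Author B"),
    Quote("rivers remember every stone", "Author C"),
]
context = "an essay on patience and courage"

candidates = retrieve(context, kb, k=2)
# Toy second-stage scorer: prefer shorter quotes.
best_first = rerank(context, candidates, lambda ctx, q: -len(q.text))
```

Keeping retrieval and reranking as separate steps means the second-stage scorer can be swapped without touching the knowledge base lookup, which is the property the reranking metric in the abstract relies on.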
