Beyond Relevant Documents: A Knowledge-Intensive Approach for Query-Focused Summarization using Large Language Models

Weijia Zhang, Jia-Hong Huang, Svitlana Vakulenko, Yumo Xu, Thilina Rajapakse, Evangelos Kanoulas

TL;DR

This work tackles query-focused summarization when relevant documents may be unavailable by reframing it as a knowledge-intensive task: evidence is retrieved from a large knowledge corpus, and a specialized summarization controller generates query-aligned summaries. It introduces a two-component KI-QFS architecture (a retrieval module and a summarization controller) and a dedicated KI-QFS dataset with human relevance annotations, built atop DUC data and three knowledge corpora. Experiments show the approach yields superior retrieval and summarization performance, especially for open-domain or highly specialized queries, highlighting the practicality of KI-QFS beyond traditional setups. The work also provides insights into the challenges of large-scale retrieval grounding and sets benchmarks for future retrieval-grounded, query-focused summarization research.

Abstract

Query-focused summarization (QFS) is a fundamental task in natural language processing with broad applications, including search engines and report generation. However, traditional approaches assume the availability of relevant documents, which may not always hold in practical scenarios, especially for highly specialized topics. To address this limitation, we propose a novel approach that reframes QFS as a knowledge-intensive task. This approach comprises two main components: a retrieval module and a summarization controller. The retrieval module efficiently retrieves potentially relevant documents from a large-scale knowledge corpus based on the given textual query, eliminating the dependence on pre-existing document sets. The summarization controller integrates a large language model (LLM)-based summarizer with a carefully tailored prompt, so that the generated summary is comprehensive and relevant to the query. To assess the effectiveness of our approach, we create a new dataset, along with human-annotated relevance labels, to facilitate comprehensive evaluation covering both retrieval and summarization performance. Extensive experiments demonstrate the superior performance of our approach, particularly its ability to generate accurate summaries without initially relying on the availability of relevant documents. This underscores our method's versatility and practical applicability across diverse query scenarios.
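
The two-component design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the TF-IDF-style scoring, the prompt wording, and the function names `retrieve` and `build_prompt` are all assumptions chosen for illustration, and the LLM call itself is left out, so the sketch stops at the prompt that a summarizer would receive.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, corpus, k=2):
    """Return up to k corpus documents ranked by a simple TF-IDF overlap score.

    Stand-in for the retrieval module. The scoring function is an assumption,
    not the retriever used in the paper.
    """
    doc_tokens = [tokenize(doc) for doc in corpus]
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    query_terms = set(tokenize(query))

    scored = []
    for idx, tokens in enumerate(doc_tokens):
        tf = Counter(tokens)
        score = sum(
            tf[t] * math.log((n_docs + 1) / (df[t] + 1) + 1.0)
            for t in query_terms
            if t in tf
        )
        scored.append((score, idx))
    # Highest score first; ties keep the original corpus order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop documents that share no term with the query.
    return [corpus[idx] for score, idx in scored[:k] if score > 0]


def build_prompt(query, documents):
    """Assemble a query-focused summarization prompt (hypothetical wording).

    Stand-in for the summarization controller, which would pass this prompt
    to an LLM-based summarizer.
    """
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Summarize the documents below so that the summary answers the query.\n"
        f"Query: {query}\n"
        f"Documents:\n{numbered}\n"
        "Summary:"
    )


corpus = [
    "Solar panels convert sunlight into electricity.",
    "Government subsidies have accelerated wind energy adoption in Europe.",
    "Offshore wind farms generate energy from strong coastal winds.",
]
query = "wind energy subsidies"
prompt = build_prompt(query, retrieve(query, corpus, k=2))
print(prompt)
```

The point of the sketch is the separation of concerns: the retriever needs only the query and the knowledge corpus, and the controller needs only the query and whatever the retriever returns, so no pre-selected relevant document set is required as input.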

Paper Structure

This paper contains 33 sections, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Comparison between (a) the conventional approach and (b) our knowledge-intensive approach. The conventional approach assumes that relevant documents are available, whereas ours retrieves such documents from a large knowledge corpus.