
RA-MTR: A Retrieval Augmented Multi-Task Reader based Approach for Inspirational Quote Extraction from Long Documents

Sayantan Adak, Animesh Mukherjee

TL;DR

The paper tackles context-based quote extraction from long documents by casting it as open-domain QA and proposing RA-MTR, a retrieval-augmented multi-task reader. RA-MTR combines a vector-store retriever with a Llama-3 re-ranker and a dual-head reader that performs quotable-phrase tagging and context-aware span prediction. It improves on the state of the art by up to 5.08% in BoW F1 and generalizes well in few-shot settings. Three datasets from different genres (QuoteR, Gandhi, Quotus) are curated for cross-genre evaluation and released publicly. The paper also reports detailed ablations and analysis and points to the method's potential for practical deployment. Overall, RA-MTR advances quotable-phrase extraction for journalism, essays, and archival analysis by integrating retrieval, re-ranking, and multi-task span reasoning.
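
The retrieval stage can be pictured with a minimal, standard-library sketch. It assumes the long document is split into fixed-size, overlapping word windows, and it substitutes term-frequency vectors with cosine similarity for the learned embeddings a real vector store would hold. The function names and the chunk_size/stride defaults are hypothetical, and the Llama-3 re-ranking step is not modeled.

```python
import math
import re
from collections import Counter


def chunk_document(text, chunk_size=50, stride=25):
    """Split a document into overlapping windows of `chunk_size` words (hypothetical chunking)."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window already reaches the end
            break
    return chunks


def embed(text):
    """Stand-in embedding: lowercase term frequencies (assumption, not a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(context, document, k=3, chunk_size=50, stride=25):
    """Return the top-k (score, chunk) pairs for the target context, best first."""
    query = embed(context)
    scored = [(cosine(query, embed(c)), c) for c in chunk_document(document, chunk_size, stride)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep document order
    return scored[:k]
```

In the described system, the top chunks from this stage would then be re-ranked and passed to the multi-task reader; here, `retrieve("eta theta", "alpha beta gamma delta epsilon zeta eta theta", k=1, chunk_size=4, stride=2)` returns the window containing "eta theta".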

Abstract

Inspirational quotes from famous individuals are often used to convey thoughts in news articles, essays, and everyday conversations. In this paper, we propose a novel context-based quote extraction system that aims to extract the most relevant quote from a long text. We formulate quote extraction as an open-domain question answering problem, first employing a vector-store-based retriever and then applying a multi-task reader. We curate three context-based quote extraction datasets and introduce a novel multi-task framework, RA-MTR, that improves on state-of-the-art performance by up to 5.08% in BoW F1-score.
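
BoW F1 is not defined on this page. A common reading, assumed here, is the SQuAD-style token-overlap F1: with P and R the precision and recall of the predicted quote's word multiset against the reference quote, F1 = 2PR / (P + R). The lowercasing and \w+ tokenization below are assumptions, and the paper's exact normalization may differ.

```python
import re
from collections import Counter


def normalize(text):
    """Assumed normalization: lowercase, then keep word characters only."""
    return re.findall(r"\w+", text.lower())


def bow_f1(prediction, reference):
    """Bag-of-words F1 between a predicted quote and the reference quote."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)  # both empty -> 1.0, exactly one empty -> 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "The quick brown fox" against the reference "the quick brown fox jumps" gives P = 1 and R = 4/5, so F1 = 8/9.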

Paper Structure

This paper contains 32 sections, 2 equations, 10 figures, 12 tables.

Figures (10)

  • Figure 1: Example use case of context-aware quote extraction from a source document while composing an article. The highlighted portion of the source document can be a suitable quote for the target context on the left.
  • Figure 2: Most prominent words present in the quotes across the three datasets.
  • Figure 3: The RA-MTR architecture.
  • Figure 4: Example of BIO tagging.
  • Figure 5: Average Jaccard similarity between top predicted chunk and positive paragraph for a specific context.
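
Figures 4 and 5 refer to two simple token-level operations. The sketch below shows one conventional way to realize them, under stated assumptions: BIO tags mark the first occurrence of a quote inside a whitespace-tokenized paragraph (B for the first quote token, I for the rest, O elsewhere), and Jaccard similarity is computed over lowercase word sets and averaged over (top predicted chunk, positive paragraph) pairs. The tokenization and all function names are hypothetical, not taken from the paper.

```python
def bio_tags(tokens, quote):
    """Tag the first occurrence of `quote` in `tokens` with B/I; everything else is O."""
    tags = ["O"] * len(tokens)
    n = len(quote)
    if n == 0:
        return tags
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == quote:
            tags[start] = "B"
            for i in range(start + 1, start + n):
                tags[i] = "I"
            break
    return tags


def spans_from_bio(tokens, tags):
    """Decode B/I runs back into text spans; a stray I with no open span is treated as O."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| over lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def mean_jaccard(pairs):
    """Average Jaccard similarity over (top chunk, positive paragraph) pairs."""
    return sum(jaccard(chunk, para) for chunk, para in pairs) / len(pairs) if pairs else 0.0
```

For instance, tagging the quote "hope is a discipline" inside "she wrote that hope is a discipline every day" yields O O O B I I I O O, and decoding those tags recovers the quote.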