RA-MTR: A Retrieval Augmented Multi-Task Reader based Approach for Inspirational Quote Extraction from Long Documents
Sayantan Adak, Animesh Mukherjee
TL;DR
The paper tackles context-based quote extraction from long documents by casting it as open-domain question answering and proposing RA-MTR, a retrieval-augmented multi-task reader. RA-MTR combines a vector-store retriever with a Llama-3 re-ranker and a dual-head reader that performs quotable tagging and context-aware span prediction. It improves over baselines and generalizes well in few-shot settings. Three datasets (QuoteR, Gandhi, Quotus) are curated to support cross-genre evaluation and are intended for public release. The paper also reports ablations and further analysis, and discusses the potential for practical deployment. The intended applications include quotable-phrase extraction for journalism, essays, and archival analysis.
Abstract
Inspirational quotes from famous individuals are often used to convey thoughts in news articles, essays, and everyday conversations. In this paper, we propose a novel context-based quote extraction system that extracts the most relevant quote from a long text. We formulate quote extraction as an open-domain question answering problem, first employing a vector-store-based retriever and then applying a multi-task reader. We curate three context-based quote extraction datasets and introduce a novel multi-task framework, RA-MTR, that improves over the state of the art by up to 5.08% in BoW F1-score.
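This section does not define the BoW F1-score. As a minimal sketch, the following assumes it is the token-overlap (bag-of-words) F1 commonly used in extractive question answering, computed between a predicted quote and a reference quote. The function name `bow_f1` and the lowercase, whitespace-based tokenization are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def bow_f1(predicted: str, reference: str) -> float:
    """Bag-of-words F1 between a predicted and a reference quote.

    Assumption: tokens are lowercased, whitespace-separated words.
    The paper's actual tokenization and normalization may differ.
    """
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a prediction that recovers only part of the reference quote is rewarded for precision but penalized on recall, which is why a span-level metric of this kind suits extraction tasks where boundaries may be slightly off.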
