Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning
Shenghua Wang, Zhen Yin
TL;DR
The paper addresses the problem of finding relevant open-access scientific literature when privacy constraints limit access to user interaction data, and proposes OMRC-MR, a discourse-aware, content-based recommendation framework. QA-style OMRC summarization converts each paper into structured Objective, Method, Result, and Conclusion views; multi-level contrastive learning and structure-aware re-ranking then produce interpretable, cross-disciplinary, and multilingual representations. OMRC-MR consistently outperforms baselines on DBLP, S2ORC, and the Sci-OMRC dataset, with gains of up to 7.2% in Precision@10 and 3.8% in Recall@10, and the QA-style summaries yield more coherent and factually complete representations. Because it does not rely on user interaction data or citation graphs, OMRC-MR supports privacy-preserving scholarly information retrieval with robust cross-language and cross-domain performance; the authors point to broader discourse decomposition and transfer learning as future extensions.
Abstract
The rapid growth of open-access (OA) publications has intensified the challenge of identifying relevant scientific papers. Due to privacy constraints and limited access to user interaction data, recent efforts have shifted toward content-based recommendation, which relies solely on textual information. However, existing models typically treat papers as unstructured text, neglecting their discourse organization and thereby limiting semantic completeness and interpretability. To address these limitations, we propose OMRC-MR, a hierarchical framework that integrates QA-style OMRC (Objective, Method, Result, Conclusion) summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. The QA-style summarization module converts raw papers into structured and discourse-consistent representations, while multi-level contrastive objectives align semantic representations across metadata, section, and document levels. The final re-ranking stage further refines retrieval precision through contextual similarity calibration. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines, achieving up to 7.2% and 3.8% improvements in Precision@10 and Recall@10, respectively. Additional evaluations confirm that QA-style summarization produces more coherent and factually complete representations. Overall, OMRC-MR provides a unified and interpretable content-based paradigm for scientific paper recommendation, advancing trustworthy and privacy-aware scholarly information retrieval.
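For readers unfamiliar with the reported metrics, the following is a minimal sketch of Precision@K and Recall@K using their standard definitions. It is not the authors' evaluation code: the abstract does not describe the evaluation protocol, so the function names, the paper identifiers, and the convention of dividing precision by K even when fewer than K items are returned are all assumptions made for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k recommended papers that are relevant.

    Assumption: the denominator is always k, even if fewer than k
    papers were recommended (one common convention among several).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for paper_id in ranked[:k] if paper_id in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of all relevant papers that appear in the top-k recommendations."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # assumption: recall is defined as 0 when nothing is relevant
    hits = sum(1 for paper_id in ranked[:k] if paper_id in relevant)
    return hits / len(relevant)


# Hypothetical example: five recommended papers, three relevant ones.
ranked_ids = ["p1", "p7", "p3", "p9", "p4"]
relevant_ids = {"p3", "p4", "p8"}

p_at_5 = precision_at_k(ranked_ids, relevant_ids, k=5)  # 2 hits out of 5 slots
r_at_5 = recall_at_k(ranked_ids, relevant_ids, k=5)     # 2 hits out of 3 relevant
```

In this toy case two of the five recommendations are relevant, so Precision@5 is 2/5 and Recall@5 is 2/3. The paper reports both metrics at K = 10.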
