Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning

Shenghua Wang, Zhen Yin

TL;DR

The paper tackles the challenge of finding relevant open-access scientific literature under privacy constraints by proposing OMRC-MR, a discourse-aware, content-based recommendation framework. It introduces QA-style OMRC summarization to convert papers into structured Objective, Method, Result, and Conclusion views, and combines this with multi-level contrastive learning and structure-aware re-ranking to produce interpretable, cross-disciplinary, and multilingual representations. The method shows consistent improvements over baselines across DBLP, S2ORC, and the Sci-OMRC dataset, achieving up to 7.2% gains in Precision@10 and 3.8% in Recall@10, while QA-style summaries yield more coherent and factually grounded representations. By avoiding reliance on user interaction data or citation graphs, OMRC-MR advances privacy-preserving scholarly information retrieval with robust cross-language and cross-domain performance, and points to future extensions in broader discourse decomposition and transfer learning.

Abstract

The rapid growth of open-access (OA) publications has intensified the challenge of identifying relevant scientific papers. Due to privacy constraints and limited access to user interaction data, recent efforts have shifted toward content-based recommendation, which relies solely on textual information. However, existing models typically treat papers as unstructured text, neglecting their discourse organization and thereby limiting semantic completeness and interpretability. To address these limitations, we propose OMRC-MR, a hierarchical framework that integrates QA-style OMRC (Objective, Method, Result, Conclusion) summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. The QA-style summarization module converts raw papers into structured and discourse-consistent representations, while multi-level contrastive objectives align semantic representations across metadata, section, and document levels. The final re-ranking stage further refines retrieval precision through contextual similarity calibration. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines, achieving up to 7.2% and 3.8% improvements in Precision@10 and Recall@10, respectively. Additional evaluations confirm that QA-style summarization produces more coherent and factually complete representations. Overall, OMRC-MR provides a unified and interpretable content-based paradigm for scientific paper recommendation, advancing trustworthy and privacy-aware scholarly information retrieval.
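As a rough illustration of the multi-level contrastive objective described above, the following minimal sketch computes an InfoNCE-style loss over in-batch negatives and combines two levels with scalar weights. It is a sketch under stated assumptions, not the authors' implementation: the function names, the temperature value, and the choice of which two levels the weights $\alpha$ and $\beta$ apply to are all hypothetical. The default values 0.4 and 0.6 are taken from the Figure 3 caption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive view for anchors[i]; every other entry
    of `positives` acts as an in-batch negative. The temperature is an
    assumed value, not one reported in the summary.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def joint_loss(section_loss, document_loss, alpha=0.4, beta=0.6):
    """Weighted sum of two per-level losses.

    Assumption: alpha weights a section-level term and beta a
    document-level term; the summary does not state this pairing.
    """
    return alpha * section_loss + beta * document_loss


# Toy batch: each section embedding paired with its document embedding.
sections = [[1.0, 0.0], [0.0, 1.0]]
documents = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(sections, documents)
misaligned = info_nce(sections, list(reversed(documents)))
```

In this toy batch the aligned pairing yields a much smaller loss than the reversed pairing, which is the behavior a contrastive objective is meant to reward.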

Paper Structure

This paper contains 15 sections, 9 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Conceptual overview of the proposed QA-style OMRC framework for cross-disciplinary and multilingual scholarly recommendation.
  • Figure 2: Overall architecture of the proposed framework for cross-disciplinary and multilingual scholarly recommendation.
  • Figure 3: Performance variation under different weighting balances in joint contrastive learning, peaking at $\alpha = 0.4$ and $\beta = 0.6$.
  • Figure 4: Retrieval performance under different re-ranking weights ($\lambda$), showing peak precision and ranking quality at $\lambda = 0.6$, where metadata and role-aware similarities are optimally balanced.
  • Figure 5: Cross-lingual Precision@10 comparison on the Sci-OMRC dataset.
  • ...and 2 more figures
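
The re-ranking weight $\lambda$ in the Figure 4 caption balances metadata similarity against role-aware similarity. One plausible reading is a convex combination of the two scores, sketched below. This is an assumption for illustration only: the summary does not give the scoring formula, and the function name, the input layout, and which similarity $\lambda$ multiplies are hypothetical.

```python
def rerank(candidates, lam=0.6, top_k=10):
    """Re-rank candidates by a convex combination of two similarities.

    candidates: iterable of (paper_id, metadata_sim, role_sim) tuples.
    Assumption: lam weights the metadata similarity and (1 - lam) the
    role-aware (OMRC) similarity; the reverse pairing is equally possible.
    """
    scored = [
        (paper_id, lam * metadata_sim + (1.0 - lam) * role_sim)
        for paper_id, metadata_sim, role_sim in candidates
    ]
    # Highest combined score first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical first-stage candidates with made-up similarity scores.
first_stage = [("A", 0.9, 0.2), ("B", 0.5, 0.9), ("C", 0.1, 0.1)]
ranked_ids = [paper_id for paper_id, _ in rerank(first_stage)]
```

With these made-up scores, paper B overtakes paper A because its stronger role-aware similarity outweighs A's metadata advantage at $\lambda = 0.6$.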