Research Paper Recommender System by Considering Users' Information Seeking Behaviors

Zhelin Xu, Shuhei Yamamoto, Hideo Joho

TL;DR

The paper addresses information overload in scientific literature search by moving beyond global content similarity to a section-aware content-based filtering approach. It learns a paper representation from both the overall abstract content and weighted signals from the background, method, and results sections; the paper title is appended to each extracted section, and a learned attention mechanism assigns the section weights. The model uses SPECTER embeddings, a multi-head attention module, and a nonlinear MLP, and is trained with a triplet loss that emphasizes hard negatives. On the DBLP dataset it outperforms six baseline methods (MAP ≈ 0.808 and recall@5 ≈ 0.813). This section-aware representation improves relevance and ranking, which is of practical benefit to novice researchers and to scalable literature discovery. Future work aims to validate the weights via user studies and to extend the approach to larger datasets.
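The weighting and training objective summarized above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that section weights come from a softmax over raw scores, that the weighted section vectors are simply added to the abstract vector (the paper uses multi-head attention followed by an MLP), and that the triplet loss uses Euclidean distance with a margin of 1.0. The function names `fuse` and `triplet_loss` are hypothetical, and the toy two-dimensional vectors stand in for SPECTER embeddings.

```python
import math

def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(abstract_vec, section_vecs, section_scores):
    """Assumed fusion: abstract vector plus softmax-weighted sum of section vectors."""
    weights = softmax(section_scores)
    fused = list(abstract_vec)
    for w, vec in zip(weights, section_vecs):
        for i, x in enumerate(vec):
            fused[i] += w * x
    return fused

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(query, positive, negative, margin=1.0):
    """Zero once the positive is closer to the query than the negative by `margin`."""
    return max(0.0, euclidean(query, positive) - euclidean(query, negative) + margin)

# Toy inputs: background, method, and results vectors with equal scores.
fused = fuse([1.0, 0.0], [[0.0, 2.0], [2.0, 0.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])   # negative is far away: 0.0
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])   # negative is closer: 5.0
```

The second call shows why hard negatives matter for training: only triplets in which the negative sits close to the query produce a non-zero loss and therefore a gradient.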

Abstract

With the rapid growth of scientific publications, researchers need to spend more time and effort searching for papers that align with their research interests. To address this challenge, paper recommendation systems have been developed to help researchers effectively identify relevant papers. One of the leading approaches to paper recommendation is content-based filtering. Traditional content-based filtering methods recommend relevant papers to users based on the overall similarity of papers. However, these approaches do not take into account the information seeking behaviors that users commonly employ when searching for literature. Such behaviors include not only evaluating the overall similarity among papers, but also focusing on specific sections, such as the method section, to ensure that the approach aligns with the user's interests. In this paper, we propose a content-based filtering recommendation method that takes these information seeking behaviors into account. Specifically, in addition to considering the overall content of a paper, our approach also takes into account three specific sections (background, method, and results) and assigns weights to them to better reflect user preferences. We conduct offline evaluations on the publicly available DBLP dataset, and the results demonstrate that the proposed method outperforms six baseline methods in terms of precision, recall, F1-score, MRR, and MAP.
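For readers unfamiliar with the metrics named in the abstract, the sketch below shows one common way to compute them for a single query. It uses the standard textbook definitions, which may differ in detail from the paper's evaluation script; the function names and paper ids are hypothetical. MRR and MAP are then the means of `reciprocal_rank` and `average_precision` over all queries.

```python
def precision_at(ranked, relevant, n):
    """Fraction of the top-n recommendations that are relevant."""
    hits = sum(1 for p in ranked[:n] if p in relevant)
    return hits / n

def recall_at(ranked, relevant, n):
    """Fraction of all relevant papers that appear in the top n."""
    hits = sum(1 for p in ranked[:n] if p in relevant)
    return hits / len(relevant) if relevant else 0.0

def f1_at(ranked, relevant, n):
    """Harmonic mean of precision@n and recall@n."""
    p = precision_at(ranked, relevant, n)
    r = recall_at(ranked, relevant, n)
    return 2 * p * r / (p + r) if p + r else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant paper, or 0 if none is retrieved."""
    for rank, p in enumerate(ranked, start=1):
        if p in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision values taken at each rank where a relevant paper occurs."""
    hits, total = 0, 0.0
    for rank, p in enumerate(ranked, start=1):
        if p in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical ranking for one query; "p5" is relevant but never retrieved.
ranked = ["p3", "p1", "p7", "p2", "p9"]
relevant = {"p1", "p2", "p5"}
p5 = precision_at(ranked, relevant, 5)      # 2 hits in 5 -> 0.4
r5 = recall_at(ranked, relevant, 5)         # 2 of 3 relevant -> 0.666...
f5 = f1_at(ranked, relevant, 5)             # 0.5
rr = reciprocal_rank(ranked, relevant)      # first hit at rank 2 -> 0.5
ap = average_precision(ranked, relevant)    # (1/2 + 2/4) / 3 -> 0.333...
```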

Paper Structure

This paper contains 22 sections, 8 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Overview of the paper recommendation task based on content-based filtering.
  • Figure 2: Overview of the proposed method. C: classification model, used to extract three specific sections from the query paper's abstract. Title: the title of the query paper, which is appended to each extracted section. E: embedding model, used to encode each section or the full abstract into a vector representation. A: attention model, which assigns different weights to the extracted section embeddings. M: a non-linear MLP model.
  • Figure 3: Positive and hard-negative sample selection for a query paper based on citation relationships.
  • Figure 4: The evaluation results of precision@N, recall@N and F1-score@N on the paper recommendation task.
  • Figure 5: An example of the first recommendation made by our method and SPECTER. The query paper is shown on the left. The paper in the center is recommended by the proposed method and is a relevant paper. The paper on the right is recommended by SPECTER and is an irrelevant paper. Green indicates the background section, red represents the method section, and blue highlights the results section.
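
The caption of Figure 3 mentions selecting positive and hard-negative samples from citation relationships, but the exact rule is not given in this summary. The sketch below assumes the convention used by SPECTER: positives are papers the query cites, and hard negatives are papers cited by those positives but not by the query itself. Whether the paper follows exactly this rule is an assumption; `select_samples` and the toy citation graph are hypothetical.

```python
def select_samples(query, cites):
    """Return (positives, hard_negatives) for `query`.

    `cites` maps a paper id to the set of paper ids it cites.
    Assumed rule: positives are directly cited papers; hard negatives
    are cited by a positive but not cited by the query.
    """
    positives = set(cites.get(query, set()))
    hard_negatives = set()
    for paper in positives:
        hard_negatives |= cites.get(paper, set())
    hard_negatives -= positives      # drop papers the query already cites
    hard_negatives.discard(query)    # a paper is never its own negative
    return positives, hard_negatives

# Toy citation graph: q cites a and b; a cites b and c; b cites d and q.
cites = {"q": {"a", "b"}, "a": {"b", "c"}, "b": {"d", "q"}}
positives, hard_negatives = select_samples("q", cites)
# positives == {"a", "b"}, hard_negatives == {"c", "d"}
```

Hard negatives chosen this way are topically close to the query, which makes them more informative for a triplet loss than randomly sampled papers.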