Harnessing PubMed User Query Logs for Post Hoc Explanations of Recommended Similar Articles

Ashley Shin; Qiao Jin; James Anibal; Zhiyong Lu

TL;DR

The paper tackles the lack of explainability in literature recommendations by transforming PubMed user query logs into PubCLogs, a dataset of coclicked seed–similar article pairs whose ground-truth tokens are derived from how often each similar-article title token appears in the associated user queries. It introduces HSAT, a transformer-based sequence tagging model that highlights relevant tokens in a similar article's title, conditioned on the seed article's title and abstract. On both a PubCLogs holdout test set and a manually annotated test set, HSAT outperforms baselines including BM25, MPNet, MedCPT, and GPT-4 in token-level F1, and user study participants prefer HSAT for its balance between conciseness and comprehensiveness. The work shows that repurposing user behavior recorded in scholarly search logs can yield practical, scalable explanations for recommendations, potentially making researchers' literature searches more efficient.

Abstract

Searching for a related article based on a reference article is an integral part of scientific research. PubMed, like many academic search engines, has a "similar articles" feature that recommends articles relevant to the article a user is currently viewing. Explaining recommended items can be of great utility to users, particularly in the literature search process. With more than a million biomedical papers published each year, explaining the recommended similar articles would help researchers and clinicians find related work. Nonetheless, the majority of current literature recommendation systems lack explanations for their suggestions. We employ a post hoc approach to explaining recommendations by identifying relevant tokens in the titles of similar articles. Our major contribution is building PubCLogs by repurposing 5.6 million pairs of coclicked articles from PubMed's user query logs. Using our PubCLogs dataset, we train Highlight Similar Article Title (HSAT), a transformer-based model designed to select the most relevant parts of the title of a similar article, based on the title and abstract of a seed article. HSAT demonstrates strong performance in our empirical evaluations, achieving an F1 score of 91.72 percent on the PubCLogs test set, considerably outperforming several baselines including BM25 (70.62), MPNet (67.11), MedCPT (62.22), GPT-3.5 (46.00), and GPT-4 (64.89). Additional evaluations on a separate, manually annotated test set further verify HSAT's performance. Moreover, participants in our user study indicate a preference for HSAT, due to its superior balance between conciseness and comprehensiveness. Our study suggests that repurposing user query logs of academic search engines can be a promising way to train state-of-the-art models for explaining literature recommendation.
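To make the reported metric concrete, the following is a minimal sketch of a token-level F1 computation for one similar-article title. It assumes that predictions and ground truth are both sets of highlighted token positions and that F1 is the harmonic mean of precision and recall over those positions; the exact averaging used in the paper is not stated in this summary, and the function name `token_f1` and the example title are hypothetical.

```python
def token_f1(predicted, gold):
    """Token-level F1 between two collections of highlighted token positions.

    Assumption: set-based precision and recall over positions in one title.
    """
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to highlight and nothing highlighted
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical similar-article title, split into tokens.
title = ["Statin", "therapy", "and", "cardiovascular", "risk"]
gold = {0, 3, 4}                   # positions treated as ground truth
model_pred = {0, 1, 3}             # positions a model chose to highlight
all_pred = set(range(len(title)))  # trivial predictor: highlight every token

model_score = token_f1(model_pred, gold)
all_score = token_f1(all_pred, gold)
print(f"model F1 = {model_score:.3f}, highlight-everything F1 = {all_score:.3f}")
```

In this toy case the predictor that highlights every token reaches perfect recall but lower precision, which is why an F1-style metric rewards selective highlighting.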

Paper Structure (28 sections, 4 equations, 5 figures, 3 tables)

Introduction
Related work
Post hoc model of explainability
Production academic search engines
Sequence tagging
Method
Building PubCLogs
Preprocessing
Dataset analysis
Training HSAT
Setup
Inference
Evaluation
Baselines
HighlightAll
...and 13 more sections

Figures (5)

Figure 1: Outline of our model, Highlight Similar Article Title (HSAT). Given the title and abstract of a seed article, and the title of a "similar article," HSAT selects the most relevant tokens in the similar article title to be highlighted.
Figure 2: Overview of our novel method to utilize PubMed user query logs for building our dataset, PubCLogs. When a user issues a query and clicks on an article from the initial search results, then views the title and abstract of the clicked article, returns to the search page, and subsequently clicks on another article, we hypothesize that the coclicked articles are likely related, as the user chose the second article after reviewing the content of the first. Thus, pairs of coclicked articles form the foundation for constructing PubCLogs.
Figure 3: Overview of the PubCLogs dataset construction process: for each coclicked article pair, the initial article represents the seed article, and the related article clicked subsequently represents the similar article. For each token in the title of the similar article, we aggregate the number of coclicks from queries that included the title token. We apply a softmax function to these click counts and establish a predefined threshold, P, to identify the most frequently queried similar article title tokens, which are then used as the ground truth labels for the PMID pair.
Figure 4: $\mathrm{F}_1$ scores of our model, Highlight Similar Article Title (HSAT), on subsets of the PubCLogs test set, divided by the semantic similarity of each article pair as measured with MedCPT.
Figure 5: Results of our user preference studies of HSAT outputs versus GPT-4 outputs. The neutral column indicates that both outputs were similar in quality.
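The labeling procedure described in the Figure 3 caption can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes whitespace tokenization, case-insensitive matching between query terms and title tokens, a softmax applied directly to the raw aggregated counts, and a token being labeled positive when its softmax value is at least the threshold P. The function names, the example queries, and the value of P are all hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def label_title_tokens(title_tokens, query_clicks, threshold):
    """Return a 0/1 label per title token for one seed/similar article pair.

    query_clicks: list of (query_text, coclick_count) pairs for this article pair.
    threshold:    the predefined threshold P (the value used here is an assumption).
    """
    counts = [0] * len(title_tokens)
    for query, clicks in query_clicks:
        query_terms = set(query.lower().split())
        for i, token in enumerate(title_tokens):
            if token.lower() in query_terms:
                counts[i] += clicks  # aggregate coclicks from queries containing the token
    probs = softmax([float(c) for c in counts])
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical similar-article title and the queries that led to coclicks.
title = ["Statin", "therapy", "and", "cardiovascular", "risk"]
queries = [
    ("statin cardiovascular", 3),
    ("statin risk", 2),
    ("cardiovascular risk statin", 1),
]

labels = label_title_tokens(title, queries, threshold=0.1)
print(list(zip(title, labels)))
```

Because softmax exponentiates the counts, tokens with only slightly fewer coclicks than the top token can fall below P quickly; in practice the counts might need scaling before the softmax, which this summary does not specify.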
