
TOPICAL: TOPIC Pages AutomagicaLly

John Giorgi, Amanpreet Singh, Doug Downey, Sergey Feldman, Lucy Lu Wang

TL;DR

TOPICAL addresses the problem of rapidly producing high-quality, citable topic pages for biomedical concepts by integrating retrieval-augmented generation with PubMed-backed literature mining. The method retrieves up to 10,000 papers, embeds and clusters them with SPECTER2, samples a diverse subset, and prompts GPT-4 to generate concise topic pages with inline citations. In extensive human evaluations on 150 biomedical terms, the system achieved strong relevance, accuracy, and coherence scores, with robust citation quality. The work culminates in an openly accessible web app and open-source code, enabling researchers to generate on-demand topic pages and aiding literature navigation amid biomedical information overload.
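The cluster-then-sample step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D vectors stand in for SPECTER2 embeddings, plain k-means is swapped in as an assumed clustering method (the text here does not name one), and the names `kmeans`, `sample_diverse`, and `budget` are hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in clustering method).

    Returns one cluster label per input vector.
    """
    rng = random.Random(seed)
    # Initialise centroids on k distinct data points.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sample_diverse(labels, budget, seed=0):
    """Pick up to `budget` item indices, cycling over clusters in turn
    so that every cluster is represented before any cluster repeats."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    # Shuffle each cluster's members so picks within a cluster are random.
    pools = [rng.sample(members, len(members)) for _, members in sorted(clusters.items())]
    picked = []
    while len(picked) < budget and any(pools):
        for pool in pools:
            if pool and len(picked) < budget:
                picked.append(pool.pop())
    return picked


# Toy data: three well-separated groups of 2-D points standing in for
# paper embeddings (real embeddings would be high-dimensional).
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
embeddings = [
    [cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
    for cx, cy in centers
    for _ in range(20)
]

labels = kmeans(embeddings, k=3)
chosen = sample_diverse(labels, budget=6)
print("sampled paper indices:", chosen)
```

Cycling over clusters rather than sampling uniformly from all papers is one simple way to keep smaller sub-topics from being crowded out by large clusters. In a real pipeline the sampled titles and abstracts would then be placed into the prompt.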

Abstract

Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to traditional web search. While most prior work has focused on generating topic pages about biographical entities, in this work, we develop a completely automated process to generate high-quality topic pages for scientific entities, with a focus on biomedical concepts. We release TOPICAL, a web app and associated open-source code, comprising a model pipeline combining retrieval, clustering, and prompting, that makes it easy for anyone to generate topic pages for a wide variety of biomedical entities on demand. In a human evaluation of 150 diverse topic pages generated using TOPICAL, we find that the vast majority were considered relevant, accurate, and coherent, with correct supporting citations. We make all code publicly available and host a free-to-use web app at: https://s2-topical.apps.allenai.org

Paper Structure (24 sections, 7 figures, 2 tables, 1 algorithm)

Figures (7)

  • Figure 1: Example of a scientific topic page generated by our system. Citations are provided as hyperlinks to PubMed articles and denoted by their PMID. The topic page is divided into the definition statement, main content, and future directions and open research questions.
  • Figure 2: Overview of TOPICAL. Given a biomedical entity, we query PubMed for relevant literature (A). The titles and abstracts of the results are embedded with SPECTER (Singh et al., 2023) and clustered based on semantic similarity (B). We sample titles and abstracts from the clusters (C) and feed them to GPT-4 (OpenAI, 2023), alongside publication metadata and natural language instructions, to generate the topic page (D).
  • Figure 3: Example clusters. Three titles from a selection of clusters for each concept are shown. Emphasis ours.
  • Figure 4: Truncated example prompt. The prompt is divided into system and user roles. In the user role, we provide instructions about the input, how to cite a claim, details about the entity or concept like publication metadata, the sampled literature, and guidance about the expected sections and lengths for the topic page. Emphasis is provided for visualization purposes only.
  • Figure 5: TOPICAL web app. Given a search query for a biomedical entity or concept of interest and a canonicalized name, it automatically generates a topic page for the concept. An expandable section provides additional information, like a histogram of publication dates for the query and the number of clusters identified.
  • ...and 2 more figures