TOPICAL: TOPIC Pages AutomagicaLly
John Giorgi, Amanpreet Singh, Doug Downey, Sergey Feldman, Lucy Lu Wang
TL;DR
TOPICAL addresses the problem of rapidly producing high-quality, citable topic pages for biomedical concepts by combining retrieval-augmented generation with PubMed-backed literature mining. The method retrieves up to 10,000 papers, embeds them with SPECTER2, clusters the embeddings, samples a diverse subset, and prompts GPT-4 to generate a concise topic page with inline citations. In a human evaluation of 150 generated topic pages, the vast majority were judged relevant, accurate, and coherent, with correct supporting citations. The work is released as a free-to-use web app with open-source code, letting researchers generate topic pages on demand and navigate the literature amid biomedical information overload.
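The cluster-then-sample step can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the 2-D vectors in the usage note stand in for SPECTER2 embeddings, plain k-means with farthest-point initialization stands in for whatever clustering method TOPICAL actually uses, and the helper names `kmeans_labels` and `sample_diverse` are hypothetical.

```python
import random

def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: _dist2(v, centroids[c]))
            for v in vectors]

def kmeans_labels(vectors, k, iters=20):
    """Plain k-means (an assumed stand-in clustering); requires k <= len(vectors)."""
    # Farthest-point initialization keeps the sketch deterministic.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(_dist2(v, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iters):
        labels = _assign(vectors, centroids)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return _assign(vectors, centroids)

def sample_diverse(paper_ids, labels, n, seed=0):
    """Pick up to n papers round-robin across clusters, so every cluster
    contributes one paper before any cluster contributes a second."""
    rng = random.Random(seed)
    buckets = {}
    for pid, lab in zip(paper_ids, labels):
        buckets.setdefault(lab, []).append(pid)
    for members in buckets.values():
        rng.shuffle(members)
    picked = []
    while len(picked) < n and any(buckets.values()):
        for lab in sorted(buckets):
            if buckets[lab] and len(picked) < n:
                picked.append(buckets[lab].pop())
    return picked
```

For example, calling `kmeans_labels(embeddings, k=3)` on nine toy vectors that form three tight groups, then `sample_diverse(ids, labels, n=3)`, returns one paper from each group. The round-robin order is a design choice for the sketch: it favors coverage of distinct subtopics over proportional representation of large clusters.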
Abstract
Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to traditional web search. While most prior work has focused on generating topic pages about biographical entities, in this work, we develop a completely automated process to generate high-quality topic pages for scientific entities, with a focus on biomedical concepts. We release TOPICAL, a web app and associated open-source code, comprising a model pipeline combining retrieval, clustering, and prompting, that makes it easy for anyone to generate topic pages for a wide variety of biomedical entities on demand. In a human evaluation of 150 diverse topic pages generated using TOPICAL, we find that the vast majority were considered relevant, accurate, and coherent, with correct supporting citations. We make all code publicly available and host a free-to-use web app at: https://s2-topical.apps.allenai.org
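The prompting stage and the notion of "correct supporting citations" can be sketched as follows. This is only an illustration under stated assumptions: the prompt wording, the `[n]` citation format, the `title` and `abstract` dictionary keys, and the helper names `build_prompt`, `cited_indices`, and `invalid_citations` are all hypothetical and not taken from the TOPICAL code. The call to the language model itself is omitted.

```python
import re

def build_prompt(entity, papers):
    """Assemble a prompt that numbers each sampled paper so the model
    can cite it inline as [n]. The wording is an assumption."""
    lines = [
        f"Write a concise topic page about: {entity}",
        "Support every claim with inline citations such as [1], "
        "using only the papers listed below.",
        "",
    ]
    for i, paper in enumerate(papers, start=1):
        lines.append(f"[{i}] {paper['title']}: {paper['abstract']}")
    return "\n".join(lines)

def cited_indices(text):
    """Return the set of citation numbers that appear as [n] in the text."""
    return {int(m) for m in re.findall(r"\[(\d+)\]", text)}

def invalid_citations(text, n_papers):
    """Return citation numbers that do not match any supplied paper."""
    return {i for i in cited_indices(text) if not 1 <= i <= n_papers}
```

A check like `invalid_citations(generated_page, len(papers))` only catches citations that point at no supplied paper; whether a cited paper actually supports the claim still needs human judgment, which is what the evaluation described above measures.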
