De-jargonizing Science for Journalists with GPT-4: A Pilot Study

Sachita Nishal, Eric Lee, Nicholas Diakopoulos

TL;DR

An initial evaluation of a human-in-the-loop system that uses GPT-4 (a large language model, or LLM) and Retrieval-Augmented Generation (RAG) to identify and define jargon terms in scientific abstracts, based on readers' self-reported knowledge. The system achieves fairly high recall and preserves relative differences in readers' jargon identification.

Abstract

This study offers an initial evaluation of a human-in-the-loop system that uses GPT-4 (a large language model, or LLM) and Retrieval-Augmented Generation (RAG) to identify and define jargon terms in scientific abstracts, based on readers' self-reported knowledge. The system achieves fairly high recall in identifying jargon and preserves relative differences in readers' jargon identification, suggesting that personalization is a feasible use case for LLMs that support sense-making of complex information. Surprisingly, using only the abstract as context for generating definitions yields slightly more accurate and higher-quality definitions than using RAG-based context drawn from the full text of an article. The findings highlight the potential of generative AI for assisting science reporters and can inform future work on tools that simplify dense documents.
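To make the recall measure mentioned in the abstract concrete, here is a minimal sketch of how recall for jargon identification could be computed for a single abstract. The function name `jargon_recall`, the case-insensitive exact matching, and the example terms are all illustrative assumptions; the paper's actual matching procedure is not described in this section.

```python
def jargon_recall(reader_terms, model_terms):
    """Fraction of reader-flagged jargon terms that the model also flagged.

    Assumption: terms match if they are equal after trimming whitespace
    and lowercasing. The study may use a different matching rule.
    """
    reader = {t.strip().lower() for t in reader_terms}
    model = {t.strip().lower() for t in model_terms}
    if not reader:
        # Recall is undefined when the reader flagged no terms.
        return None
    return len(reader & model) / len(reader)


# Hypothetical example: the reader flags three terms, the model flags four.
reader_flags = ["Retrieval-Augmented Generation", "recall", "LLM"]
model_flags = ["retrieval-augmented generation", "llm", "preprint", "GPT-4"]
print(jargon_recall(reader_flags, model_flags))
```

Under this definition, a model that flags more terms than the reader can still score high recall, since terms the model flags but the reader does not leave the score unchanged.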

Paper Structure

This paper contains 12 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Distribution of jargon terms identified per abstract by human annotators (left) and GPT-4 (right). The probability density plots illustrate how frequently each count of jargon terms is observed. While GPT-4 tends to overestimate the number of jargon terms, it still captures the relative differences between annotators.
  • Figure 2: Prototype UI displaying a search bar, filter options, and preprint abstract metadata. Users can hover over specific jargon terms, or scroll through a clickable list to see definitions. Readers can explore the interactive version of the prototype via the linked GitHub repository.