Leveraging Large Language Models for Zero-shot Lay Summarisation in Biomedicine and Beyond

Tomas Goldsack, Carolina Scarton, Chenghua Lin

TL;DR

The study addresses zero-shot Lay Summarisation with a two-stage prompting framework modelled on the eLife journal's process: a question-answering stage, in which the LLM answers the kind of questions normally put to authors, followed by a summary-generation stage guided by those answers. The approach is evaluated in Biomedicine and NLP. Human judges increasingly prefer summaries from the two-stage method as model size grows, and a panel of LLM evaluators can approximate human judgments. Automatic metrics alone do not fully capture quality, which underscores the value of human and LLM-based evaluation when judging lay summaries. The work also demonstrates cross-domain generalisation to NLP and provides practical guidance on input selection and prompt design for zero-shot lay summarisation.

Abstract

In this work, we explore the application of Large Language Models to zero-shot Lay Summarisation. We propose a novel two-stage framework for Lay Summarisation based on real-life processes, and find that summaries generated with this method are increasingly preferred by human judges for larger models. To help establish best practices for employing LLMs in zero-shot settings, we also assess the ability of LLMs as judges, finding that they are able to replicate the preferences of human judges. Finally, we take the initial steps towards Lay Summarisation for Natural Language Processing (NLP) articles, finding that LLMs are able to generalise to this new domain, and further highlighting the greater utility of summaries generated by our proposed approach via an in-depth human evaluation.
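
The two-stage framework described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable, the prompt wording, and the three questions in `QUESTIONS` are all hypothetical stand-ins, since the paper's actual prompts and question set are not reproduced here.

```python
from typing import Callable, List, Sequence

# Hypothetical author-style questions (assumption: not the paper's real question set).
QUESTIONS = (
    "What problem does this article address, and why does it matter?",
    "What did the authors do?",
    "What were the main findings?",
)


def answer_questions(
    article: str,
    llm: Callable[[str], str],
    questions: Sequence[str] = QUESTIONS,
) -> List[str]:
    """Stage 1: ask the model each question about the article."""
    answers = []
    for question in questions:
        prompt = (
            f"Article:\n{article}\n\n"
            f"Question: {question}\n"
            "Answer in plain language:"
        )
        answers.append(llm(prompt).strip())
    return answers


def build_summary_prompt(
    article: str, questions: Sequence[str], answers: Sequence[str]
) -> str:
    """Stage 2 prompt: combine the article with the stage-1 question/answer pairs."""
    qa_block = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))
    return (
        f"Article:\n{article}\n\n"
        f"Author answers:\n{qa_block}\n\n"
        "Using the answers above, write a summary of the article "
        "for a reader with no specialist background."
    )


def two_stage_lay_summary(
    article: str,
    llm: Callable[[str], str],
    questions: Sequence[str] = QUESTIONS,
) -> str:
    """Run both stages and return the generated lay summary."""
    answers = answer_questions(article, llm, questions)
    return llm(build_summary_prompt(article, questions, answers)).strip()


if __name__ == "__main__":
    # Stub model for demonstration only; swap in a real zero-shot LLM call.
    def fake_llm(prompt: str) -> str:
        return f"(model output for a prompt of {len(prompt)} characters)"

    print(two_stage_lay_summary("Example article text.", fake_llm))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM API, so the same two functions can be reused across models of different sizes.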

Paper Structure (25 sections, 2 figures, 10 tables)

This paper contains 25 sections, 2 figures, 10 tables.

Figures (2)

  • Figure 1: A visualisation of our two-stage Lay Summarisation framework, based on the real-life process of the eLife journal.
  • Figure 2: Human evaluation results as the proportion of answer votes for each question.