PersonalSum: A User-Subjective Guided Personalized Summarization Dataset for Large Language Models

Lemei Zhang, Peng Liu, Marcus Tiedemann Oekland Henriksboe, Even W. Lauvrak, Jon Atle Gulla, Heri Ramampiaro

TL;DR

This dataset is the first to investigate whether the focus of public readers differs from the generic summaries generated by LLMs. Preliminary results and analysis indicate that entities/topics are merely one of the key factors that impact the diverse preferences of users, and that personalized summarization remains a significant challenge for existing LLMs.

Abstract

With the rapid advancement of Natural Language Processing in recent years, numerous studies have shown that generic summaries generated by Large Language Models (LLMs) can sometimes surpass those annotated by experts, such as journalists, according to human evaluations. However, there is limited research on whether these generic summaries meet the individual needs of ordinary people. The biggest obstacle is the lack of human-annotated datasets from the general public. Existing work on personalized summarization often relies on pseudo datasets created from generic summarization datasets or controllable tasks that focus on specific named entities or other aspects, such as the length and specificity of generated summaries, collected from hypothetical tasks without the annotators' initiative. To bridge this gap, we propose a high-quality, personalized, manually annotated abstractive summarization dataset called PersonalSum. This dataset is the first to investigate whether the focus of public readers differs from the generic summaries generated by LLMs. It includes user profiles, personalized summaries accompanied by source sentences from given articles, and machine-generated generic summaries along with their sources. We investigate several personal signals - entities/topics, plot, and structure of articles - that may affect the generation of personalized summaries using LLMs in a few-shot in-context learning scenario. Our preliminary results and analysis indicate that entities/topics are merely one of the key factors that impact the diverse preferences of users, and personalized summarization remains a significant challenge for existing LLMs.

Paper Structure (20 sections, 14 figures, 9 tables)

This paper contains 20 sections, 14 figures, 9 tables.

Figures (14)

  • Figure 1: (a)-(d) show annotator demographics, including gender, age, reading habits, and occupation. (e)-(g) cover annotation categories and counts. (h) and (i) display the distribution of qualified summaries per worker and time spent per annotation.
  • Figure 2: (a) The distribution of sources of machine-generated summaries and human-annotated personalized summaries. (b) The distribution of average words per machine-generated summary and human-annotated summary.
  • Figure 3: Experimental results showing improvements in the ROUGE-1 score from personalized prompting compared to generic summaries using GPT-3.5 Turbo for each worker. The X-axis represents worker IDs, and the Y-axis represents the ROUGE-1 score improvements.
  • Figure 4: (a) The plot information concerned in the 5-shot historical annotated summaries of Worker 3, the generic summary, and the summary with the prompt including the annotator's plot information. (b) The article's plot data is extracted by GPT-4o. For clarity, we only include the original information relevant to the generated summaries for Worker 3. (c) The entities that appear in the 5-shot historical annotations of Worker 1, the user-annotated summary, the generic summary, and the summary with the prompt including the annotator's entity details. All generated summaries are from GPT-3.5-Turbo.
  • Figure 5: Prompt template using GPT-4o on (a) Plot extraction, and (b) Named Entity (NE) recognition.
  • ...and 9 more figures
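
The per-worker quantity plotted in Figure 3 (the ROUGE-1 improvement of a personalized summary over a generic one, both scored against the worker's own annotation) can be illustrated with a minimal sketch. This is a simplified unigram-overlap F1 with lowercase whitespace tokenization and no stemming; it is an assumption for illustration, not the evaluation code used in the paper, and the function names `rouge1_f1` and `rouge1_improvement` are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts.

    Assumption: lowercase whitespace tokenization, no stemming or
    stopword handling, so scores will differ from standard toolkits.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def rouge1_improvement(personalized: str, generic: str, reference: str) -> float:
    """ROUGE-1 gain of the personalized summary over the generic one,
    both measured against the same human-annotated reference."""
    return rouge1_f1(personalized, reference) - rouge1_f1(generic, reference)
```

A positive value means the personalized summary overlaps more with the worker's annotation than the generic summary does; a negative value means personalization hurt under this metric.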