Are Large Language Models In-Context Personalized Summarizers? Get an iCOPERNICUS Test Done!

Divya Patel; Pathik Patel; Ankush Chander; Sourish Dasgupta; Tanmoy Chakraborty

TL;DR

This work defines in-context personalization learning (ICPL) for summarization and introduces iCOPERNICUS, a prompt-based probing framework. iCOPERNICUS tests whether LLMs use three cues (example summaries, reading history, and contrastive prompts) to genuinely personalize summaries. Using the PENS dataset and 17 state-of-the-art LLM variants, the study finds that ICPL degrades for most models under richer prompts. It also shows that EGISES alone can be misleading for assessing personalization. The authors validate these paradoxical findings with adversarial probes and human judgments, and show that only a subset of models pass the full ICPL evaluation. The results underscore the need for robust ICPL-aware evaluation and for better personalized summarization methods in LLMs.

Abstract

Large Language Models (LLMs) have achieved considerable success in In-Context-Learning (ICL) based summarization. However, what is salient in a document depends on each user's preference history. Hence, such LLMs need reliable In-Context Personalization Learning (ICPL) capabilities. To exhibit ICPL, an LLM must be able to discern contrast between user profiles. A recent study proposed EGISES, the first measure of degree-of-personalization, which quantifies a model's responsiveness to differences between user profiles. However, EGISES cannot test whether a model utilizes all three types of cues provided in ICPL prompts: (i) example summaries, (ii) users' reading histories, and (iii) contrast between user profiles. To address this, we propose the iCOPERNICUS framework, a novel In-COntext PERsonalization learNIng sCrUtiny of Summarization capability in LLMs that uses EGISES as a comparative measure. As a case study, we evaluate 17 state-of-the-art LLMs, selected based on their reported ICL performance, and observe that the ICPL of 15 models degrades (min: 1.6%; max: 3.6%) when probed with richer prompts, indicating a lack of true ICPL.
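The abstract uses EGISES as a comparative measure: a model is scored under a simpler prompt and under a richer one, and a drop in personalization under the richer prompt signals a lack of true ICPL. The sketch below illustrates only that comparison logic. It is not the authors' implementation. It assumes one scalar EGISES-style score per (model, prompt type). The names `ProbeResult`, `relative_change`, `degraded_models`, and the `lower_is_better` flag are hypothetical. The flag exists because the score's direction is treated here as an assumption to configure rather than a fixed fact.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProbeResult:
    """Hypothetical container: one personalization score per prompt type."""
    model: str
    baseline_score: float  # score under the simpler prompt
    richer_score: float    # score under the richer prompt (e.g. adding reading history)


def relative_change(baseline: float, richer: float, lower_is_better: bool) -> float:
    """Signed relative change in personalization, in percent.

    Positive means the richer prompt improved personalization;
    negative means it degraded. `lower_is_better` sets the score direction.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    change = (richer - baseline) / abs(baseline) * 100.0
    # When lower scores mean more personalization, a falling score is an improvement.
    return -change if lower_is_better else change


def degraded_models(results: List[ProbeResult], lower_is_better: bool = True,
                    tolerance: float = 0.0) -> List[Tuple[str, float]]:
    """Return (model, change%) for models whose personalization worsened under richer prompts."""
    flagged = []
    for r in results:
        delta = relative_change(r.baseline_score, r.richer_score, lower_is_better)
        if delta < -tolerance:  # ignore changes smaller than the tolerance
            flagged.append((r.model, round(delta, 2)))
    return flagged
```

The `tolerance` parameter lets small fluctuations be ignored. Whether the paper's reported degradations (min 1.6%, max 3.6%) are relative changes of exactly this form is an assumption of this sketch.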

Paper Structure (49 sections, 3 equations, 10 figures, 9 tables)

Introduction
Preliminaries
Personalization w.r.t Summarization
EGISES: Personalization Measure
In-Context Personalization Learning
The iCOPERNICUS Framework
Contrastive Probing
Probe 1: Do example summaries help?
Probe 2: Does reading-history help?
Probe 3: Do contrastive prompts help?
Limitations of EGISES w.r.t ICPL
Evaluation: Setup
Model Benchmarking Dataset
Probing Dataset Creation
Probed SOTA LLMs
...and 34 more sections

Figures (10)

Figure 1: $\mathcal{P}_C$ type: Contrastive (C) 2-shot prompt with history.
Figure 2: Statistics of the news corpus and training set of the PENS dataset.
Figure 3: Stages in the creation of the test dataset of personalized headlines.
Figure 4: Prompt templates within the iCOPERNICUS framework: prompts on the left probe whether models utilize richer reader profiles; prompts on the right probe whether models utilize contrastive information for real personalization.
Figure 5: Web portal designed for a survey collecting human judgments on the similarity between user references and model-generated summaries for contrastive prompts.
...and 5 more figures
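Figures 1 and 4 describe prompt families that vary which cues the model receives: example summaries, reader histories, and contrast between readers. The sketch below shows one way such k-shot prompts could be assembled. The template wording, the `UserProfile`/`Example` fields, and the `build_prompt` function are all hypothetical illustrations, not the paper's actual templates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserProfile:
    """Hypothetical reader profile: headlines of previously read articles."""
    user_id: str
    reading_history: List[str] = field(default_factory=list)


@dataclass
class Example:
    """A demonstration: a document and the summary a particular reader preferred."""
    document: str
    summary: str
    user: UserProfile


def format_history(user: UserProfile) -> str:
    # One bullet per previously read headline; placeholder if the history is empty.
    return "\n".join(f"- {h}" for h in user.reading_history) or "- (none)"


def build_prompt(document: str, target: UserProfile, examples: List[Example],
                 with_history: bool = True, contrastive: bool = False) -> str:
    """Assemble an illustrative k-shot personalized-summarization prompt.

    examples     -> cue (i): example summaries
    with_history -> cue (ii): reading histories shown for each reader
    contrastive  -> cue (iii): adds an instruction to exploit differences between
                    the example readers' profiles and the target reader's profile
    """
    if contrastive and any(ex.user.user_id == target.user_id for ex in examples):
        raise ValueError("contrastive examples must come from readers other than the target")
    parts = []
    if contrastive:
        parts.append("The examples below come from readers whose interests differ "
                     "from the target reader. Use the differences between their "
                     "profiles and the target's profile.")
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        if with_history:
            parts.append(f"Reader {ex.user.user_id} history:\n{format_history(ex.user)}")
        parts.append(f"Document: {ex.document}\nSummary: {ex.summary}")
    if with_history:
        parts.append(f"Target reader {target.user_id} history:\n{format_history(target)}")
    parts.append(f"Document: {document}\nSummary:")
    return "\n\n".join(parts)
```

Toggling `with_history` and `contrastive` yields progressively richer prompts from the same examples. This mirrors the probing idea of holding everything else fixed while adding one cue at a time.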

Theorems & Definitions (9)

Definition 1
Definition 2
Definition 3
Definition 4
Definition 5
Definition 6
Definition 7
Definition 8
Definition 9
