Are Large Language Models In-Context Personalized Summarizers? Get an iCOPERNICUS Test Done!
Divya Patel, Pathik Patel, Ankush Chander, Sourish Dasgupta, Tanmoy Chakraborty
TL;DR
This work defines In-Context Personalization Learning (ICPL) for summarization and introduces iCOPERNICUS, a prompt-based probing framework that tests three cues (examples, reading history, and contrastive prompts) to determine whether LLMs can truly personalize summaries. Using the PENS dataset and 17 state-of-the-art LLM variants, the study finds widespread ICPL degradation under richer prompts and shows that EGISES alone can be misleading for assessing personalization. The authors validate the observed paradoxes with adversarial probes and human judgments, demonstrating that only a subset of models pass the full ICPL evaluation. The results underscore the need for robust, ICPL-aware evaluation and for improved personalized summarization methods for LLMs.
Abstract
Large Language Models (LLMs) have achieved considerable success in In-Context Learning (ICL)-based summarization. However, what is salient in a document depends on each user's preference history. Hence, such LLMs need reliable In-Context Personalization Learning (ICPL) capabilities. For an LLM to exhibit ICPL, it must be able to discern contrast between user profiles. A recent study proposed EGISES, the first measure of degree of personalization, which quantifies a model's responsiveness to differences between user profiles. However, EGISES cannot test whether a model uses all three types of cues provided in ICPL prompts: (i) example summaries, (ii) users' reading histories, and (iii) contrast between user profiles. To address this, we propose the iCOPERNICUS framework, a novel In-COntext PERsonalization learNIng sCrUtiny of Summarization capability in LLMs, which uses EGISES as a comparative measure. As a case study, we evaluate 17 state-of-the-art LLMs, selected based on their reported ICL performance, and observe that the ICPL of 15 models degrades (min: 1.6%; max: 3.6%) when probed with richer prompts, indicating a lack of true ICPL.
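The comparative use of EGISES described above can be illustrated with a minimal, hypothetical sketch. It is not the iCOPERNICUS implementation: it assumes a personalization score per model has already been computed under a baseline prompt and under a richer prompt (adding, for example, reading histories or profile contrast). Because the exact score definition and its direction are assumptions here, the direction is a parameter. A model is flagged when its score worsens under the richer prompt, mirroring the abstract's criterion that such degradation signals a lack of true ICPL. The names `ProbeScores`, `relative_degradation`, and `models_lacking_icpl` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ProbeScores:
    """Hypothetical container: a personalization score (e.g. EGISES-based)
    for one model under a baseline prompt and under a richer prompt."""
    model: str
    baseline: float
    richer: float


def relative_degradation(baseline: float, richer: float,
                         higher_is_better: bool = True) -> float:
    """Percentage by which the score gets worse when the prompt is enriched.

    Positive values mean degradation; negative values mean improvement.
    The score direction is an assumption, so it is passed in explicitly.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    change = (richer - baseline) / abs(baseline) * 100.0
    # If higher scores mean more personalization, a drop is a degradation;
    # if lower scores are better, a rise is a degradation.
    return -change if higher_is_better else change


def models_lacking_icpl(scores, higher_is_better: bool = True):
    """Return (model, degradation %) pairs for models whose score worsens
    under the richer prompt, i.e. candidates failing this ICPL probe."""
    flagged = []
    for s in scores:
        d = relative_degradation(s.baseline, s.richer, higher_is_better)
        if d > 0:
            flagged.append((s.model, round(d, 2)))
    return flagged
```

In use, one would build a `ProbeScores` entry per evaluated model from real probe runs and pass the list to `models_lacking_icpl`; the flagged models are those whose personalization does not improve, and in fact worsens, when the prompt carries more personalization cues.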
