Iterative Critique-Refine Framework for Enhancing LLM Personalization

Durga Prasad Maram, Dhruvin Gandhi, Zonghai Yao, Gayathri Akkinapalli, Franck Dernoncourt, Yu Wang, Ryan A. Rossi, Nesreen K. Ahmed

TL;DR

PerFine presents a training-free, profile-grounded critique–refine loop for personalized text generation. It couples GraphRAG-based retrieval with a generator and a profile-conditioned critic to iteratively align output to a user’s style and topical focus, using knockout and Best-of-N inference strategies to select the stronger drafts. Evaluated on Yelp, Goodreads, and Amazon via METEOR and GEval, PerFine shows consistent improvements over strong RAG baselines across 3–5 refinement rounds, with larger critics yielding further gains. The approach remains model-agnostic and training-free, offering practical, scalable personalization with clear trade-offs between quality and efficiency. The main limitation is a fixed iteration budget; open directions include optimizing retrieval timing and improving evaluation for nuanced user preferences.

Abstract

Personalized text generation requires models not only to produce coherent text but also to align with a target user's style, tone, and topical focus. Existing retrieval-augmented approaches such as LaMP and PGraphRAG enrich profiles with user and neighbor histories, but they stop at generation and often yield outputs that drift in tone, topic, or style. We present PerFine, a unified, training-free critique-refine framework that enhances personalization through iterative, profile-grounded feedback. In each iteration, an LLM generator produces a draft conditioned on the retrieved profile, and a critic LLM - also conditioned on the same profile - provides structured feedback on tone, vocabulary, sentence structure, and topicality. The generator then revises, while a novel knockout strategy retains the stronger draft across iterations. We further study additional inference-time strategies such as Best-of-N and Topic Extraction to balance quality and efficiency. Across Yelp, Goodreads, and Amazon datasets, PerFine consistently improves personalization over PGraphRAG, with GEval gains of +7-13%, steady improvements over 3-5 refinement iterations, and scalability with increasing critic size. These results highlight that post-hoc, profile-aware feedback offers a powerful paradigm for personalized LLM generation that is both training-free and model-agnostic.
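
The loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the callables `generate`, `critique`, and `prefer` are hypothetical stand-ins for the generator LLM, the profile-conditioned critic LLM, and whatever comparison the knockout step uses to decide which draft is stronger (the abstract does not specify that comparison, so it is left as a plug-in function here). The default of three iterations is likewise an assumption taken from the lower end of the 3-5 range reported above.

```python
from typing import Callable

# Hypothetical interfaces; the paper's actual prompts and models are not shown here.
Generator = Callable[[str, str, str], str]  # (profile, previous_draft, feedback) -> new draft
Critic = Callable[[str, str], str]          # (profile, draft) -> structured feedback
Comparator = Callable[[str, str, str], str] # (profile, draft_a, draft_b) -> stronger draft


def critique_refine(
    profile: str,
    generate: Generator,
    critique: Critic,
    prefer: Comparator,
    iterations: int = 3,  # assumed budget, not a value prescribed by the source
) -> str:
    """Run a profile-grounded critique-refine loop with a knockout step."""
    # Initial draft: no previous draft and no feedback yet.
    best = generate(profile, "", "")
    for _ in range(iterations):
        # The critic sees the same retrieved profile as the generator.
        feedback = critique(profile, best)
        # The generator revises the current best draft using the feedback.
        candidate = generate(profile, best, feedback)
        # Knockout: keep whichever of the two drafts is judged stronger,
        # so a weaker revision never replaces the current best.
        best = prefer(profile, best, candidate)
    return best
```

Under these assumptions, a Best-of-N variant would sample several candidates per iteration and pass each through `prefer`, which raises generator token usage in exchange for a larger pool of revisions.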

Paper Structure

This paper contains 27 sections, 11 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of our framework for personalized text generation. User profile information is retrieved to guide the generator, whose outputs are iteratively critiqued and refined by PerFine, enabling multi-round personalization.
  • Figure 2: Performance across iterations on Yelp, Goodreads, and Amazon datasets. PerFine+Knockout starts from the PGraphRAG baseline and exhibits steady improvements, with gains plateauing after a few iterations.
  • Figure 3: Visualization of the critic's token usage (prompt + completion) vs. normalized GEval performance on the Amazon, Goodreads, and Yelp datasets. PerFine+Knockout improves performance, while PerFine+Knockout+Best-of-N achieves the highest scores at increased token cost. Considering both efficiency and effectiveness, we ultimately select PerFine+Knockout.
  • Figure 4: Visualization of the generator's token usage (prompt + completion) vs. normalized GEval performance on the Amazon, Goodreads, and Yelp datasets. Token usage is similar for PerFine and PerFine+Knockout, but increases for PerFine+Knockout+Best-of-N because multiple revisions are sampled, while yielding only a marginal improvement in performance.