Personalized Generation In Large Model Era: A Survey
Yiyan Xu, Jinghao Zhang, Alireza Salemi, Xinting Hu, Wenjie Wang, Fuli Feng, Hamed Zamani, Xiangnan He, Tat-Seng Chua
TL;DR
This survey introduces Personalized Generation (PGen) as a cross-modal paradigm that tailors content to individual users within a unified, user-centric framework. It formalizes PGen with a two-stage workflow (user modeling and multimodal generation) and a three-part optimization pipeline, then presents a multi-level taxonomy spanning text, image, video, 3D, audio, and cross-modal generation, covering representative tasks, datasets, and evaluation metrics. The paper highlights applications in content creation and content delivery, and discusses open technical challenges, benchmarks, and trustworthiness concerns. By synthesizing research across NLP, CV, and IR, the survey provides a structured resource to foster cross-community collaboration and guide future work toward a more personalized digital ecosystem.
Abstract
In the era of large models, content generation is gradually shifting to Personalized Generation (PGen), tailoring content to individual preferences and needs. This paper presents the first comprehensive survey on PGen, investigating existing research in this rapidly growing field. We conceptualize PGen from a unified perspective, systematically formalizing its key components, core objectives, and abstract workflows. Based on this unified perspective, we propose a multi-level taxonomy, offering an in-depth review of technical advancements, commonly used datasets, and evaluation metrics across multiple modalities, personalized contexts, and tasks. Moreover, we envision the potential applications of PGen and highlight open challenges and promising directions for future exploration. By bridging PGen research across multiple modalities, this survey serves as a valuable resource for fostering knowledge sharing and interdisciplinary collaboration, ultimately contributing to a more personalized digital landscape.
