Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias
Mingxiao Li, Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens
TL;DR
This work tackles overfitting and evaluation bias in personalized image generation by introducing a Background Attractor to separate subject and background, and by curating PDST, a dedicated test set for unbiased automatic evaluation. The approach couples a latent diffusion framework with Textual Inversion and NeTI, augmented by a contrastive loss and a background-specific attractor, to improve subject fidelity while preserving versatility across prompts. Key contributions include the attractor-based learning pipeline, the PDST benchmark for robust evaluation, and comprehensive ablations showing the importance of loss weighting and background disentanglement. Practically, the method yields more reliable automatic metrics and higher-quality, text-aligned personalizations, facilitating safer and more effective real-world deployment.
Abstract
Personalized image generation via text prompts has great potential to improve daily life and professional work by facilitating the creation of customized visual content. The aim of image personalization is to create images based on a user-provided subject while maintaining both consistency of the subject and flexibility to accommodate various textual descriptions of that subject. However, current methods face challenges in ensuring fidelity to the text prompt while not overfitting to the training data. In this work, we introduce a novel training pipeline that incorporates an attractor to filter out distractions in training images, allowing the model to focus on learning an effective representation of the personalized subject. Moreover, current evaluation methods struggle due to the lack of a dedicated test set. The evaluation setup typically relies on the training data of the personalization task to compute text-image and image-image similarity scores, which, while useful, tend to overestimate performance. Although human evaluations are commonly used as an alternative, they often suffer from bias and inconsistency. To address these issues, we curate a diverse and high-quality test set with well-designed prompts. With this new benchmark, automatic evaluation metrics can reliably assess model performance.
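The similarity-based evaluation described in the abstract can be illustrated with a minimal sketch. It assumes that image and prompt embeddings have already been computed by some joint image-text encoder; this section does not name the encoder, and the function names below (`l2_normalize`, `mean_cosine_similarity`, `evaluate`) are hypothetical, not the authors' code. The image-image score averages cosine similarity over all pairs of generated and reference images, and the text-image score compares each generated image with its own prompt.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def mean_cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Average cosine similarity over all (row of a, row of b) pairs."""
    return float((l2_normalize(a) @ l2_normalize(b).T).mean())


def evaluate(gen_image_embs: np.ndarray,
             ref_image_embs: np.ndarray,
             prompt_embs: np.ndarray) -> dict:
    """Compute image-image and text-image similarity scores.

    gen_image_embs: (N, D) embeddings of generated images.
    ref_image_embs: (M, D) embeddings of reference images of the subject.
    prompt_embs:    (N, D) embeddings of the prompts, aligned row by row
                    with gen_image_embs.
    """
    image_image = mean_cosine_similarity(gen_image_embs, ref_image_embs)
    # Row-wise cosine similarity between each image and its own prompt.
    row_sims = (l2_normalize(gen_image_embs) * l2_normalize(prompt_embs)).sum(axis=-1)
    return {"image_image": image_image, "text_image": float(row_sims.mean())}
```

In this sketch the only difference between the biased and the unbiased setup is what is passed as `ref_image_embs`: the training images of the subject, or images from a held-out test set. Since a model that overfits tends to reproduce its training images, scoring against those same images can reward copying, which is the overestimation the abstract points out.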
