Personalized Representation from Personalized Generation
Shobhita Sundaram, Julia Chae, Yonglong Tian, Sara Beery, Phillip Isola
TL;DR
The paper addresses learning personalized visual representations from only a few real images by leveraging synthetic data generated through diffusion-based personalization. It proposes a three-stage pipeline: personalize a generator (via DreamBooth), synthesize diverse target-specific images, and fine-tune a general-purpose encoder with a contrastive objective. The resulting representations are evaluated on recognition, retrieval, detection, and segmentation. Empirical results across DF2, Dogs, and PODS show consistent gains over pretrained representations, and the data-generation strategy and prompt design (including classifier-free guidance (CFG) and LLM-generated captions) significantly affect performance. The work contributes a new dataset (PODS), reformulations of existing datasets, and practical insights for data-efficient, private personalization, including an integration path with PerSAM for dense tasks.
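To make the contrastive fine-tuning stage more concrete, the following is a minimal sketch of a generic InfoNCE-style loss on precomputed embeddings. It is an illustration under stated assumptions, not the paper's exact objective: the function names (`l2_normalize`, `info_nce_loss`), the temperature value, and the choice of one positive per anchor are all hypothetical. In the setting described above, the anchor and positive would plausibly be embeddings of real or synthetic images of the target object, and the negatives embeddings of other objects.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (illustrative, not the paper's loss).

    anchor:    (D,) embedding of a target-object image
    positive:  (D,) embedding of another image of the same object
    negatives: (K, D) embeddings of images of other objects
    """
    a = l2_normalize(np.asarray(anchor, dtype=np.float64))
    p = l2_normalize(np.asarray(positive, dtype=np.float64))
    n = l2_normalize(np.asarray(negatives, dtype=np.float64))

    # Cosine similarities scaled by the temperature.
    pos_logit = np.dot(a, p) / temperature   # scalar
    neg_logits = (n @ a) / temperature       # shape (K,)
    logits = np.concatenate([[pos_logit], neg_logits])

    # Numerically stable log-sum-exp over all candidates.
    m = logits.max()
    log_denominator = m + np.log(np.exp(logits - m).sum())

    # Negative log-probability of picking the positive among all candidates.
    return float(log_denominator - pos_logit)
```

The loss is small when the anchor is closer to the positive than to every negative, and grows as negatives become more similar to the anchor. In an actual fine-tuning loop this quantity would be computed with an autodiff framework and averaged over a batch; the NumPy version here only shows the arithmetic.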
Abstract
Modern vision models excel at general-purpose downstream tasks. It is unclear, however, how they may be used for personalized vision tasks, which are both fine-grained and data-scarce. Recent works have successfully applied synthetic data to general-purpose representation learning, while advances in text-to-image (T2I) diffusion models have enabled the generation of personalized images from just a few real examples. Here, we explore a potential connection between these ideas, and formalize the challenge of using personalized synthetic data to learn personalized representations, which encode knowledge about an object of interest and may be flexibly applied to any downstream task relating to the target object. We introduce an evaluation suite for this challenge, including reformulations of two existing datasets and a novel dataset explicitly constructed for this purpose, and propose a contrastive learning approach that makes creative use of image generators. We show that our method improves personalized representation learning for diverse downstream tasks, from recognition to segmentation, and analyze characteristics of image generation approaches that are key to this gain.
