Finer-Personalization Rank: Fine-Grained Retrieval Examines Identity Preservation for Personalized Generation
Connor Kilrain, David Carlyn, Julia Chae, Sara Beery, Wei-Lun Chao, Jianyang Gu
TL;DR
The paper addresses how to evaluate identity preservation in subject-driven personalized generation, arguing that existing similarity-based metrics miss fine-grained identity details. It proposes Finer-Personalization Rank, a gallery-based retrieval protocol in which each generated image serves as a query to rank real images from a fine-grained, identity-labeled gallery, and mean average precision measures identity retention at both the category and instance levels. Experiments on CUB, Stanford Cars, and animal Re-ID benchmarks reveal substantial identity drift in popular personalization methods under the proposed protocol, and show that specialized encoders improve detection of identity-specific details. The protocol is presented as a complementary, cost-efficient tool for developing and validating personalized generation systems that must meet real-world identity requirements.
Abstract
The rise of personalized generative models raises a central question: how should we evaluate identity preservation? Given a reference image (e.g., one's pet), we expect the generated image to retain precise details attached to the subject's identity. However, current generative evaluation metrics emphasize the overall semantic similarity between the reference and the output, and overlook these fine-grained discriminative details. We introduce Finer-Personalization Rank, an evaluation protocol tailored to identity preservation. Instead of pairwise similarity, Finer-Personalization Rank adopts a ranking view: it treats each generated image as a query against an identity-labeled gallery consisting of visually similar real images. Retrieval metrics (e.g., mean average precision) measure performance, where higher scores indicate that identity-specific details (e.g., a distinctive head spot) are preserved. We assess identity at multiple granularities -- from fine-grained categories (e.g., bird species, car models) to individual instances (e.g., re-identification). Across CUB, Stanford Cars, and animal Re-ID benchmarks, Finer-Personalization Rank more faithfully reflects identity retention than semantic-only metrics and reveals substantial identity drift in several popular personalization methods. These results position the gallery-based protocol as a principled and practical evaluation for personalized generation.
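The ranking view described in the abstract can be illustrated with a minimal sketch. It assumes that embeddings for the generated queries and the real gallery images have already been extracted by some image encoder (the abstract does not name one here), and that ranking uses cosine similarity. The function names `average_precision` and `mean_average_precision` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np


def average_precision(query, gallery, gallery_ids, query_id):
    """Average precision for one generated-image query.

    query:       (d,) embedding of the generated image (assumed precomputed).
    gallery:     (n, d) embeddings of real gallery images.
    gallery_ids: length-n identity labels of the gallery images.
    query_id:    identity label of the reference subject.
    """
    # Assumption: rank gallery images by cosine similarity to the query.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    order = np.argsort(-(g @ q), kind="stable")

    # A gallery image is relevant if it shares the query's identity.
    relevant = np.asarray(gallery_ids)[order] == query_id
    if not relevant.any():
        return 0.0

    # Precision evaluated at the rank of each relevant image, then averaged.
    hits = np.cumsum(relevant)
    ranks = np.arange(1, len(relevant) + 1)
    return float((hits[relevant] / ranks[relevant]).mean())


def mean_average_precision(queries, query_ids, gallery, gallery_ids):
    """Mean of per-query average precision over all generated images."""
    scores = [
        average_precision(q, gallery, gallery_ids, qid)
        for q, qid in zip(queries, query_ids)
    ]
    return float(np.mean(scores))
```

Under this sketch, a generated image whose embedding sits closest to gallery images of the same identity scores near 1, while an image that has drifted toward a visually similar but different identity ranks the wrong images first and scores lower.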
