Omni-ID: Holistic Identity Representation Designed for Generative Tasks
Guocheng Qian, Kuan-Chieh Wang, Or Patashnik, Negin Heravi, Daniil Ostashev, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman
TL;DR
Omni-ID presents a fixed-size, structured facial identity encoding designed for generative tasks by aggregating multiple images of an individual into a single representation. It combines a transformer-based Omni-ID Encoder with a few-to-many identity reconstruction paradigm and a dual-decoder setup (a Masked Transformer Decoder and a Flow-Matching decoder) to capture holistic identity features across poses and expressions. Trained on the MFHQ dataset, Omni-ID demonstrates superior identity fidelity in controllable face generation and personalized text-to-image generation compared with discriminative baselines like ArcFace and CLIP. The approach offers scalable identity-preserving generation and opens avenues for richer, subject-specific synthesis, while leaving room for extension to non-facial attributes and for improvements to the dataset.
Abstract
We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. It consolidates information from a variable number of unstructured input images into a structured representation, where each entry represents certain global or local identity features. Our approach uses a few-to-many identity reconstruction training paradigm, where a limited set of input images is used to reconstruct multiple target images of the same individual in various poses and expressions. A multi-decoder framework is further employed to leverage the complementary strengths of diverse decoders during training. Unlike conventional representations, such as CLIP and ArcFace, which are typically learned through discriminative or contrastive objectives, Omni-ID is optimized with a generative objective, resulting in a more comprehensive and nuanced identity capture for generative tasks. Trained on our MFHQ dataset, a multi-view facial image collection, Omni-ID demonstrates substantial improvements over conventional representations across various generative tasks.
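To make the few-to-many paradigm concrete, the following is a minimal sketch of how one training example might be drawn: a small set of input images and a larger set of target images of the same individual. It is an illustration, not the authors' data pipeline. The function name `sample_few_to_many`, the toy file names, the set sizes, and the choice to keep inputs and targets disjoint are all assumptions made here.

```python
import random
from typing import Dict, List, Tuple


def sample_few_to_many(
    images_by_identity: Dict[str, List[str]],
    num_inputs: int,
    num_targets: int,
    rng: random.Random,
) -> Tuple[str, List[str], List[str]]:
    """Pick one identity, then draw input and target image sets for it.

    Hypothetical helper: inputs and targets are kept disjoint here,
    which is an assumption rather than a detail stated in the abstract.
    """
    needed = num_inputs + num_targets
    # Only identities with enough images can supply both sets.
    eligible = sorted(
        identity
        for identity, images in images_by_identity.items()
        if len(images) >= needed
    )
    if not eligible:
        raise ValueError("no identity has enough images")
    identity = rng.choice(eligible)
    # Sampling without replacement keeps the two sets disjoint.
    picked = rng.sample(images_by_identity[identity], needed)
    return identity, picked[:num_inputs], picked[num_inputs:]


# Toy data: file names are placeholders, not real MFHQ paths.
dataset = {
    "person_a": [f"a_{i}.jpg" for i in range(8)],
    "person_b": [f"b_{i}.jpg" for i in range(3)],
}
identity, inputs, targets = sample_few_to_many(
    dataset, num_inputs=2, num_targets=4, rng=random.Random(0)
)
```

In this sketch the few inputs would be fed to the encoder and the many targets would serve as reconstruction targets for the decoders. Identities with too few images are skipped.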
