Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter
Peng Xing, Ning Wang, Jianbo Ouyang, Zechao Li
TL;DR
Inv-Adapter tackles ID customization by extracting diffusion-domain representations of a prompt image via DDIM inversion and injecting them into a pre-trained text-to-image model through a lightweight Embedded Attention Adapter (EAA). By dispensing with an extra image encoder and training only the 48M-parameter EAA, it achieves high identity fidelity and generation quality while remaining efficient. Quantitative and qualitative results on CelebA-HQ/FFHQ-derived benchmarks show strong face fidelity (FACE-SIM, CLIP-I, DINO) and solid generation loyalty at a smaller model scale than prior methods. The approach offers practical deployment benefits and shows promise for broader use in diffusion-based personalization, though limited data diversity and slow inversion remain avenues for future improvement.
Abstract
Remarkable advances in text-to-image generation models have significantly boosted research on ID customization generation. However, existing personalization methods cannot simultaneously satisfy the requirements of high fidelity and high efficiency. Their main bottleneck lies in the prompt image encoder, which produces signals that are weakly aligned with the text-to-image model and significantly increases model size. To this end, we propose a lightweight Inv-Adapter, which first extracts diffusion-domain representations of ID images with a pre-trained text-to-image model via DDIM image inversion, without an additional image encoder. Because the extracted ID prompt features are highly aligned with the intermediate features of the text-to-image model, we then embed them efficiently into the base text-to-image model through a carefully designed lightweight attention adapter. We conduct extensive experiments to assess ID fidelity, generation loyalty, speed, and the number of training parameters, all of which show that the proposed Inv-Adapter is highly competitive in ID customization generation and model scale.
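For intuition, the deterministic DDIM inversion mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear noise schedule, the function names (make_alphas_cumprod, ddim_step, ddim_invert, ddim_sample) and the toy noise predictor are all hypothetical stand-ins, and the pre-trained text-to-image model is replaced by a placeholder callable eps_model(x, t). The sketch only records the trajectory of noised latents; how Inv-Adapter turns such a trajectory into ID prompt features is not specified in this section.

```python
import numpy as np

def make_alphas_cumprod(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update (eta = 0) between two cumulative-alpha levels."""
    # Predict the clean sample implied by x and the predicted noise eps.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    # Re-noise the prediction to the target level using the same eps.
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, eps_model, alphas_cumprod):
    """Map a clean sample towards noise; returns every intermediate latent."""
    # Prepend alpha = 1 so that index 0 corresponds to the clean sample.
    a = np.concatenate([[1.0], alphas_cumprod])
    x = x0
    trajectory = [x0]
    for t in range(len(a) - 1):
        # Inversion reuses the noise predicted at the current, less noisy level.
        eps = eps_model(x, t)
        x = ddim_step(x, eps, a[t], a[t + 1])
        trajectory.append(x)
    return trajectory

def ddim_sample(x_T, eps_model, alphas_cumprod):
    """Run the deterministic DDIM sampler from the noisiest latent back to level 0."""
    a = np.concatenate([[1.0], alphas_cumprod])
    x = x_T
    for t in reversed(range(len(a) - 1)):
        eps = eps_model(x, t + 1)
        x = ddim_step(x, eps, a[t + 1], a[t])
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((8, 8))      # stand-in for an encoded ID image
    fixed_eps = rng.standard_normal((8, 8))
    toy_model = lambda x, t: fixed_eps       # toy predictor, NOT a real denoiser
    alphas = make_alphas_cumprod()
    latents = ddim_invert(image, toy_model, alphas)
    recon = ddim_sample(latents[-1], toy_model, alphas)
    print("max reconstruction error:", np.abs(recon - image).max())
```

With the constant toy predictor the round trip is exact up to floating-point error, because the same noise estimate is used in both directions. With a real noise-prediction network the estimate at step t only approximates the one at step t+1, so inversion is approximate and its cost grows with the number of steps, which is consistent with the inversion-speed limitation noted in the TL;DR.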
