MagicView: Multi-View Consistent Identity Customization via Priors-Guided In-Context Learning
Hengjia Li, Jianjin Xu, Keli Cheng, Lei Wang, Ning Bi, Boxi Wu, Fernando De la Torre, Deng Cai
TL;DR
MagicView addresses multi-view, identity-consistent customization from a single photograph by introducing a 3D priors-guided in-context learning framework for DiT-based models. The method uses in-context depth maps derived from SMPL and PuLID to activate multi-view reasoning and employs a Semantic Correspondence Alignment loss to preserve semantic controllability under limited data. With only 100 multi-view training samples, MagicView achieves better multi-view consistency, identity fidelity, and prompt alignment than recent baselines, while remaining data-efficient and free of test-time tuning. The approach offers a practical path to high-quality, view-coherent personalized imagery and has potential extensions to 3D modeling and reconstruction. Overall, MagicView combines lightweight adaptation, 3D priors, and semantic-preserving finetuning to deliver robust, controllable multi-view identity customization.
Abstract
Recent advances in personalized generative models have demonstrated impressive capabilities in producing identity-consistent images of the same individual across diverse scenes. However, most existing methods lack explicit viewpoint control and fail to ensure multi-view consistency of generated identities. To address this limitation, we present MagicView, a lightweight adaptation framework that equips existing generative models with multi-view generation capability through 3D priors-guided in-context learning. While prior studies have shown that in-context learning preserves identity consistency across grid samples, its effectiveness in multi-view settings remains unexplored. Building on this observation, we conduct an in-depth analysis of multi-view in-context learning ability and design a conditioning architecture that leverages 3D priors to activate this capability for multi-view consistent identity customization. Moreover, acquiring robust multi-view capability typically requires large-scale multi-dimensional datasets, so incorporating multi-view contextual learning under limited-data regimes is prone to degrading textual controllability. To address this issue, we introduce a novel Semantic Correspondence Alignment loss, which preserves semantic alignment while maintaining multi-view consistency. Extensive experiments demonstrate that MagicView substantially outperforms recent baselines in multi-view consistency, text alignment, identity similarity, and visual quality, achieving strong results with only 100 multi-view training samples.
