CareCom: Generative Image Composition with Calibrated Reference Features
Jiaxuan Chen, Bo Zhang, Qingdong He, Jinlong Peng, Li Niu
TL;DR
CareCom addresses generative image composition with multiple foreground references by calibrating the reference features to fit the background. It introduces global and local reference feature calibration modules; the augmented features they produce are injected into a denoising diffusion network, enabling simultaneous detail preservation and pose/view adjustment. The method is pretrained on MVImgNet and finetuned with few-shot exemplars, and it outperforms baselines in background fidelity, pose compatibility, and overall image quality on MVImgNet and MureCom. These results indicate that calibrated multi-reference features can improve the realism and fidelity of foreground insertion, with practical implications for flexible image editing and content creation.
Abstract
Image composition aims to seamlessly insert a foreground object into a background image. Despite substantial progress in generative image composition, existing methods still struggle to preserve foreground details while also adjusting the foreground pose/view. To address this issue, we extend an existing generative composition model to a multi-reference version, which accepts an arbitrary number of foreground reference images. Furthermore, we propose to calibrate the global and local features of the foreground reference images so that they are compatible with the background information. The calibrated reference features supplement the original reference features with useful global and local information of the proper pose/view. Extensive experiments on MVImgNet and MureCom demonstrate that the generative model benefits greatly from the calibrated reference features.
