MureObjectStitch: Multi-reference Image Composition
Jiaxuan Chen, Bo Zhang, Qingdong He, Jinlong Peng, Li Niu
TL;DR
This work tackles the trade-off between foreground detail fidelity and pose/viewpoint adjustment in generative image composition by introducing a multi-reference finetuning strategy built on ObjectStitch. The approach finetunes a pretrained diffusion-based model on multiple reference images of the same object and integrates their features via cross-attention in the U-Net, so that details are preserved while the pose can still be adjusted. Evaluations on the MureCom dataset show that multi-reference finetuning improves foreground fidelity without sacrificing alignment to the background. The method accepts any number of reference images, and the authors release code and models for reproducibility and further research.
Abstract
Generative image composition aims to regenerate a given foreground object in a background image to produce a realistic composite image. Existing methods struggle to preserve the foreground details and adjust the foreground pose/viewpoint at the same time. In this work, we propose an effective finetuning strategy for generative image composition models, in which we finetune a pretrained model using one or more images containing the same foreground object. Moreover, we propose a multi-reference strategy, which allows the model to take in multiple reference images of the foreground object. Experiments on the MureCom dataset verify the effectiveness of our method. The code and model have been released at https://github.com/bcmi/MureObjectStitch-Image-Composition.
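
To make the multi-reference idea concrete, the following is a minimal sketch of how embeddings from several reference images could be combined into one cross-attention context. It is not taken from the released code. The concatenation along the token axis, the identity query/key/value projections, and all names (`build_context`, `cross_attention`, `rand_tokens`) are assumptions chosen for illustration; a real U-Net uses learned projections and multiple heads.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def build_context(reference_tokens):
    """Concatenate per-reference token lists along the token axis.

    Assumption: references are merged by simple concatenation, so the
    context grows with the number of references and any count works.
    """
    return [tok for ref in reference_tokens for tok in ref]


def cross_attention(queries, context):
    """Single-head attention with identity projections (a simplification)."""
    d = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        # Weighted sum of context tokens (values equal keys in this sketch).
        out.append([sum(w * v[j] for w, v in zip(weights, context))
                    for j in range(d)])
    return out


def rand_tokens(n, dim, rng):
    """Hypothetical stand-in for encoder or latent features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim = 4
latent_queries = rand_tokens(6, dim, rng)                 # flattened latent positions
references = [rand_tokens(3, dim, rng) for _ in range(2)]  # 2 references, 3 tokens each
context = build_context(references)
attended = cross_attention(latent_queries, context)
```

Because the context is just a longer token sequence, the same function handles one reference or many, which matches the claim that the method works with any number of reference images.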
