MureObjectStitch: Multi-reference Image Composition

Jiaxuan Chen, Bo Zhang, Qingdong He, Jinlong Peng, Li Niu

TL;DR

This work tackles the trade-off between foreground detail fidelity and pose/viewpoint adjustment in generative image composition by introducing a multi-reference finetuning strategy built on ObjectStitch. The approach finetunes a pretrained diffusion-based model using one or more reference images of the same object, integrating their features via cross-attention in the U-Net so that details are preserved while the pose can still be adjusted. Evaluations on the MureCom dataset show that multi-reference finetuning improves foreground fidelity without sacrificing alignment to the background. The method works with any number of reference images, and the code and models are released for reproducibility and further research.

Abstract

Generative image composition aims to regenerate a given foreground object in a background image to produce a realistic composite image. Existing methods struggle to preserve the foreground details and adjust the foreground pose/viewpoint at the same time. In this work, we propose an effective finetuning strategy for generative image composition models, in which we finetune a pretrained model using one or more images containing the same foreground object. Moreover, we propose a multi-reference strategy, which allows the model to take in multiple reference images of the foreground object. Experiments on the MureCom dataset verify the effectiveness of our method. The code and model have been released at https://github.com/bcmi/MureObjectStitch-Image-Composition.
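To make the multi-reference idea concrete, the following is a minimal sketch of how features from several reference images could be pooled into one conditioning sequence and read by a cross-attention step. It is an illustration under stated assumptions, not the released implementation: the function names, the token-axis concatenation, the toy embeddings, and the omission of learned query/key/value projections are all hypothetical simplifications.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def concat_reference_tokens(reference_embeddings):
    """Flatten per-image token lists into a single context sequence.

    Assumption: references are merged along the token axis, so the
    context length grows with the number of reference images.
    """
    return [tok for image_tokens in reference_embeddings for tok in image_tokens]


def cross_attention(queries, context):
    """Single-head attention with no learned projections (illustration only).

    Each query attends over all context tokens and returns a weighted
    average of them.
    """
    dim = len(context[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in context])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)]
        )
    return outputs


# Two hypothetical reference images, each encoded as 2 tokens of dimension 3.
refs = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
]
context = concat_reference_tokens(refs)   # 4 tokens in total
queries = [[0.5, 0.5, 0.0]]               # stand-in for one U-Net feature
attended = cross_attention(queries, context)
```

Because the context is just a longer token sequence, the same attention step accepts one reference image or many, which is consistent with the claim that the method can take any number of reference images.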

Paper Structure

This paper contains 6 sections and 5 figures.

Figures (5)

  • Figure 1: Illustration of ground-truth images, background images, and foreground images.
  • Figure 2: Illustration of our MureObjectStitch model.
  • Figure 3: Visual comparison between the pretrained ObjectStitch and our finetuned MureObjectStitch. In each example, from left to right, we show the background image with the specified foreground placement, 5 reference images of the foreground object, and 5 results using different random seeds. The results in odd rows are obtained using the pretrained ObjectStitch, and the results in even rows are obtained using the finetuned MureObjectStitch.
  • Figure 4: Visual results of our finetuned MureObjectStitch. In each example, from left to right, we show the background image with the specified foreground placement, one example reference image of the foreground object, and 5 results using different random seeds.
  • Figure 5: Visual results of our finetuned MureObjectStitch. In each example, from left to right, we show the background image with the specified foreground placement, one example reference image of the foreground object, and 5 results using different random seeds.