OSInsert: Towards High-authenticity and High-fidelity Image Composition
Jingyuan Wang, Li Niu
TL;DR
This work tackles the trade-off between authenticity and fidelity in generative image composition. It introduces OSInsert, a two-stage framework that decouples authenticity (Stage 1) from fidelity (Stage 2): Stage 1 uses ObjectStitch to generate a background-compatible foreground, from which a mask is derived with SAM; Stage 2 uses InsertAnything to fill in details, guided by that mask and the reference image. On the MureCOM dataset, OSInsert achieves both background compatibility and detail preservation, outperforming single-stage baselines. The approach combines the strengths of diffusion-based inpainting and in-context editing in a modular pipeline, and the code and models are open source.
Abstract
Generative image composition aims to regenerate a given foreground object in a background image to produce a realistic composite image. Some high-authenticity methods can adjust the foreground pose/view to be compatible with the background, while some high-fidelity methods can preserve the foreground details accurately. However, existing methods can hardly achieve both goals at the same time. In this work, we propose a two-stage strategy to achieve both goals. In the first stage, we use a high-authenticity method to generate a reasonable foreground shape, which serves as the condition for a high-fidelity method in the second stage. Experiments on the MureCOM dataset verify the effectiveness of our two-stage strategy. The code and model have been released at https://github.com/bcmi/OSInsert-Image-Composition.
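
The control flow of the two-stage strategy can be illustrated with a minimal sketch. This is not the released implementation: the function name `compose_two_stage`, the `placement` argument, and the three callables are hypothetical placeholders standing in for a high-authenticity generator (ObjectStitch in the TL;DR), a segmenter (SAM), and a high-fidelity inserter (InsertAnything). The exact inputs each stage receives are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Placeholder types: the real models operate on image tensors or arrays.
Image = Any
Mask = Any
Placement = Tuple[int, int, int, int]  # assumed (x, y, w, h) box; hypothetical


def compose_two_stage(
    background: Image,
    reference: Image,
    placement: Placement,
    stage1_generate: Callable[[Image, Image, Placement], Image],
    segment_foreground: Callable[[Image, Placement], Mask],
    stage2_insert: Callable[[Image, Image, Mask], Image],
) -> Dict[str, Any]:
    """Run a two-stage composition; all three callables are supplied by the caller."""
    # Stage 1 (authenticity): generate a foreground whose pose/view suits the background.
    coarse = stage1_generate(background, reference, placement)

    # Extract the foreground shape from the Stage 1 result as a mask.
    mask = segment_foreground(coarse, placement)

    # Stage 2 (fidelity): fill in details, conditioned on the mask and the reference image.
    # Assumption: Stage 2 starts from the original background, not the coarse composite.
    final = stage2_insert(background, reference, mask)

    return {"coarse": coarse, "mask": mask, "final": final}
```

Passing the stages in as callables keeps the sketch independent of any particular model API; in practice each callable would wrap the corresponding model's inference code.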
