OSInsert: Towards High-authenticity and High-fidelity Image Composition

Jingyuan Wang; Li Niu

TL;DR

This work tackles the trade-off between authenticity and fidelity in generative image composition. It introduces OSInsert, a two-stage framework that decouples the two goals. Stage 1 (authenticity) uses ObjectStitch to generate a foreground whose pose and viewpoint are compatible with the background, then extracts a foreground mask with SAM. Stage 2 (fidelity) uses InsertAnything, guided by that mask and the reference image, to fill in the foreground details. On the MureCom dataset, OSInsert achieves background compatibility and detail preservation at the same time and outperforms the single-stage baselines. The approach combines the strengths of diffusion-based inpainting and in-context editing in a modular pipeline, and the code and models are open source.

Abstract

Generative image composition aims to regenerate a given foreground object in a background image to produce a realistic composite image. Some high-authenticity methods can adjust the foreground pose/view to be compatible with the background, while some high-fidelity methods can preserve the foreground details accurately. However, existing methods can hardly achieve both goals at the same time. In this work, we propose a two-stage strategy to achieve both goals. In the first stage, we use a high-authenticity method to generate a reasonable foreground shape, which serves as the condition for the high-fidelity method in the second stage. Experiments on the MureCom dataset verify the effectiveness of our two-stage strategy. The code and model have been released at https://github.com/bcmi/OSInsert-Image-Composition.
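
The two-stage strategy described above can be sketched as a small orchestration routine. This is only an illustrative sketch under stated assumptions, not the released implementation: the three callables stand in for ObjectStitch, SAM, and InsertAnything and are hypothetical interfaces, images are toy nested lists, and passing the original background to the second stage is an assumption rather than something the abstract states.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Mask = List[List[int]]           # 1 = foreground region, 0 = everything else
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def box_to_mask(height: int, width: int, box: Box) -> Mask:
    """Binary mask that is 1 inside the bounding box and 0 elsewhere."""
    top, left, bottom, right = box
    return [[1 if top <= r < bottom and left <= c < right else 0
             for c in range(width)] for r in range(height)]


def mask_out(image: Image, mask: Mask, fill: int = 0) -> Image:
    """Return a copy of `image` with masked pixels replaced by `fill`."""
    return [[fill if m else p for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


@dataclass
class TwoStageComposer:
    # Hypothetical stand-ins for the real models; the signatures are
    # assumptions made for this sketch, not the models' actual APIs.
    authenticity_model: Callable[[Image, Mask, Image], Image]  # role of ObjectStitch
    segmenter: Callable[[Image, Box], Mask]                    # role of SAM
    fidelity_model: Callable[[Image, Mask, Image], Image]      # role of InsertAnything

    def compose(self, background: Image, box: Box, reference: Image) -> Image:
        height, width = len(background), len(background[0])

        # Stage 1: authenticity generation.
        box_mask = box_to_mask(height, width, box)
        masked_background = mask_out(background, box_mask)
        intermediate = self.authenticity_model(masked_background, box_mask, reference)
        foreground_mask = self.segmenter(intermediate, box)

        # Stage 2: fidelity filling, conditioned on the stage-1 foreground shape.
        # Assumption: the original background is the canvas that gets filled.
        return self.fidelity_model(background, foreground_mask, reference)
```

Keeping the models behind plain callables mirrors the modular framing of the method: either stage could be swapped for another high-authenticity or high-fidelity model without touching the orchestration logic.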

Paper Structure (15 sections, 5 equations, 2 figures)

Introduction
Related Works
    High-authenticity Image Composition
    High-fidelity Image Composition
Method
    The First Stage: Authenticity Generation
        Masked Background Image Construction
        Intermediate Composite Image Generation
        Foreground Mask Extraction
    The Second Stage: Fidelity Filling
Experiment
    Dataset
    Baseline Methods
    Experimental Results
Conclusion

Figures (2)

Figure 1: Illustration of our two-stage pipeline. In the first stage, we use ObjectStitch [objectstitch] to generate the composite image with a reasonable foreground pose/viewpoint and extract the foreground region. In the second stage, we use InsertAnything [song2025insert] to fill in the foreground region with the appearance details of the reference image.
Figure 2: Visualization results of different generative composition methods on the MureCom [lu2023dreamcom] dataset. From left to right in each row, we show the background with the foreground bounding box, five reference images of the same foreground object, and the results generated by ObjectStitch [objectstitch], InsertAnything [song2025insert], our OSInsert, Banana pro [team2024gemini], and Seedream 5.0 [seedream2025seedream].
