
Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image

Yuxuan Wang, Xuanyu Yi, Qingshan Xu, Yuan Zhou, Long Chen, Hanwang Zhang

TL;DR

This work tackles consistent personalization of 3D Gaussian Splatting from a single reference image by addressing viewpoint bias through a coarse-to-fine pipeline. It combines a coarse guidance stage using a pre-trained image-to-3D model, iterative LoRA fine-tuning to propagate reference appearance across views, and a view-consistent generation stage with a Flow Transformer and epipolar-aware token replacement. The approach yields superior referential and multi-view consistency, validated by qualitative and quantitative experiments against state-of-the-art baselines and supported by ablations. The method enables realistic, customizable 3D scene edits from minimal input, with practical implications for interactive content creation and personalized assets.
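The summary does not spell out the epipolar-aware token replacement rule. As a rough illustration of the geometric constraint involved, here is a minimal sketch under a few assumptions:

- the fundamental matrix `F` between a source view and a target view is known, following the standard convention x'ᵀ F x = 0;
- tokens are represented by their pixel-space patch centers;
- a target token counts as a replacement candidate only if its center lies within a tolerance of the source token's epipolar line.

The helper names (`epipolar_line`, `candidate_tokens`) and the `max_dist` tolerance are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float, float]  # (a, b, c) meaning a*u + b*v + c = 0


def epipolar_line(F: Sequence[Sequence[float]], x: Point) -> Line:
    """Epipolar line l' = F @ [u, v, 1] in the target view for source pixel x."""
    p = (x[0], x[1], 1.0)
    a, b, c = (sum(F[i][j] * p[j] for j in range(3)) for i in range(3))
    return a, b, c


def point_line_distance(line: Line, y: Point) -> float:
    """Perpendicular pixel distance from target point y to the line."""
    a, b, c = line
    norm = math.hypot(a, b)
    if norm == 0.0:  # degenerate line (e.g. x at the epipole): nothing matches
        return math.inf
    return abs(a * y[0] + b * y[1] + c) / norm


def candidate_tokens(
    F: Sequence[Sequence[float]],
    src_center: Point,
    tgt_centers: Sequence[Point],
    max_dist: float,
) -> List[int]:
    """Indices of target tokens whose centers lie near the source token's epipolar line.

    Hypothetical rule: only these tokens would be eligible for replacement.
    """
    line = epipolar_line(F, src_center)
    return [k for k, y in enumerate(tgt_centers) if point_line_distance(line, y) <= max_dist]


# Rectified stereo pair (pure horizontal translation): epipolar lines are rows v' = v.
F_rectified = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
targets = [(0.5, 0.5), (1.5, 2.0), (7.0, 2.4), (3.0, 5.0)]
print(candidate_tokens(F_rectified, (3.0, 2.0), targets, max_dist=0.5))
```

With this rectified `F`, a source token at row 2 can only be matched to target tokens whose centers lie near row 2. This is why the sketch keeps indices 1 and 2 and drops the others. Restricting matches this way limits cross-view feature sharing to geometrically plausible correspondences instead of attending over every token.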

Abstract

Personalizing 3D scenes from a single reference image enables intuitive user-guided editing, which requires achieving both multi-view consistency across perspectives and referential consistency with the input image. However, these goals are particularly challenging due to the viewpoint bias caused by the limited perspective provided in a single image. Lacking the mechanisms to effectively expand reference information beyond the original view, existing methods of image-conditioned 3DGS personalization often suffer from this viewpoint bias and struggle to produce consistent results. Therefore, in this paper, we present Consistent Personalization for 3D Gaussian Splatting (CP-GS), a framework that progressively propagates the single-view reference appearance to novel perspectives. In particular, CP-GS integrates pre-trained image-to-3D generation and iterative LoRA fine-tuning to extract and extend the reference appearance, and finally produces faithful multi-view guidance images and the personalized 3DGS outputs through a view-consistent generation process guided by geometric cues. Extensive experiments on real-world scenes show that our CP-GS effectively mitigates the viewpoint bias, achieving high-quality personalization that significantly outperforms existing methods.
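For readers unfamiliar with the LoRA component mentioned above, the sketch below shows only the standard low-rank update, W' = W + (α / r) · B · A. It does not attempt to reproduce the paper's iterative schedule, which is not described here. The shapes follow a common convention: `A` is r × d_in, `B` is d_out × r, and `B` starts at zero so the adapted layer initially equals the base layer. The function names are illustrative, not taken from any specific library.

```python
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def lora_effective_weight(W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, the merged weight of a LoRA-adapted layer.

    W: d_out x d_in frozen base weight
    A: r x d_in trainable down-projection
    B: d_out x r trainable up-projection
    """
    r = len(A)  # LoRA rank
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))] for i in range(len(W))]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # rank r = 1
B = [[0.5], [0.0]]  # only the first output row is adapted
print(lora_effective_weight(W, A, B, alpha=2.0))
```

Only `A` and `B` (r · (d_in + d_out) values) are trained while `W` stays frozen. This keeps repeated rounds of fine-tuning cheap, which is presumably what makes an iterative scheme practical.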

Paper Structure

This paper contains 24 sections, 3 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: Given a source 3DGS scene and a single reference image, CP-GS enables high-quality personalization by editing a user-specified region (e.g., the bear, the man's eye, the man's face, the bench top, or the entire scene) to match the reference appearance, supporting replacement, addition, and style transfer.
  • Figure 2: Red: previous methods suffer from viewpoint bias and produce distorted editing guidance, leading to both referential and multi-view inconsistencies. Blue: by progressively propagating the reference to novel views, CP-GS mitigates the bias and achieves both forms of consistency in the guidance.
  • Figure 3: The pipeline of our CP-GS includes three stages: coarse guidance generation via a pre-trained image-to-3D model; iterative LoRA fine-tuning to extract and propagate detailed reference appearance; and view-consistent generation of guidance images to produce the final 3DGS outputs.
  • Figure 4: (a) Visualization of the translated results and the corresponding similarities under our scoring mechanism. (b) Illustration of the proposed epipolar-constrained token replacement strategy.
  • Figure 5: Additional personalization results of our CP-GS, demonstrating high-quality 3DGS scene customization that faithfully aligns with the reference image across various scenarios.
  • ...and 5 more figures