ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes

Jian Gao, Mengqi Yuan, Yifei Zeng, Chang Zeng, Zhihao Li, Zhenyu Chen, Weichao Qiu, Xiao-Xiao Long, Hao Zhu, Xun Cao, Yao Yao

TL;DR

This paper tackles realistic 3D object–scene composition in Gaussian Splatting by separating relightable object reconstruction from scene lighting estimation. It introduces Surface Octahedral Probes (SOPs) to store indirect lighting and occlusion, enabling fast, interpolation-based shading without per-iteration ray tracing. Lighting estimation is simplified to environment-map completion at the object placement site using a 360° radiance sweep and a fine-tuned diffusion model, producing coherent shadows in complex scenes. The ComGS framework delivers around 28 FPS rendering with ~36 seconds of editing, validated on SynCom and real-world captures, and achieves higher harmony and visual realism than prior approaches while substantially improving reconstruction efficiency. These advances bring practical, immersive 3D object insertion closer to real-time usage in complex environments.

Abstract

Gaussian Splatting (GS) enables immersive rendering, but realistic 3D object-scene composition remains challenging. Baked appearance and shadow information in GS radiance fields cause inconsistencies when combining objects and scenes. Addressing this requires relightable object reconstruction and scene lighting estimation. For relightable object reconstruction, existing Gaussian-based inverse rendering methods often rely on ray tracing, leading to low efficiency. We introduce Surface Octahedral Probes (SOPs), which store lighting and occlusion information and allow efficient 3D querying via interpolation, avoiding expensive ray tracing. SOPs provide at least a 2x speedup in reconstruction and enable real-time shadow computation in Gaussian scenes. For lighting estimation, existing Gaussian-based inverse rendering methods struggle to model intricate light transport and often fail in complex scenes, while learning-based methods predict lighting from a single image and are viewpoint-sensitive. We observe that 3D object-scene composition primarily concerns the object's appearance and nearby shadows. Thus, we simplify the challenging task of full scene lighting estimation by focusing on the environment lighting at the object's placement. Specifically, we capture a 360-degree view of the reconstructed radiance field of the scene at that location and fine-tune a diffusion model to complete the lighting. Building on these advances, we propose ComGS, a novel 3D object-scene composition framework. Our method achieves high-quality, real-time rendering at around 28 FPS, produces visually harmonious results with vivid shadows, and requires only 36 seconds for editing. Code and dataset are available at https://nju-3dv.github.io/projects/ComGS/.
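
The probes above are described as octahedral, meaning that lighting and occlusion per direction are stored in a square 2D texture. As a rough illustration of that idea, the sketch below shows the standard octahedral mapping between a 3D direction and texture coordinates in [0, 1]^2. The function names `dir_to_oct` and `oct_to_dir` are hypothetical, and this is the commonly used textbook mapping, not necessarily the exact parameterization used in the ComGS code.

```python
import math


def _sign(x):
    # Sign that treats zero as positive, so folded coordinates stay well defined.
    return 1.0 if x >= 0.0 else -1.0


def dir_to_oct(d):
    """Map a non-zero 3D direction to octahedral UV coordinates in [0, 1]^2."""
    x, y, z = d
    n = abs(x) + abs(y) + abs(z)          # project onto the octahedron |x|+|y|+|z| = 1
    x, y, z = x / n, y / n, z / n
    if z < 0.0:
        # Fold the lower hemisphere outward over the diagonals of the square.
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)


def oct_to_dir(uv):
    """Inverse mapping: octahedral UV in [0, 1]^2 back to a unit direction."""
    u = uv[0] * 2.0 - 1.0
    v = uv[1] * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        # Undo the fold for points that belong to the lower hemisphere.
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)
```

With such a mapping, a probe texture of resolution R can be read for a direction `d` by scaling the result of `dir_to_oct(d)` by R and sampling the nearest (or bilinearly interpolated) texel.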

Paper Structure

This paper contains 54 sections, 18 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Realistic 3D Object–Scene Composition Pipeline. Our approach consists of 3 stages: reconstruction (Sec. \ref{sec:reconstruction}), where we reconstruct the Gaussian scene and relightable Gaussian object from multi-view images; editing (Sec. \ref{sec:editing}), where we estimate scene lighting and cache occlusion using Surface Octahedral Probes; and rendering (Sec. \ref{sec:rendering}), where we perform splatting, object relighting, shadow casting, and depth compositing. The pipeline achieves visually harmonious results with realistic shadows and near-real-time performance.
  • Figure 2: Inverse Rendering with Surface Octahedral Probes (SOPs). We use trained relightable 2D Gaussians to generate G-buffers via splatting, followed by deferred physically based rendering to produce the rendered image. Illumination is split into direct lighting from the environment map, and indirect lighting and occlusion captured by textures in the SOPs. Both the environment map and the probe textures are stored as octahedral textures. Low-discrepancy ray sampling is used to compute illumination at each shading point, with indirect light and occlusion derived via KNN interpolation from nearby probes. SOPs are initialized with ray tracing and optimized under its guidance, which avoids intensive ray tracing in every optimization iteration and boosts inverse rendering efficiency.
  • Figure 3: Lighting Estimation. At a given location, we create a partial panoramic view via a 360° sweep of the Gaussian scene, yielding an incomplete RGB image, normal map, and alpha mask of the reconstructed areas. Then, we use a fine-tuned Stable Diffusion model to infer an HDR environment map.
  • Figure 4: Material Editing in real-world composition, from copper to silver and red matte. Please ZOOM IN for details.
  • Figure 5: Environment Maps Comparison. GS-IR and IRGS fail in complex scenes, DiffusionLight is viewpoint-inconsistent, while our method yields superior and consistent results.
  • ...and 8 more figures
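
The Figure 2 caption says that indirect light and occlusion at a shading point are derived via KNN interpolation from nearby probes. A minimal sketch of that kind of lookup is given below, assuming inverse-distance weighting over the k nearest probes and a single scalar value per probe (for example, occlusion along one direction). The weighting scheme, the default `k=4`, the `eps` term, and the function name `knn_interpolate` are all assumptions made for illustration; the source does not specify them.

```python
import math


def knn_interpolate(query, probe_positions, probe_values, k=4, eps=1e-8):
    """Blend the values of the k probes nearest to `query`.

    Assumption: inverse-distance weights. `eps` avoids division by zero
    when the query coincides with a probe position.
    """
    # (distance, probe index) pairs for the k nearest probes, found by brute force.
    nearest = sorted(
        (math.dist(query, p), i) for i, p in enumerate(probe_positions)
    )[:k]
    weights = [1.0 / (dist + eps) for dist, _ in nearest]
    total = sum(weights)
    blended = sum(w * probe_values[i] for w, (_, i) in zip(weights, nearest))
    return blended / total


# Two probes on the x-axis: one fully unoccluded (0.0), one fully occluded (1.0).
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
occlusion = [0.0, 1.0]
midpoint_occlusion = knn_interpolate((1.0, 0.0, 0.0), positions, occlusion, k=2)
```

A brute-force sort is used here only for clarity; with many probes, a spatial index such as a k-d tree or a uniform grid would be the more practical choice for the nearest-neighbor search.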