LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation

Xuecan Wang, Shibang Xiao, Xiaohui Liang

TL;DR

This work tackles indoor lighting estimation from a single RGB image with 3D spatial coherence, addressing the high memory and compute costs of dense volumetric methods. It introduces a sparse voxel octree lighting representation paired with a lightweight octree-based network and a differentiable voxel-octree cone-tracing renderer to produce high-quality, spatially coherent illumination for virtual object insertion. The approach reduces storage and computation to approximately $O(n^2)$ versus $O(n^3)$ for dense grids, while supporting end-to-end training and realistic rendering. Experiments on synthetic indoor datasets demonstrate competitive illumination accuracy and superior efficiency, enabling interactive AR/MR applications with photorealistic virtual insertions and robust cross-view consistency.
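
The $O(n^2)$ versus $O(n^3)$ claim can be illustrated with a small counting sketch. It assumes, purely for illustration, that the only occupied voxels are the walls of a box-shaped room (a hollow shell); the function names `shell_voxels` and `octree_nodes` are hypothetical and do not come from the paper.

```python
def shell_voxels(n: int) -> int:
    """Voxels on the boundary of an n x n x n grid (a hollow box 'room')."""
    if n <= 2:
        return n ** 3
    return n ** 3 - (n - 2) ** 3  # equals 6n^2 - 12n + 8


def octree_nodes(depth: int) -> int:
    """Occupied nodes summed over levels 0..depth (level d has 2^d cells per axis)."""
    return sum(shell_voxels(2 ** d) for d in range(depth + 1))


if __name__ == "__main__":
    for depth in (5, 6, 7):
        n = 2 ** depth
        print(f"n={n} dense={n ** 3} shell={shell_voxels(n)} octree={octree_nodes(depth)}")
```

Under this assumption, doubling the resolution roughly quadruples the occupied count while the dense grid grows eightfold. Real scenes are not perfect shells, so the numbers only indicate the trend.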

Abstract

We present a lightweight solution for estimating spatially-coherent indoor lighting from a single RGB image. Previous methods for estimating illumination using volumetric representations have overlooked the sparse distribution of light sources in space, necessitating substantial memory and computational resources for achieving high-quality results. We introduce a unified, voxel octree-based illumination estimation framework to produce 3D spatially-coherent lighting. Additionally, a differentiable voxel octree cone tracing rendering layer is proposed to eliminate regular volumetric representation throughout the entire process and ensure the retention of features across different frequency domains. This reduction significantly decreases spatial usage and required floating-point operations without substantially compromising precision. Experimental results demonstrate that our approach achieves high-quality coherent estimation with minimal cost compared to previous methods.
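
For readers unfamiliar with cone tracing, the following is a minimal sketch of generic voxel cone tracing for a single colour channel. It is not the paper's rendering layer: the `sample` callback, the step rule, and all parameter names are assumptions made for illustration. The idea is that the cone footprint grows with distance, so coarser (pre-filtered) levels are read farther from the origin, and samples are composited front to back.

```python
import math
from typing import Callable, Tuple

Sample = Tuple[float, float]  # (radiance, opacity) for a single colour channel


def cone_trace(
    sample: Callable[[float, float], Sample],
    half_angle: float,
    voxel_size: float,
    max_level: int,
    max_dist: float,
) -> float:
    """March one cone front to back and return the accumulated radiance.

    `sample(t, level)` is a hypothetical lookup returning pre-filtered
    (radiance, opacity) at distance t along the cone axis for a given
    coarseness level (0 = finest voxels).
    """
    radiance, alpha = 0.0, 0.0
    t = voxel_size  # start one voxel away to avoid self-sampling
    while t < max_dist and alpha < 0.99:
        diameter = max(voxel_size, 2.0 * t * math.tan(half_angle))
        # Pick the level whose voxel footprint matches the cone diameter.
        level = min(float(max_level), math.log2(diameter / voxel_size))
        c, a = sample(t, level)
        # Front-to-back alpha compositing.
        radiance += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        t += 0.5 * diameter  # step size grows with the cone footprint
    return radiance
```

Every operation in the loop is a smooth arithmetic step, which is what makes this kind of renderer amenable to automatic differentiation when written in a tensor framework.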

Paper Structure (30 sections, 7 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: The comparison with state-of-the-art (SOTA) methods shows that our approach achieves relatively good accuracy while requiring lower storage and computational costs.
  • Figure 2: Overall structure of our framework. The process starts with the Input Process stage, where the direct prediction module estimates depth and extracts global illumination features from the input image. The depth values and features are combined to create a point cloud, which is used to build a 3D voxel octree scene representation. Next, the Lighting Estimation stage uses a U-Net structure to predict the lighting from the constructed octree. The Object Insertion stage then combines the original RGB information, predicted depth values, predicted lighting voxel octree, and user-specified mesh data to render an image with consistent lighting for the inserted virtual object.
  • Figure 3: Light Network. A lightweight U-Net architecture constructed with graph O-CNN operators. These modules work in tandem to predict octree subdivisions and field values, and to establish correspondence with the ground truth through the Rendering Layer. For clarity, the illustration shows a network with a full depth of 4 and a depth of 6, while the actual depth is 7 (equivalent to a voxel resolution of $128^3$).
  • Figure 4: Rendering layer.
  • Figure 5: Qualitative evaluation on virtual object insertion.
  • ...and 1 more figure
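
The Input Process stage described in the Figure 2 caption (depth to point cloud to voxel octree) can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a cubic scene bound `[lo, hi)`, and it omits the per-point illumination features. A depth of 7 gives $2^7 = 128$ cells per axis, matching the $128^3$ resolution mentioned in the Figure 3 caption.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Pinhole back-projection of a depth map to camera-space 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0.0:  # skip pixels with invalid depth
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def build_octree_levels(points: List[Point], lo: float, hi: float,
                        max_depth: int = 7) -> Dict[int, Set[Voxel]]:
    """Occupied voxel coordinates per level; level d has 2^d cells per axis."""
    n = 2 ** max_depth
    scale = n / (hi - lo)
    finest: Set[Voxel] = set()
    for x, y, z in points:
        # Quantise each coordinate and clamp it into the grid.
        ix, iy, iz = (min(n - 1, max(0, int((c - lo) * scale))) for c in (x, y, z))
        finest.add((ix, iy, iz))
    levels = {max_depth: finest}
    for d in range(max_depth - 1, -1, -1):
        # A parent node is occupied if any of its eight children is.
        levels[d] = {(x >> 1, y >> 1, z >> 1) for x, y, z in levels[d + 1]}
    return levels


if __name__ == "__main__":
    wall = [[2.0] * 4 for _ in range(4)]  # toy 4x4 depth map of a flat wall
    pts = backproject(wall, fx=4.0, fy=4.0, cx=2.0, cy=2.0)
    octree = build_octree_levels(pts, lo=-4.0, hi=4.0)
    print({d: len(nodes) for d, nodes in octree.items()})
```

Only occupied nodes are stored at each level, so empty space costs nothing, which is the property a sparse octree representation relies on.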