IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images

Chih-Hao Lin, Jia-Bin Huang, Zhengqin Li, Zhao Dong, Christian Richardt, Tuotuo Li, Michael Zollhöfer, Johannes Kopf, Shenlong Wang, Changil Kim

TL;DR

This work tackles inverse rendering for indoor scenes using only multi-view LDR images by explicitly modeling the HDR-to-LDR camera response and tone-mapping. The proposed IRIS framework recovers spatially varying HDR lighting, Cook–Torrance BRDFs via a neural field, and a learnable CRF, through a staged optimization that initializes BRDF, restores HDR emission, bakes shading, and jointly refines materials and CRF. It demonstrates superior performance over HDR-dependent baselines and LDR-only methods on real and synthetic data, enabling photorealistic relighting and object insertion with realistic reflections and shadows. The practical impact lies in making high-quality inverse rendering accessible with casual capture workflows and standard devices.

Abstract

Inverse rendering seeks to recover 3D geometry, surface material, and lighting from captured images, enabling advanced applications such as novel-view synthesis, relighting, and virtual object insertion. However, most existing techniques rely on high dynamic range (HDR) images as input, limiting accessibility for general users. In response, we introduce IRIS, an inverse rendering framework that recovers the physically based material, spatially-varying HDR lighting, and camera response functions from multi-view, low-dynamic-range (LDR) images. By eliminating the dependence on HDR input, we make inverse rendering technology more accessible. We evaluate our approach on real-world and synthetic scenes and compare it with state-of-the-art methods. Our results show that IRIS effectively recovers HDR lighting, accurate material, and plausible camera response functions, supporting photorealistic relighting and object insertion.
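To see why LDR input makes the problem harder, here is a minimal sketch of an image formation step that discards lighting information. It assumes a toy camera model (exposure scaling, clipping to the sensor range, then a gamma curve); the function name `apply_crf` and its parameters are hypothetical and are not the learnable CRF used by IRIS.

```python
def apply_crf(hdr_radiance, exposure=1.0, gamma=2.2):
    """Map linear HDR radiance to an LDR pixel value in [0, 1].

    Assumed toy model (not the paper's CRF): scale by exposure,
    clip to the sensor range, then apply a gamma curve.
    """
    exposed = hdr_radiance * exposure
    clipped = min(max(exposed, 0.0), 1.0)  # saturation discards everything above 1
    return clipped ** (1.0 / gamma)


# Hypothetical radiance values for illustration only.
wall = apply_crf(0.18)    # mid-grey surface stays within range
window = apply_crf(50.0)  # bright emitter saturates
lamp = apply_crf(500.0)   # ten times brighter, yet the same LDR value

print(wall, window, lamp)
```

In this sketch the window and the lamp map to the same saturated pixel value, so their true intensities cannot be read back from a single LDR observation. That ambiguity is what an approach must resolve when it recovers HDR lighting from LDR images.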

Paper Structure (27 sections, 13 equations, 19 figures, 7 tables)

Figures (19)

  • Figure 1: Physically-based inverse rendering from LDR images. Our method takes LDR input images and estimates high-quality spatially-varying HDR lighting, physically-based material properties, and a camera response function (CRF). This decomposition allows us to insert objects or new light sources, and to relight the scene in a physically accurate manner.
  • Figure 2: Limitation of SOTA. A typical image formation process causes the loss of lighting information, posing challenges in inverse rendering. FIPT \cite{wu2023factorized} assumes HDR input and NeILF \cite{yao2022neilf} ignores multi-bounce light transport. Both methods fail to estimate accurate material (red boxes) and HDR lighting (yellow boxes). We demonstrate significantly better results (\ref{fig:teaser}).
  • Figure 3: Framework Overview. Given multi-view posed LDR images and a surface mesh, our inverse rendering pipeline is divided into two main stages. In the initialization stage, we initialize the BRDF (\ref{sec:brdf_init}), extract a surface light field (\ref{sec:representation}), and estimate emitter geometry (\ref{eq:emitter_mask}). In the optimization stage, we first recover HDR radiance from the LDR input (\ref{sec:hdr_restoration}), then bake shading maps (\ref{sec:shading}), and jointly optimize BRDF and CRF parameters (\ref{sec:train_brdf}). The improved parameters are used to refine the emission again. These three steps are repeated until convergence.
  • Figure 4: Emitter geometry estimation $\boldsymbol M_\text{e}(\mathbf{x})$. The point $\mathbf{x}_1$ on the window is saturated across all input views and thus identified as an emitter. The point $\mathbf{x}_2$ on the table is reflective and saturated in some views (e.g., view 2) but not in others and is NOT an emitter.
  • Figure 5: HDR emission restoration. Top: Ray sampling process. Learnable direct lighting $\mathbf{L}_\text{e}$ is retrieved if a ray hits an emitter (e.g., window) or $\mathbf{L}_{\text{SLF}}$ is retrieved otherwise (e.g., wall). Bottom: HDR restoration process. By performing differentiable physically-based rendering, the photometric loss enhances the emitter intensity $\mathbf{L}_\text{e}$.
  • ...and 14 more figures
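
The emitter rule in the Figure 4 caption and the ray lookup in the Figure 5 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the saturation threshold of 0.99, the dictionary-based data layout, and the names `estimate_emitter_mask` and `incoming_radiance` are all hypothetical, and the actual emitter mask equation is not reproduced here.

```python
def estimate_emitter_mask(observations, threshold=0.99):
    """Flag a surface point as an emitter if it is saturated in every view.

    observations: dict mapping a point id to the list of LDR intensities
    observed for that point, one entry per view that sees it.
    threshold: assumed saturation level (hypothetical value).
    """
    mask = {}
    for point_id, values in observations.items():
        # A point with no observations is never treated as an emitter.
        mask[point_id] = bool(values) and all(v >= threshold for v in values)
    return mask


def incoming_radiance(hit_point, emitter_mask, emission, surface_light_field):
    """Return L_e if the ray hit an emitter, otherwise the SLF radiance."""
    if emitter_mask.get(hit_point, False):
        return emission[hit_point]
    return surface_light_field[hit_point]


# Hypothetical observations mirroring the two points in the Figure 4 caption.
observations = {
    "x1_window": [1.0, 1.0, 1.0],  # saturated in all views
    "x2_table": [0.4, 1.0, 0.6],   # saturated only in view 2 (a reflection)
}
mask = estimate_emitter_mask(observations)

emission = {"x1_window": 35.0}  # made-up HDR emitter intensity
slf = {"x1_window": 1.0, "x2_table": 0.6}

print(mask)
print(incoming_radiance("x1_window", mask, emission, slf))
print(incoming_radiance("x2_table", mask, emission, slf))
```

Requiring saturation in every view is what separates a true light source from a view-dependent highlight: the reflection on the table moves with the camera, so at least one view sees it unsaturated.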