RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

Tianyuan Zhang, Zhengfei Kuang, Haian Jin, Zexiang Xu, Sai Bi, Hao Tan, He Zhang, Yiwei Hu, Milos Hasan, William T. Freeman, Kai Zhang, Fujun Luan

TL;DR

RelitLRM tackles relightable 3D reconstruction from sparse, uncontrolled imagery by integrating a deterministic geometry regressor with a diffusion-based relighting module in a Large Reconstruction Model. The system uses 3D Gaussian Splatting (3DGS) representations and a relit-view diffusion conditioned on target illumination to produce multi-modal, photo-realistic radiance under novel lighting and viewpoints. Trained on a large, diverse synthetic dataset with HDR environment maps, RelitLRM achieves state-of-the-art or competitive relighting performance with far fewer input views (4–8) and orders-of-magnitude faster inference (2–3 seconds) than optimization-based baselines. The approach enables practical relightable 3D assets for AR/VR, gaming, and content creation, while highlighting limitations in camera parameter requirements and near-field lighting modeling.

Abstract

We propose RelitLRM, a Large Reconstruction Model (LRM) for generating high-quality Gaussian splatting representations of 3D objects under novel illuminations from sparse (4-8) posed images captured under unknown static lighting. Unlike prior inverse rendering methods, which require dense captures and slow optimization and often produce artifacts such as incorrect highlights or baked-in shadows, RelitLRM adopts a feed-forward transformer-based model with a novel combination of a geometry reconstructor and a relightable appearance generator based on diffusion. The model is trained end-to-end on synthetic multi-view renderings of objects under varying known illuminations. This architecture design enables the model to effectively decompose geometry and appearance, resolve the ambiguity between material and lighting, and capture the multi-modal distribution of shadows and specularity in the relit appearance. We show that our sparse-view feed-forward RelitLRM offers relighting results competitive with state-of-the-art dense-view optimization-based baselines while being significantly faster. Our project page is available at: https://relit-lrm.github.io/.
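
The abstract's point about the multi-modal distribution of shadows and specularity can be illustrated with a toy example. The sketch below is not from the paper: the pixel model, the mode locations (0.1 and 0.9), the 20% highlight probability, and all function names are assumptions chosen for illustration. It shows that the single value minimizing mean squared error over a two-mode distribution is the mean, which falls in a region where almost no real samples lie. This is one way to see why a deterministic regressor tends to give washed-out highlights.

```python
import random
import statistics


def sample_pixel_radiance(rng):
    """Toy bimodal radiance at one pixel (illustrative assumption):
    usually dark, occasionally a bright specular highlight."""
    if rng.random() < 0.2:
        return rng.gauss(0.9, 0.02)  # highlight mode
    return rng.gauss(0.1, 0.02)      # diffuse mode


def mean_squared_error(prediction, samples):
    """Average squared error of one constant prediction over all samples."""
    return statistics.fmean((s - prediction) ** 2 for s in samples)


rng = random.Random(0)
samples = [sample_pixel_radiance(rng) for _ in range(10_000)]

# A deterministic model trained with an L2 loss converges towards the mean.
mse_optimal = statistics.fmean(samples)

# Fraction of real samples that look like the "average" answer.
near_mean = sum(abs(s - mse_optimal) < 0.05 for s in samples) / len(samples)

print(f"MSE-optimal prediction: {mse_optimal:.3f}")
print(f"MSE at the mean:        {mean_squared_error(mse_optimal, samples):.4f}")
print(f"MSE at the dark mode:   {mean_squared_error(0.1, samples):.4f}")
print(f"Samples near the mean:  {near_mean:.4%}")
```

A sampler that draws from the distribution instead returns either a dark value or a sharp highlight, never the blurred in-between value.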

Paper Structure (35 sections, 5 equations, 6 figures, 6 tables, 2 algorithms)

This paper contains 35 sections, 5 equations, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: Demonstration of RelitLRM's relighting capabilities. (a) Sparse posed images captured under unknown lighting conditions serve as the input for our method. (b) RelitLRM accurately reconstructs and relights the 3D object in the form of 3DGS under novel target lightings (either outdoor or indoor), with renderings closely matching the ground truth. (c) Object insertion into a virtual 3D scene. The red arrows indicate the objects relit by RelitLRM using the scene illumination, demonstrating its ability to capture surrounding illumination and seamlessly harmonize with the scene background. (d) Object relighting results showcase the robustness of our method under challenging lighting conditions, such as strong directional lighting. Our method faithfully models complex lighting effects, including removing shadows and highlights present in the input images and casting strong shadows and glossy specular highlights under the target lighting.
  • Figure 2: Overview of RelitLRM for sparse-view relightable 3D reconstruction. Our pipeline consists of a geometry regressor and a relit appearance generator, both implemented as transformer blocks and trained jointly end-to-end. We implicitly bake the relit appearance generation into the relit-view diffusion process. During inference, we first extract geometry tokens from sparse input images and regress geometry parameters for per-pixel 3D Gaussians (3DGS). Conditioning on novel target lighting and extracted geometry features, we denoise the relit views by first predicting a 3DGS appearance and then rendering it (along with the 3DGS geometry, which stays fixed in the diffusion denoising loop) into the denoising viewpoints. This iterative process produces the relit 3DGS radiance as a byproduct while denoising the relit views. The generative appearance and deterministic geometry blocks are trained end-to-end, ensuring scalability.
  • Figure 3: Comparison with optimization-based inverse rendering baselines on the Stanford-ORB (Kuang et al., 2024), Objects-with-Lighting (Ummenhofer et al., 2024), and TensoIR-Synthetic (Jin et al., 2023) datasets. (a) On Stanford-ORB, our method captures realistic specular highlights and geometric details, outperforming NVDiffRec-MC (Hasselgren et al., 2022) and InvRender (Zhang et al., 2022). (b) On Objects-with-Lighting, our results closely match the ground truth, while baselines show over-specularity or artifacts. (c) For TensoIR-Synthetic, our method achieves comparable relighting with significantly fewer views. Notably, our method requires only 6 to 8 views and completes relighting in 2-3 seconds, while baselines need over 50 views and hours of processing.
  • Figure 4: Comparison with an image-based relighting baseline on our held-out evaluation set, showing that our model produces better visual quality, with improved removal of shadows and highlights. Our model processes four input images jointly, while the baseline relights each image independently.
  • Figure 5: Our probabilistic design yields significantly better results on specular highlights compared to the deterministic counterpart. The radiance function of specular objects under challenging lighting is highly multi-modal with long tails. Our denoising diffusion approach models this distribution more effectively, while the deterministic design fails to model such a complex distribution and produces overly smooth specular highlights.
  • ...and 1 more figure
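
The inference procedure described in the Figure 2 caption can be sketched as control flow. This is a structural sketch only, not the authors' implementation: `regress_geometry`, `predict_appearance`, and `render` are hypothetical toy stand-ins for the geometry transformer, the appearance transformer, and the 3DGS rasterizer, and the linear re-noising schedule is an assumption made to keep the example short. What it preserves from the caption is the ordering: geometry is regressed once and held fixed, each denoising step predicts a 3DGS appearance and renders it into the denoising viewpoints, and the relit Gaussians come out as a byproduct of the loop.

```python
import random
from dataclasses import dataclass


@dataclass
class Gaussians:
    geometry: list    # stand-in for positions, scales, rotations, opacity
    appearance: list  # stand-in for relit colour coefficients


def regress_geometry(input_views):
    """Hypothetical geometry regressor: averages the input views per pixel."""
    count = len(input_views)
    return [sum(view[i] for view in input_views) / count
            for i in range(len(input_views[0]))]


def predict_appearance(geometry, noisy_views, lighting, t):
    """Hypothetical appearance generator. A real model would attend to the
    noisy views and the noise level t; this toy only scales by the lighting."""
    return [lighting * g for g in geometry]


def render(geometry, appearance, num_views):
    """Hypothetical rasterizer: every viewpoint sees the same values."""
    return [list(appearance) for _ in range(num_views)]


def relight(input_views, lighting, num_views=2, steps=5, seed=0):
    rng = random.Random(seed)
    geometry = regress_geometry(input_views)  # computed once, then fixed
    noisy = [[rng.gauss(0.0, 1.0) for _ in geometry] for _ in range(num_views)]
    appearance = None
    for k in range(steps, 0, -1):
        appearance = predict_appearance(geometry, noisy, lighting, k / steps)
        clean = render(geometry, appearance, num_views)  # clean-view estimate
        t_next = (k - 1) / steps  # assumed toy schedule, reaches 0 at the end
        noisy = [[(1 - t_next) * c + t_next * x for c, x in zip(cv, nv)]
                 for cv, nv in zip(clean, noisy)]
    return Gaussians(geometry, appearance), noisy


gaussians, relit_views = relight([[0.2, 0.4], [0.4, 0.6]], lighting=2.0)
print(gaussians)
print(relit_views)
```

Because the rendered views serve as the clean estimate at every step, the final denoised views coincide with a render of the returned Gaussians, so the same asset can then be rendered from other viewpoints under the same target lighting.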