NeAR: Coupled Neural Asset-Renderer Stack

Hong Li, Chongjie Ye, Houyuan Chen, Weiqing Xiao, Ziyang Yan, Lixing Xiao, Zhaoxi Chen, Jianfeng Xiang, Shaocong Xu, Xuhui Liu, Yikai Wang, Baochang Zhang, Xiaoguang Han, Jiaolong Yang, Hao Zhao

TL;DR

NeAR addresses the decoupling of neural asset authoring from neural rendering by introducing a Lighting-Homogenized SLAT (LH-SLAT), a latent representation that neutralizes the input lighting before relighting. A two-stage pipeline first predicts the LH-SLAT from a single image, then decodes it with a lighting-aware renderer into relightable 3D Gaussian splats (3DGS), enabling real-time, multi-view-consistent relightable 3D generation. The approach outperforms state-of-the-art baselines on G-buffer-based forward rendering, random-lit reconstruction, unknown-lit relighting, and novel-view relighting, and generalizes to unseen objects and out-of-domain datasets. The work advocates co-designing neural assets and renderers as a single graphics stack for better fidelity and relighting control.

Abstract

Neural asset authoring and neural rendering have traditionally evolved as disjoint paradigms: one generates digital assets for fixed graphics pipelines, while the other maps conventional assets to images. However, treating them as independent entities limits the potential for end-to-end optimization in fidelity and consistency. In this paper, we bridge this gap with NeAR, a Coupled Neural Asset-Renderer Stack. We argue that co-designing the asset representation and the renderer creates a robust "contract" for superior generation. On the asset side, we introduce the Lighting-Homogenized SLAT (LH-SLAT). Leveraging a rectified-flow model, NeAR lifts casually lit single images into a canonical, illumination-invariant latent space, effectively suppressing baked-in shadows and highlights. On the renderer side, we design a lighting-aware neural decoder tailored to interpret these homogenized latents. Conditioned on HDR environment maps and camera views, it synthesizes relightable 3D Gaussian splats in real time without per-object optimization. We validate NeAR on four tasks: (1) G-buffer-based forward rendering, (2) random-lit reconstruction, (3) unknown-lit relighting, and (4) novel-view relighting. Extensive experiments demonstrate that our coupled stack outperforms state-of-the-art baselines in both quantitative metrics and perceptual quality. We hope this coupled asset-renderer perspective inspires future graphics stacks that view neural assets and renderers as co-designed components instead of independent entities.
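
The "contract" between the two stages can be pictured as a data-flow sketch: the lighting-independent latent is produced once per input image and then reused for every target illumination. The Python below is only an illustration under stated assumptions, not the authors' implementation. The names `LHSlat`, `predict_lh_slat`, and `render`, the array layouts, and the placeholder arithmetic are all hypothetical, and the camera conditioning is omitted for brevity.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class LHSlat:
    """Hypothetical container for a lighting-homogenized structured latent."""
    coords: np.ndarray  # (N, 3) voxel coordinates from the geometry prior (assumed layout)
    feats: np.ndarray   # (N, C) latent features, independent of the target lighting


def predict_lh_slat(image: np.ndarray, coords: np.ndarray, channels: int = 8) -> LHSlat:
    """Stage 1 placeholder: single image + geometry prior -> LH-SLAT.

    A real system would run the generative backbone here; this stub only
    fills the features with the image mean so that shapes can be checked.
    """
    feats = np.full((coords.shape[0], channels), float(image.mean()))
    return LHSlat(coords=coords, feats=feats)


def render(latent: LHSlat, env_map: np.ndarray, size: int = 16) -> np.ndarray:
    """Stage 2 placeholder: (LH-SLAT, HDR environment map) -> RGB image.

    A real renderer would decode Gaussians and rasterize them for a camera;
    this stub just scales a flat image by the mean environment radiance.
    """
    radiance = env_map.reshape(-1, 3).mean(axis=0)  # average RGB of the light
    albedo_like = float(latent.feats.mean())        # stand-in for decoded appearance
    return np.broadcast_to(albedo_like * radiance, (size, size, 3)).copy()


# Stage 1 runs once per input image ...
image = np.full((64, 64, 3), 0.5)
coords = np.zeros((10, 3), dtype=np.int64)
latent = predict_lh_slat(image, coords)

# ... and Stage 2 runs once per target lighting, reusing the same latent.
warm = np.tile(np.array([2.0, 1.0, 0.5]), (8, 16, 1))
cool = np.tile(np.array([0.5, 1.0, 2.0]), (8, 16, 1))
relit = [render(latent, env) for env in (warm, cool)]
```

The point of the split is that `latent` never depends on the target illumination, so relighting only repeats the second call.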

Paper Structure

This paper contains 33 sections, 11 equations, 19 figures, 5 tables.

Figures (19)

  • Figure 1: Comparison of NeAR and Decoupled Paradigms. Left: Visual results under target illumination. Cols. 3–5 are rendered via Blender to evaluate asset quality. Insets (right of cols. 4&5) display PBR maps (top-down: Base Color, Metallic, Roughness). Baselines suffer from baked-in lighting (Trellis) or material ambiguity (HY3D-2.1). Notably, HY3D-2.1 wrongly assigns high metallic values to the bread (see Metallic map, Row 1) and exhibits inconsistent highlights on the robot (Row 3). While our intermediate PBR decomposition (col. 5) corrects materials, it struggles with complex effects like transparency (Helmet, Row 2) under standard rendering. Our full Neural Renderer (col. 6) resolves this, yielding photorealistic results closest to GT. Right: Quantitative results on the Glossy Synthetic dataset. NeAR achieves the highest PSNR across all four tasks, demonstrating the superiority of our coupled stack.
  • Figure 2: Lighting homogenization as the bridge between assets and renderer. We visualize the intrinsic components (Base Color, Ambient Occlusion), rendering results under random and uniform lighting, shadow maps, as well as relighting outputs generated respectively by Shaded SLAT and LH-SLAT. By mapping casually lit images to a canonical illumination space, LH-SLAT effectively suppresses baked-in shadows and unstable specularities while preserving geometry-consistent diffuse cues. This stable latent space serves as the robust "contract" for our lighting-aware neural renderer to enable controllable relighting.
  • Figure 3: Overview of NeAR vs. Existing Frameworks. (a-b) Existing 2D methods lack explicit 3D awareness; specifically, (a) struggles to disentangle specular highlights, while both fail to guarantee multi-view consistency during relighting. (c) State-of-the-art 3D generation methods decouple asset authoring from rendering, relying on ill-posed PBR decomposition that often results in material inaccuracies and baked-in artifacts. In contrast, (d) NeAR (Ours) employs a Coupled Neural Asset-Renderer Stack. By utilizing the LH-SLAT representation, we simultaneously achieve photorealistic relighting and consistent novel-view synthesis.
  • Figure 4: Pipeline of NeAR as a coupled neural asset-renderer stack. Top (Inference Stage): An end-to-end inference pipeline. Given a single image and a geometry prior (e.g., mesh from HY3D), Stage 1 utilizes a rectified-flow backbone with LoRA adaptation to predict the Lighting-Homogenized SLAT (LH-SLAT). This latent acts as a bridge, which is then consumed by the Stage 2 lighting-aware neural renderer to synthesize relightable 3DGS under novel illumination and viewpoints. Bottom-Left (Data Prep): Offline construction of ground-truth LH-SLATs by rendering assets under homogenized illumination and encoding them via a sparse VAE. Bottom-Right (GS Decoding & Rendering): Detailed architecture of the 3DGS decoding head, which predicts Gaussian attributes from lighting-dependent features, followed by a differentiable rasterizer $\mathcal{M}$ that renders the final HDR image, shadow and PBR auxiliary maps.
  • Figure 5: The network structures for Lighting Tokenizer, IAD and LAD.
  • ...and 14 more figures
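
Figure 1 compares methods by PSNR. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, assuming images stored as floating-point arrays with values in [0, 1]. The function name `psnr` and the `max_val` parameter are illustrative choices, and the paper's exact evaluation protocol (for example, how HDR outputs are tone-mapped before scoring) is not specified here.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((pred - target) ** 2)  # mean squared error over all pixels
    if mse == 0.0:
        return float("inf")              # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


# Toy example: a uniform error of 0.1 gives MSE = 0.01, i.e. about 20 dB.
target = np.full((4, 4, 3), 0.5)
pred = target + 0.1
value = psnr(pred, target)
```

Higher values mean the rendering is closer to the ground truth; each 10 dB gain corresponds to a tenfold reduction in mean squared error.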