Object-Centric Neural Scene Rendering

Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser

TL;DR

This work tackles photorealistic rendering of dynamic scenes where objects and lighting move, a scenario where NeRF-like radiance fields struggle due to static illumination assumptions. It introduces object-centric neural scattering functions (OSFs), a per-object neural network $F_\Theta$ that maps $(\mathbf{x}, \boldsymbol{\omega_l}, \boldsymbol{\omega_o})$ to $(\sigma, \boldsymbol{\rho})$, enabling reuse of object assets across different scene configurations. OSFs are integrated with volumetric path tracing to model inter-object light transport, including shadows and indirect illumination, without retraining when scene arrangements change. Experiments on Furniture datasets show that OSFs better disentangle lighting from viewpoint, reproduce shadows and specularities, and render complex lighting scenarios more accurately than NeRF-based baselines, highlighting the practical potential of combining implicit object models with classical rendering techniques.

Abstract

We present a method for composing photorealistic scenes from captured images of objects. Our work builds upon neural radiance fields (NeRFs), which implicitly model the volumetric density and directionally-emitted radiance of a scene. While NeRFs synthesize realistic pictures, they only model static scenes and are closely tied to specific imaging conditions. This property makes NeRFs hard to generalize to new scenarios, including new lighting or new arrangements of objects. Instead of learning a scene radiance field as a NeRF does, we propose to learn object-centric neural scattering functions (OSFs), a representation that models per-object light transport implicitly using a lighting- and view-dependent neural network. This enables rendering scenes even when objects or lights move, without retraining. Combined with a volumetric path tracing procedure, our framework is capable of rendering both intra- and inter-object light transport effects including occlusions, specularities, shadows, and indirect illumination. We evaluate our approach on scene composition and show that it generalizes to novel illumination conditions, producing photorealistic, physically accurate renderings of multi-object scenes.
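The OSF interface described above, a network taking a point, an incoming light direction, and an outgoing direction and returning a density and a scattered fraction, can be sketched as a toy model. This is an untrained, randomly initialised stand-in and not the authors' architecture: the layer sizes, the activations, the absence of any positional encoding, and the NeRF-like choice of computing the density from position alone are all assumptions made for illustration, and names such as `make_toy_osf` are hypothetical.

```python
import math
import random


def _dense(rng, n_in, n_out):
    """Random weights for one dense layer (illustrative initialisation only)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def _apply(weights, vec):
    """Matrix-vector product without a bias term."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_osf(hidden=16, seed=0):
    """Return an untrained stand-in for F_Theta: (x, w_l, w_o) -> (sigma, rho)."""
    rng = random.Random(seed)
    trunk = _dense(rng, 3, hidden)              # sees the position x only (assumption)
    density_head = _dense(rng, hidden, 1)       # features -> sigma
    scatter_head = _dense(rng, hidden + 6, 3)   # features + both directions -> rho

    def osf(x, w_l, w_o):
        feat = [max(0.0, h) for h in _apply(trunk, x)]        # ReLU features
        sigma = max(0.0, _apply(density_head, feat)[0])       # density is non-negative
        raw = _apply(scatter_head, feat + list(w_l) + list(w_o))
        rho = tuple(_sigmoid(r) for r in raw)                 # RGB fraction in (0, 1)
        return sigma, rho

    return osf


# Query the same point under two different light directions.
osf = make_toy_osf()
x = (0.1, -0.2, 0.3)
sigma_a, rho_a = osf(x, (0.0, 0.0, 1.0), (0.0, 1.0, 0.0))
sigma_b, rho_b = osf(x, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

In this sketch, moving the light changes the scattered fraction but leaves the density untouched. That split is one simple way to let a frozen per-object model be reused under new lighting, in the spirit of the abstract.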

Paper Structure

This paper contains 22 sections, 8 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: We propose an object-centric neural scene representation for image synthesis. Given a scene description (a), and a repository of neural object-centric scattering functions (OSF) trained independently and frozen for each object (b), we can compose the objects into scenes (c), and render photorealistic images as we move lights (d), cameras (e), and/or objects (f).
  • Figure 2: (a) Result of learning NeRF on a dynamic scene. NeRF assumes static scenes and fixed lighting, producing blurred predictions when trained on dynamic scenes where objects and lighting are randomly moved. (b) In contrast, our method is able to predict crisp images.
  • Figure 3: We represent each object as an object-centric neural scattering function (OSF), which models how light entering at a point $\mathbf{x}$ on the object, from direction $\boldsymbol{\omega_l}$ where $\boldsymbol{l}$ corresponds to a light path, undergoes multiple bounces within the object and exits along direction $\boldsymbol{\omega_o}$ with some fractional amount of light $\boldsymbol{\rho}$. We approximate the scattering function with a multilayer perceptron $F_\Theta$ where $\Theta$ are learned weights that parameterize the neural network. Given a single point $\mathbf{x}$, an incoming light direction $\boldsymbol{\omega_l}$, and an outgoing direction $\boldsymbol{\omega_o}$, $F_\Theta$ outputs the volume density $\sigma$ of that point, as well as the fraction of light arriving at $\mathbf{x}$ from direction $\boldsymbol{\omega_l}$ that is scattered in direction $\boldsymbol{\omega_o}$.
  • Figure 4: Rendering multiple object-centric neural scattering functions (OSFs). We propose a rendering procedure for rendering an arbitrary scene description consisting of objects, light sources, and camera information. Given a set of objects, we compute direct illumination by shooting rays from each light source to each object (brown arrows). Shadows are computed by sending shadow rays back to each light source (purple arrow). In this scenario, the shadow ray from the green cylinder is occluded by the red ball, so the red ball casts a shadow on the green cylinder. We also send secondary rays between objects to render indirect illumination effects, such as between the green and blue objects in the illustration (green and blue dashed arrows). Finally, rays are sent back to the camera to render the final image (dark blue arrows).
  • Figure 5: OSF sampling procedure. Given a scene with a camera, light source, and object bounding boxes, we first send camera rays from the camera into the scene (a). Rays that do not intersect with objects are pruned. Of the rays that do intersect, we compute ray-box intersections and sample points within intersecting regions. To compute shadows, we send shadow rays from each sample to the light source, and evaluate samples within intersecting regions along the shadow rays (b). A similar sampling procedure is used for evaluating secondary rays for indirect illumination.
  • ...and 5 more figures
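
The sampling procedure in the Figure 5 caption (prune rays that miss the bounding boxes, sample inside the intersecting interval, then march shadow rays toward the light) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The slab test for axis-aligned boxes, the evenly spaced midpoint samples, and the constant occluder density of 2.0 standing in for a real OSF query are all choices made here, and every function name is hypothetical.

```python
import math


def ray_box_interval(origin, direction, box_min, box_max, t_max=math.inf):
    """Slab test: return (t_near, t_far) where the ray lies inside the box, else None."""
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            if o < lo or o > hi:
                return None          # parallel to this slab and outside it
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None              # the slabs do not overlap: the ray misses
    return t_near, t_far


def sample_points(origin, direction, interval, n):
    """Place n evenly spaced samples at bin midpoints inside the interval."""
    t_near, t_far = interval
    step = (t_far - t_near) / n
    ts = [t_near + (i + 0.5) * step for i in range(n)]
    points = [tuple(o + t * d for o, d in zip(origin, direction)) for t in ts]
    return points, step


def transmittance(sigmas, step):
    """Fraction of light surviving a segment with the given sample densities."""
    return math.exp(-sum(sigmas) * step)


# (a) Camera ray through an object's bounding box.
box_min, box_max = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)
cam_origin, cam_dir = (0.0, 0.0, -3.0), (0.0, 0.0, 1.0)
hit = ray_box_interval(cam_origin, cam_dir, box_min, box_max)
points, step = sample_points(cam_origin, cam_dir, hit, 4)

# A ray travelling sideways never enters the box and would be pruned.
miss = ray_box_interval(cam_origin, (0.0, 1.0, 0.0), box_min, box_max)

# (b) Shadow ray from the first sample toward a light 5 units above it,
# passing through a second (occluding) object's bounding box.
shadow_dir, light_dist = (0.0, 1.0, 0.0), 5.0
occ_min, occ_max = (-1.0, 2.0, -2.0), (1.0, 3.0, 0.0)
occ_hit = ray_box_interval(points[0], shadow_dir, occ_min, occ_max, t_max=light_dist)
shadow_pts, shadow_step = sample_points(points[0], shadow_dir, occ_hit, 4)
sigmas = [2.0 for _ in shadow_pts]   # placeholder for the occluder's predicted density
visibility = transmittance(sigmas, shadow_step)
```

Here `visibility` is the fraction of light that reaches the sample after passing through the occluder. Multiplying the incoming light by this factor is one way the shadow in the Figure 4 caption could arise, and the same interval-and-sample routine could be reused for secondary rays between objects.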