Neural Radiance Fields with Torch Units

Bingnan Ni, Huanyu Wang, Dongfeng Bai, Minghe Weng, Dexin Qi, Weichao Qiu, Bingbing Liu

TL;DR

This work tackles the challenge of NeRF-based reconstruction in complex, large-scale scenes where conventional per-pixel ray querying suffers from weak contextuality and background variance. It introduces Torch-NeRF, which renders a patch of pixels per camera ray and employs distance-aware convolutions along rays to model interactions among sample points, paired with a coarse/fine training scheme that enhances efficiency. The approach achieves significant improvements over baselines on KITTI-360 and LLFF in PSNR, SSIM, and LPIPS, without requiring semantic priors, demonstrating stronger structure preservation and reduced noise in challenging outdoor environments. Overall, Torch-NeRF advances scalable neural radiance field reconstruction by expanding the ray perception field and enabling contextualized, patch-based rendering.

Abstract

Neural Radiance Fields (NeRF) have given rise to learning-based 3D reconstruction methods that are widely used in industrial applications. Although prevalent methods achieve considerable improvements in small-scale scenes, reconstruction in complex, large-scale scenes remains challenging. First, the background in complex scenes varies greatly across views. Second, the current inference pattern, i.e., each pixel relies on only an individual camera ray, fails to capture contextual information. To solve these problems, we propose to enlarge the ray perception field and build interactions among sample points. In this paper, we design a novel inference pattern that encourages a single camera ray to carry more contextual information and models the relationship among sample points on each camera ray. To hold contextual information, a camera ray in our proposed method can render a patch of pixels simultaneously. Moreover, we replace the MLP in neural radiance field models with distance-aware convolutions to enhance feature propagation among sample points from the same camera ray. To summarize, like a torchlight, a ray in our proposed method renders a patch of the image. Thus, we call the proposed method Torch-NeRF. Extensive experiments on KITTI-360 and LLFF show that Torch-NeRF exhibits excellent performance.
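
The abstract does not state the exact form of the distance-aware convolution, so the Python sketch below is only one plausible reading of "a weighted 1D convolution along a ray". The function name `distance_aware_conv1d`, the length scale `tau`, and the exponential falloff with distance are all assumptions made for illustration; they are not taken from the paper. Features are scalars per sample point to keep the example small.

```python
import math


def distance_aware_conv1d(features, t_vals, kernel, tau=1.0):
    """Weighted 1D convolution over scalar per-sample features along one ray.

    features: per-sample feature values, ordered along the ray
    t_vals:   distance of each sample point from the ray origin
    kernel:   odd-length list of tap weights (these would be learned in a real model)
    tau:      hypothetical length scale of the distance weighting (an assumption)
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    n, half = len(features), len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for offset in range(-half, half + 1):
            j = i + offset
            if j < 0 or j >= n:
                continue  # zero padding at both ends of the ray
            # Assumed weighting: a neighbour contributes less the farther
            # away it lies along the ray.
            w = math.exp(-abs(t_vals[i] - t_vals[j]) / tau)
            acc += w * kernel[offset + half] * features[j]
        out.append(acc)
    return out
```

With this weighting, sample points that are far apart along the ray barely influence each other, while tightly clustered points are mixed almost as in an ordinary convolution. That is the behaviour one would expect a distance-aware operator to have, whatever its exact form in Torch-NeRF.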

Paper Structure (30 sections, 8 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Illustration of Ray Perception Field. (a) An individual camera ray in NeRF is related to only a single pixel. The camera ray $r$ passes through a pixel, so its perception field covers a single point. (b) Similarly, a camera ray in Mip-NeRF also corresponds to a single pixel. Mip-NeRF casts a conical ray through a circular area of a pixel, so the perception field of such a ray is $\pi \cdot s^2$, where $s$ is a fixed value of about half the width of a pixel. (c) Unlike the camera ray in prevalent methods, which generates a single pixel, the ray in our method renders an image patch of $p \times p$ pixels. Therefore, the ray perception field is $p \times p$ in our Torch-NeRF.
  • Figure 2: The overall framework of the proposed method. Similar to NeRF, the input coordinates consist of a 3D position and a 2D view direction. First, we input uniformly sampled points along each ray to a coarse model. The coarse model outputs only the density of each sample point. Second, we re-sample points according to the output density of the coarse model. Next, the re-sampled points are input into a fine model. Finally, we render a patch of $p \times p$ pixels with the input ray. It is worth noting that the coarse model is training-free.
  • Figure 3: Illustration of the Distance-aware Convolution Operation along Rays. We conduct a weighted 1D convolution along each ray with a sliding window over several sample points.
  • Figure 4: Illustration of the proposed network structure. (a) The vanilla Neural Radiance Field method [mildenhall2021nerf] trains a coarse model and a fine model independently. (b) Mip-NeRF [barron2021mip] uses only a single model and back-propagates through it twice. (c) In our method, we train only the fine model, and the coarse model is updated as the fine model is updated.
  • Figure 5: Visualization of synthesized images and error images. We compare with three other neural radiance-based methods by referring to the project page of KITTI-360. The error images are computed as (0.5 - SSIM/2), so bright regions indicate large error and dark regions indicate low error. To make the contrast more obvious, we zoom in on three patches from each synthesized image.
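
The re-sampling step described for Figure 2 can be illustrated with the inverse-transform sampling used in the original NeRF. Whether Torch-NeRF follows exactly this procedure is an assumption, and the function name `resample_along_ray` and its parameters are hypothetical. The sketch uses only the coarse densities, in line with the caption's statement that the coarse model outputs density alone.

```python
import bisect
import math
import random


def resample_along_ray(t_vals, sigmas, n_fine, rng=None):
    """Draw n_fine new sample distances, concentrated where coarse density is high.

    t_vals: sorted distances of the coarse samples along the ray (length N)
    sigmas: non-negative densities predicted at those samples (length N)
    """
    rng = rng or random.Random(0)
    # Volume-rendering weight per interval: w_i = T_i * (1 - exp(-sigma_i * delta_i)).
    weights, transmittance = [], 1.0
    for i in range(len(t_vals) - 1):
        delta = t_vals[i + 1] - t_vals[i]
        alpha = 1.0 - math.exp(-sigmas[i] * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    total = sum(weights)
    if total <= 0.0:
        # No density anywhere: fall back to uniform sampling over the ray.
        weights, total = [1.0] * len(weights), float(len(weights))
    # Piecewise-constant CDF over the intervals between coarse samples.
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + w / total)
    samples = []
    for _ in range(n_fine):
        u = rng.random()
        # Locate the interval whose CDF range contains u.
        k = min(bisect.bisect_right(cdf, u) - 1, len(weights) - 1)
        span = cdf[k + 1] - cdf[k]
        frac = (u - cdf[k]) / span if span > 0 else 0.5
        samples.append(t_vals[k] + frac * (t_vals[k + 1] - t_vals[k]))
    return sorted(samples)
```

For example, with eleven coarse samples at distances 0 to 10 and a single high density at the sample at distance 5, every re-sampled point falls inside the interval from 5 to 6, so the fine model spends its capacity where the coarse model found a surface.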