CRAYM: Neural Field Optimization via Camera RAY Matching

Liqiang Lin, Wenpeng Wu, Chi-Wing Fu, Hao Zhang, Hui Huang

TL;DR

CRAYM tackles the challenge of noisy camera poses in multi-view neural field reconstruction by introducing camera ray matching that operates on a learnable feature volume $\mathcal{V}$. It couples two novel modules, Key Rays Enrichment (KRE) and Matched Rays Coherency (MRC), with epipolar and point-alignment geometric losses to enforce multi-view coherence, while enabling end-to-end optimization for both novel view synthesis and 3D reconstruction. The method shows improved pose alignment and higher-quality renderings and meshes on NeRF-Synthetic and UrbanScene3D across dense and sparse views, and demonstrates robustness to pose noise. These findings suggest CRAYM as a practical approach for accurate, photorealistic 3D reconstruction in realistic, noisy capture scenarios, with potential extensions to more scalable representations like 3D Gaussian splatting.

Abstract

We introduce camera ray matching (CRAYM) into the joint optimization of camera poses and neural fields from multi-view images. The optimized field, referred to as a feature volume, can be "probed" by the camera rays for novel view synthesis (NVS) and 3D geometry reconstruction. One key reason for matching camera rays, instead of pixels as in prior works, is that the camera rays can be parameterized by the feature volume to carry both geometric and photometric information. Multi-view consistencies involving the camera rays and scene rendering can be naturally integrated into the joint optimization and network training, to impose physically meaningful constraints to improve the final quality of both the geometric reconstruction and photorealistic rendering. We formulate our per-ray optimization and matched ray coherence by focusing on camera rays passing through keypoints in the input images to elevate both the efficiency and accuracy of scene correspondences. Accumulated ray features along the feature volume provide a means to discount the coherence constraint amid erroneous ray matching. We demonstrate the effectiveness of CRAYM for both NVS and geometry reconstruction, over dense- or sparse-view settings, with qualitative and quantitative comparisons to state-of-the-art alternatives.

Paper Structure

This paper contains 27 sections, 11 equations, 11 figures, 10 tables.

Figures (11)

  • Figure 1: Our method, neural field optimization with camera ray matching (CRAYM), incorporates contextual information for per-ray processing and enforces color and geometric consistency between matched rays. Compared to SPARF [truong2023sparf], which utilizes dense pixel correspondences, and the state-of-the-art, bundle-adjusting L2G-NeRF [l2g], both aimed at handling noisy camera poses, CRAYM produces superior results, especially over fine details; see the zoom-ins on the right. Results are shown for the Drums model from NeRF-Synthetic [nerf] on dense views.
  • Figure 2: Overview of our CRAYM pipeline. After extracting keypoints (red dots) from input images and matching them using a pre-trained network, we train our CRAYM network to optimize a 3D feature volume $\mathcal{V}$ which encodes both geometric and photometric information about the target 3D object and can be queried by camera rays for both novel view synthesis (via the Texture Network) and 3D reconstruction (via the Geometry Network). The volume optimization is subject to photometric losses through rendering along camera rays passing through the keypoints (i.e., the key rays), which is enhanced (in the KRE) by integrating features from auxiliary rays, i.e., rays passing through nearby auxiliary points (yellow dots) in the images. Matched ray coherence (MRC) is imposed on matched key rays, in terms of color consistency, while potentially mismatched rays can be identified by comparing accumulated features along the key rays through $\mathcal{V}$. On top of the standard photometric loss, we introduce two geometric losses, the epipolar loss and point-alignment loss, to explicitly optimize ray-to-ray coherency to maximize the reconstruction quality of the feature volume.
  • Figure 3: Illustration of our geometric losses. The red lines in the left subfigure are epipolar lines. The epipolar loss constrains the relative transformations between cameras, so that the projection of a keypoint $p_k$ onto the image plane of the other camera should lie on the epipolar line $e'x'_k$. With the camera poses constrained by the epipolar loss, the point-alignment loss further constrains the depth of $x_k$ and $x_k'$, aiming to align $p_k$ and $p_k'$ with $P$.
  • Figure 4: Visualization of the initial and optimized camera poses for the LEGO scene in the NeRF-Synthetic dataset [nerf]. (Purple: ground-truth poses; blue: initial or optimized poses; red lines: translation errors.)
  • Figure 5: Qualitative comparison of novel view synthesis and surface reconstruction on the synthetic objects.
  • ...and 6 more figures
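
The two geometric constraints described in the Figure 3 caption can be illustrated with textbook two-view geometry. The sketch below is not the paper's loss formulation, which is not reproduced on this page. It assumes a shared pinhole intrinsic matrix `K`, a relative pose `(R, t)` mapping camera-1 coordinates to camera-2 coordinates, and z-depths along each key ray. The function names `skew`, `epipolar_distance`, and `point_alignment_distance` are hypothetical and chosen only for illustration.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def epipolar_distance(x, x_prime, K, R, t):
    """Pixel distance from x_prime to the epipolar line induced by x in image 2.

    Assumed convention: a point X in camera-1 coordinates maps to R @ X + t
    in camera-2 coordinates.
    """
    K_inv = np.linalg.inv(K)
    F = K_inv.T @ skew(t) @ R @ K_inv      # fundamental matrix from the relative pose
    line = F @ np.append(x, 1.0)           # epipolar line (a, b, c) in image 2
    return abs(line @ np.append(x_prime, 1.0)) / np.hypot(line[0], line[1])

def point_alignment_distance(x, depth, x_prime, depth_prime, K, R, t):
    """Distance between the two back-projected 3D points, measured in camera-2 coordinates."""
    K_inv = np.linalg.inv(K)
    p = depth * (K_inv @ np.append(x, 1.0))                     # point on ray 1 (camera-1 coords)
    p_prime = depth_prime * (K_inv @ np.append(x_prime, 1.0))   # point on ray 2 (camera-2 coords)
    return np.linalg.norm(R @ p + t - p_prime)

# Toy setup: camera 2 is camera 1 shifted sideways; both observe the same point P.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.5, 0.0, 0.0])
P = np.array([0.1, -0.05, 2.0])            # 3D point in camera-1 coordinates
x = (K @ P)[:2] / P[2]                     # its pixel in image 1
P2 = R @ P + t
x_prime = (K @ P2)[:2] / P2[2]             # its pixel in image 2

print(epipolar_distance(x, x_prime, K, R, t))                    # near zero for a correct pose
print(point_alignment_distance(x, P[2], x_prime, P2[2], K, R, t))  # near zero for correct depths
```

In this toy setup both quantities vanish when the pose and the depths are correct. A pose error moves the matched pixel off the epipolar line, and a depth error separates the two back-projected points even when the epipolar distance is zero, which is why the caption describes the point-alignment term as acting on top of the pose constraint.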