RS-NeRF: Neural Radiance Fields from Rolling Shutter Images

Muyao Niu, Tong Chen, Yifan Zhan, Zhuoxiao Li, Xiang Ji, Yinqiang Zheng

TL;DR

RS-NeRF tackles the challenge of Rolling Shutter distortions in neural radiance fields by explicitly modeling RS image formation and jointly optimizing per-row camera poses with NeRF parameters. It introduces two key refinements: trajectory smoothness regularization to enforce physically plausible camera motion, and a multi-sampling algorithm that expands training data across multiple intermediate poses via the PP-ratio, enabling better use of RS data. The method demonstrates consistent improvements over prior RS correction and NeRF baselines on both synthetic and real RS datasets, including clear gains in PSNR, SSIM, and perceptual metrics. This framework provides a practical path to accurate 3D reconstruction and novel-view synthesis from RS-equipped cameras, with publicly available code and data to support replication and extension.

Abstract

Neural Radiance Fields (NeRFs) have become increasingly popular because of their impressive ability for novel view synthesis. However, their effectiveness is hindered by the Rolling Shutter (RS) effects commonly found in most camera systems. To solve this, we present RS-NeRF, a method designed to synthesize normal images from novel views using input with RS distortions. This involves a physical model that replicates the image formation process under RS conditions and jointly optimizes the NeRF parameters and the camera extrinsics for each image row. We further address the inherent shortcomings of the basic RS-NeRF model by examining the RS characteristics and developing algorithms to enhance its functionality. First, we impose a smoothness regularization to better estimate trajectories and improve the synthesis quality, in line with the camera movement prior. We also identify and address a fundamental flaw in the vanilla RS model by introducing a multi-sampling algorithm. This new approach improves the model's performance by comprehensively exploiting the RGB data across different rows for each intermediate camera pose. Through rigorous experimentation, we demonstrate that RS-NeRF surpasses previous methods in both synthetic and real-world scenarios, proving its ability to correct RS-related distortions effectively. Code and data are available at https://github.com/MyNiuuu/RS-NeRF
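
To make the per-row formulation in the abstract concrete, the following is a minimal NumPy sketch of ray generation under a rolling shutter, where every image row gets its own camera center. It is not the authors' implementation: the function names, the pinhole model with a single `focal` parameter, the fixed identity rotation, and the linear interpolation of the camera center between the first and last row are all assumptions made for illustration. RS-NeRF itself obtains the per-row poses from a pose estimation network that is optimized jointly with the NeRF.

```python
import numpy as np

def row_camera_centers(c_start, c_end, height):
    """Hypothetical stand-in for a learned pose model: linearly interpolate
    the camera center from the first row (c_start) to the last row (c_end)."""
    t = np.linspace(0.0, 1.0, height)[:, None]           # (H, 1), one time per row
    return (1.0 - t) * c_start + t * c_end                # (H, 3)

def rolling_shutter_rays(c_start, c_end, height, width, focal):
    """Return ray origins and unit directions of shape (H, W, 3).
    Assumption: rotation is the identity for all rows; only translation varies."""
    centers = row_camera_centers(c_start, c_end, height)
    u, v = np.meshgrid(np.arange(width), np.arange(height))   # pixel column, row
    dirs = np.stack(
        [(u - width / 2.0) / focal,
         -(v - height / 2.0) / focal,
         -np.ones_like(u, dtype=float)],
        axis=-1,
    )
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    # All pixels of row r share the origin centers[r]; rows differ from each other.
    origins = np.broadcast_to(centers[:, None, :], (height, width, 3)).copy()
    return origins, dirs

origins, dirs = rolling_shutter_rays(
    np.zeros(3), np.array([1.0, 0.0, 0.0]), height=4, width=6, focal=5.0
)
```

A global-shutter camera corresponds to the special case `c_start == c_end`, in which every row has the same origin.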

Paper Structure (13 sections, 8 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Pipeline of vanilla RS-NeRF. Given a series of images with RS distortion, RS-NeRF learns the underlying normal 3D representation by jointly estimating the pose and RGB values for each row. For each RS image, we first sample rows and obtain their camera poses from the pose estimation network. We then sample points for each row, feed them together with the estimated camera pose to the NeRF network, and optimize the photometric loss against the ground truth.
  • Figure 2: Smoothness regularization on camera trajectory. For each pair of adjacent trajectory vectors $\overrightarrow{\mathbf{d}}_{k}$ and $\overrightarrow{\mathbf{d}}_{k+1}$, we compute the mid-point trajectory vector $\overrightarrow{\mathbf{d}}_{mid}$ and their unit vectors $\overrightarrow{\mathbf{n}}_{k}$, $\overrightarrow{\mathbf{n}}_{k+1}$, and $\overrightarrow{\mathbf{n}}_{mid}$. We then apply the $L_2$ regularization between $\overrightarrow{\mathbf{n}}_{mid}$ and $\operatorname{mean}(\overrightarrow{\mathbf{n}}_{k}, \overrightarrow{\mathbf{n}}_{k+1})$.
  • Figure 3: Qualitative comparisons for different $N_{pose}$. The artifact decreases as $N_{pose}$ grows.
  • Figure 4: Motivation for the multi-sampling algorithm. Top: Three consecutive RS frames, labeled (a), (b), and (c), along with the optical flow between each pair of frames, shown in (d) and (e). Bottom: For a single point in 3D space, we observe its 2D projections in the first (marked in yellow) and second (marked in red) RS frames. These projections exhibit a row displacement of 8 and a column displacement of 18, as depicted in (f). However, during this interval, the camera moves through at least 392 poses, as indicated in (g).
  • Figure 5: Comparison of the accessible training points in one RS image under different strategies. The image size is $H \times W$, providing $H$ poses. Distinct circle hues represent different RGB values. In case C, pixels transported from other poses are drawn with dashed contours, and the number $i$ on a pixel signifies that it originates from the $i$-th pose.
  • ...and 6 more figures
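
The smoothness regularization described in the Figure 2 caption can be sketched in a few lines of NumPy. This is an illustrative reading of the caption, not the authors' code. In particular, it assumes that the trajectory vectors are differences of consecutive camera positions and that the mid-point trajectory vector is the average $\overrightarrow{\mathbf{d}}_{mid} = (\overrightarrow{\mathbf{d}}_{k} + \overrightarrow{\mathbf{d}}_{k+1})/2$; the caption does not spell out either definition. The function names and the `eps` guard are likewise hypothetical.

```python
import numpy as np

def unit(v, eps=1e-8):
    """Normalize vectors along the last axis; eps guards against zero length."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def smoothness_loss(positions):
    """L2 penalty between n_mid and mean(n_k, n_{k+1}) for every adjacent
    pair of trajectory vectors. positions has shape (N, 3) with N >= 3."""
    d = positions[1:] - positions[:-1]       # assumed trajectory vectors d_k
    d_k, d_k1 = d[:-1], d[1:]                # adjacent pairs (d_k, d_{k+1})
    d_mid = 0.5 * (d_k + d_k1)               # assumed definition of d_mid
    n_k, n_k1, n_mid = unit(d_k), unit(d_k1), unit(d_mid)
    target = 0.5 * (n_k + n_k1)              # mean of the two unit vectors
    return float(np.mean(np.sum((n_mid - target) ** 2, axis=-1)))

# A straight trajectory (even with uneven step lengths) incurs no penalty,
# while a sharp turn does.
straight = np.array([[0.0, 0, 0], [1.0, 0, 0], [3.0, 0, 0], [4.0, 0, 0]])
turn = np.array([[0.0, 0, 0], [1.0, 0, 0], [1.0, 1, 0]])
loss_straight = smoothness_loss(straight)
loss_turn = smoothness_loss(turn)
```

Under these assumptions the penalty depends only on how the direction of motion changes: when two adjacent trajectory vectors point the same way, all three unit vectors coincide and the term vanishes, whereas a change of direction shortens the mean of the unit vectors and yields a positive loss.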