ReShader: View-Dependent Highlights for Single Image View-Synthesis

Avinash Paliwal, Brandon Nguyen, Andrii Tsarov, Nima Khademi Kalantari

TL;DR

This work tackles the challenge of view-dependent highlights in single-image view synthesis by decomposing the problem into pixel reshading and relocation. It introduces a neural reshading network that predicts a view-aware version of the input image $I_s$ by conditioning on the novel camera and disparity-derived depth, followed by a modular pixel relocation step to generate the final novel view. The reshading network is trained on a large synthetic dataset and optimized with a combination of $\mathcal{L}_1$, perceptual, and style losses, and is shown to move highlights realistically across views when combined with relocation methods. The approach yields improved realism on real scenes and demonstrates the value of explicitly modeling shading changes in addition to pixel relocation for single-image view synthesis.
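As a rough illustration of how the three loss terms named above might be combined, the sketch below uses plain Python lists in place of image tensors. The function names, the weights `w_l1`, `w_perc`, and `w_style`, and the use of precomputed feature maps are assumptions made for illustration; this summary does not specify the feature extractor or the weighting. Applying the validity mask (described in the Figure 5 caption) only to the pixel term is likewise an assumption.

```python
def masked_l1(pred, target, mask):
    """Mean absolute error over the pixels where mask is non-zero."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


def feature_l1(feat_pred, feat_target):
    """Perceptual-style term: mean absolute difference between feature maps.

    Each argument is a list of C flattened feature channels (hypothetical
    outputs of some fixed feature extractor).
    """
    diffs = [abs(a - b)
             for fp, ft in zip(feat_pred, feat_target)
             for a, b in zip(fp, ft)]
    return sum(diffs) / len(diffs)


def gram(features):
    """C x C Gram matrix of flattened feature channels, normalized by length."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_l1(feat_pred, feat_target):
    """Style term: mean absolute difference between the two Gram matrices."""
    g_p, g_t = gram(feat_pred), gram(feat_target)
    c = len(g_p)
    return sum(abs(g_p[i][j] - g_t[i][j])
               for i in range(c) for j in range(c)) / (c * c)


def reshading_loss(pred, target, mask, feat_pred, feat_target,
                   w_l1=1.0, w_perc=1.0, w_style=1.0):
    """Weighted sum of the three terms; the weights are placeholders."""
    return (w_l1 * masked_l1(pred, target, mask)
            + w_perc * feature_l1(feat_pred, feat_target)
            + w_style * style_l1(feat_pred, feat_target))
```

In a real training setup these terms would be computed on tensors with an automatic-differentiation framework; the list-based version only shows the arithmetic.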

Abstract

In recent years, novel view synthesis from a single image has seen significant progress thanks to the rapid advancements in 3D scene representation and image inpainting techniques. While current approaches are able to synthesize geometrically consistent novel views, they often do not handle view-dependent effects properly. Specifically, the highlights in their synthesized images usually appear to be glued to the surfaces, making the novel views unrealistic. To address this major problem, we make a key observation that the process of synthesizing novel views requires changing the shading of the pixels based on the novel camera, and moving them to appropriate locations. Therefore, we propose to split the view synthesis process into two independent tasks of pixel reshading and relocation. During the reshading process, we take the single image as the input and adjust its shading based on the novel camera. This reshaded image is then used as the input to an existing view synthesis method to relocate the pixels and produce the final novel view image. We propose to use a neural network to perform reshading and generate a large set of synthetic input-reshaded pairs to train our network. We demonstrate that our approach produces plausible novel view images with realistic moving highlights on a variety of real-world scenes.
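The two-stage split described in the abstract can be sketched as a simple composition of interchangeable components. All names here (`synthesize_novel_view`, `estimate_depth`, `reshade`, `relocate`) are hypothetical placeholders, not the authors' API; the toy stand-ins below only make the control flow runnable and do not model real shading or warping.

```python
def synthesize_novel_view(image, novel_camera, estimate_depth, reshade, relocate):
    """Reshade in the input camera frame first, then relocate the pixels.

    estimate_depth, reshade, and relocate are caller-supplied callables, so
    any existing single-image view synthesis method can fill the last stage.
    """
    depth = estimate_depth(image)                   # depth for the input view
    reshaded = reshade(image, depth, novel_camera)  # same pixel layout, new shading
    return relocate(reshaded, novel_camera)         # move pixels to the novel view


# Toy stand-ins on a 1D "image" (illustrative only).
def toy_depth(image):
    return [1.0] * len(image)


def toy_reshade(image, depth, camera_shift):
    # Changes pixel values but never pixel positions.
    return [value + camera_shift * d for value, d in zip(image, depth)]


def toy_relocate(image, camera_shift):
    # Cyclically shifts pixels to the right; values are untouched.
    return image[-camera_shift:] + image[:-camera_shift]


novel_view = synthesize_novel_view([0.0, 1.0, 2.0], 1,
                                   toy_depth, toy_reshade, toy_relocate)
```

Keeping the relocation stage behind a plain callable mirrors the modularity claimed in the text: the reshading step does not need to know which relocation method follows it.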

Paper Structure (14 sections, 5 equations, 13 figures, 3 tables)

Figures (13)

  • Figure 1: We compare our results against 3D Moments by Wang et al. [wang2022_3dmoments]. 3D Moments reconstructs the novel image by moving the input pixels according to their depth values. As such, the highlights are treated as textures and appear to be glued to the wooden table. Our approach, however, is able to properly move the highlights over the table. The red crosses mark the same location on the table. Note that the cross is inside the highlight in the input and in the 3D Moments result, but outside the highlight in our result.
  • Figure 2: We visualize the image formation process for the input (${\bf c}$) and novel (${\bf c}^\prime$) cameras. A surface point ${\bf x}$ appears at two different locations (${\bf p}_{{\bf x}}$ and ${\bf p}^\prime_{{\bf x}}$) in the input and novel images. Moreover, the shading of point ${\bf x}$ in the two images is determined by $L_o({\bf x}, {\bf \omega}_o^{{\bf x}\rightarrow {\bf c}})$ and $L_o({\bf x}, {\bf \omega}_o^{{\bf x}\rightarrow {\bf c}^\prime})$, respectively, and is therefore different. Note that the incoming radiance $L_i$, the surface normal (and consequently $\theta_i$), and the BRDF (shown with a curly black line) are the same for both the input and novel view images.
  • Figure 3: We show an input and a novel view image. The same point on the table appears at different locations and with different shadings in the input and novel images. Therefore, the view synthesis process can be divided into two tasks of pixel reshading and relocation.
  • Figure 4: We visualize our modification to the path tracer to render the reshaded images. We trace a primary ray to find the first intersection from the input camera. We then find the ray from the novel camera to this point (novel primary ray). This ray is then used for shading computation at the intersection point and generation of the secondary ray.
  • Figure 5: For each training example in our dataset, we store the input and ground truth reshaded images, as well as the depth and validity mask. The red arrows point to the highlights in the input image that are moved in the reshaded image. Note that the objects in the reshaded image are at the same locations as in the input image, since reshading happens in the input camera frame. Small areas in the reshaded image (indicated by the green arrow) contain incorrect shading. We mask these out using the validity mask in our training loss.
  • ...and 8 more figures
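
The idea in the Figure 2 and Figure 4 captions, shading a point found by the input camera with the outgoing direction toward the novel camera, can be illustrated with a minimal sketch. The Blinn-Phong specular lobe, the single directional light, the `shininess` value, and all function names are assumptions chosen for brevity; they stand in for the full path tracer and BRDF, which this summary does not detail.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def direction(src, dst):
    """Unit vector pointing from src toward dst."""
    return normalize(tuple(d - s for s, d in zip(src, dst)))


def specular(normal, light_dir, view_dir, shininess=50.0):
    """Blinn-Phong specular term (a stand-in for the real BRDF)."""
    half = normalize(tuple(l + v for l, v in zip(light_dir, view_dir)))
    n_dot_h = max(sum(n * h for n, h in zip(normal, half)), 0.0)
    return n_dot_h ** shininess


def shade(x, normal, light_dir, shading_camera):
    """Shade surface point x as seen from shading_camera.

    The pixel that stores the result is chosen by the input camera's primary
    ray; only the outgoing direction used for shading depends on this camera.
    """
    return specular(normal, light_dir, direction(x, shading_camera))


x = (0.0, 0.0, 0.0)                      # hit point of the input primary ray
normal = (0.0, 0.0, 1.0)
light_dir = normalize((1.0, 0.0, 1.0))   # direction toward the light
input_camera = (-1.0, 0.0, 1.0)          # sits in the mirror direction of the light
novel_camera = (1.0, 0.0, 1.0)

input_value = shade(x, normal, light_dir, input_camera)     # inside the highlight
reshaded_value = shade(x, normal, light_dir, novel_camera)  # highlight has moved away
```

Both values are written to the same pixel, the one where the input camera sees `x`, which is why objects stay in place in the reshaded image while highlights shift.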