Neural Super-Resolution for Real-time Rendering with Radiance Demodulation

Jia Li, Ziling Chen, Xiaolong Wu, Lu Wang, Beibei Wang, Lei Zhang

TL;DR

This work tackles real-time super-resolution for rendering by decoupling radiance into a smooth lighting component and a high-frequency material component through radiance demodulation, enabling SR to focus on the lighting part while remodulating with HR material data. A lightweight, occlusion-aware warping strategy with a motion mask and gated convolution reduces ghosting in dynamic scenes, and a frame-recurrent ConvLSTM-based network with a temporal loss improves temporal stability. The approach achieves high-quality $4\times4$ SR with real-time performance, outperforming state-of-the-art VSR and RRSR baselines in perceptual metrics and temporal consistency, and demonstrates good generalization across scenes. The method promises practical impact for interactive applications such as games and VR by delivering texture-rich, temporally stable SR at real-time speeds.

Abstract

Rendering high-resolution images is time-consuming in applications such as video games and virtual reality, so super-resolution techniques have become increasingly popular for real-time rendering. However, it is challenging to preserve sharp texture details, maintain temporal stability, and avoid ghosting artifacts in real-time super-resolution rendering. To address these issues, we introduce radiance demodulation, which separates the rendered image (radiance) into a lighting component and a material component. This exploits two facts: the lighting component is smoother than the rendered image, and the high-resolution material component with detailed textures can be easily obtained. We perform super-resolution on the lighting component only and re-modulate it with the high-resolution material component to obtain the final super-resolution image with more texture details. We propose a reliable warping module that explicitly marks occluded regions to avoid ghosting artifacts. To further enhance temporal stability, we design a frame-recurrent neural network and a temporal loss to aggregate the previous and current frames, which better captures the spatial-temporal consistency among reconstructed frames. As a result, our method produces temporally stable results with high-quality details in real-time rendering, even in the challenging $4\times4$ super-resolution scenario.
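The demodulate, super-resolve, re-modulate pipeline described above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: demodulation is modeled as a per-pixel division of radiance by a scalar material term, `EPS` is a hypothetical guard against division by zero, and `upsample_nearest` is only a stand-in for the learned super-resolution network. All function names are illustrative.

```python
# Minimal sketch of radiance demodulation on grayscale images stored as
# nested lists. Assumptions (not from the paper): radiance = lighting * material
# per pixel, EPS is a hypothetical stabilizer, and nearest-neighbor upsampling
# replaces the neural SR network.

EPS = 1e-4  # hypothetical guard against division by zero for dark materials


def demodulate(radiance, material, eps=EPS):
    """Divide radiance by the material term to get the smooth lighting."""
    return [[r / (m + eps) for r, m in zip(r_row, m_row)]
            for r_row, m_row in zip(radiance, material)]


def remodulate(lighting, material, eps=EPS):
    """Multiply lighting by the material term to get radiance back."""
    return [[l * (m + eps) for l, m in zip(l_row, m_row)]
            for l_row, m_row in zip(lighting, material)]


def upsample_nearest(image, factor):
    """Stand-in for the SR network: repeat each pixel factor x factor times."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


FACTOR = 4  # 4x4 super-resolution, as in the abstract

# High-resolution material: a checkerboard texture (high-frequency detail).
material_hr = [[0.8 if (x + y) % 2 == 0 else 0.2 for x in range(FACTOR)]
               for y in range(FACTOR)]
# Low-resolution material: the average of the 4x4 block above.
material_lr = [[0.5]]
# Low-resolution radiance under constant lighting of 0.5.
radiance_lr = [[0.5 * 0.5]]

# Super-resolve the lighting only, then re-modulate with the HR material.
lighting_lr = demodulate(radiance_lr, material_lr)
lighting_sr = upsample_nearest(lighting_lr, FACTOR)
radiance_sr = remodulate(lighting_sr, material_hr)

# Baseline for comparison: upsample the radiance directly.
radiance_naive = upsample_nearest(radiance_lr, FACTOR)
```

In this toy setting `radiance_sr` reproduces the checkerboard because the texture is supplied by the high-resolution material, whereas `radiance_naive` is uniformly flat: direct upsampling cannot recover detail that was never present in the low-resolution radiance.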

Paper Structure (19 sections, 5 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Quality comparison between our method and the state-of-the-art methods NSRR, BasicVSR++, TTVSR, and RVRT. The upsampling ratio is set to $4\times4$. By radiance demodulation, we perform super-resolution on the smooth lighting component only, allowing our method to preserve richer scene details after re-modulation with the high-resolution material component.
  • Figure 2: Our network includes four modules. Radiance demodulation: together with the material component, the LR rendered image (radiance) is demodulated to a lighting component for spatial feature extraction. Reliable warping: the warped lighting components of two previous frames and motion masks are fed into a gated convolution for temporal feature extraction. Frame-recurrent reconstruction: features from the previously reconstructed SR lighting component and other features are fed into a ConvLSTM followed by a U-shaped module to reconstruct the SR lighting component, which is later re-modulated with the HR material component to obtain the SR image.
  • Figure 3: An example of the motion mask generation. Figures (a)-(b) represent the radiance of frame $i-1$ and frame $i$, respectively. The green arrows show the character's moving direction. Figure (c) is the warped result of frame $i-1$ using the traditional MV, where the occluded region has been warped incorrectly (red arrow). Figures (d)-(e) show the traditional MV and dual MV respectively. Figure (f) shows our motion mask, where the previously occluded region is marked.
  • Figure 4: Comparison among our method, FRVSR, TecoGAN, NSRR, BasicVSR++, TTVSR, and RVRT.
  • Figure 5: Comparison among our method, DLSS 2.0, and FSR 2.0 on the Bistro scene. The target resolution is set to $1920\times1080$ and the SR factor is set to $2\times2$.
  • ...and 3 more figures