SelfDRSC++: Self-Supervised Learning for Dual Reversed Rolling Shutter Correction

Wei Shang, Dongwei Ren, Wanying Zhang, Qilong Wang, Pengfei Zhu, Wangmeng Zuo

TL;DR

SelfDRSC++ corrects rolling shutter (RS) distortion in dynamic scenes without ground-truth high-framerate global shutter (GS) videos by leveraging dual reversed RS imagery and a self-supervised cycle-consistency framework. It introduces a lightweight dual reversed RS correction network $\mathcal{F}$ paired with a video frame interpolation (VFI) based RS reconstruction module $\mathcal{W}$ that uses RS distortion time maps to re-synthesize the dual RS inputs from the predicted GS frames, enabling one-stage training and supervision at arbitrary intermediate scan times. The approach yields high-framerate GS video with finer textures and better temporal consistency, outperforming several supervised and unsupervised baselines on synthetic RS-GOPRO data and showing strong qualitative results on real-world dual RS imagery. In practice, this reduces reliance on GS ground-truth data, improves RS correction in real-world scenarios, and delivers perceptually superior high-framerate GS content.

Abstract

Modern consumer cameras commonly employ the rolling shutter (RS) imaging mechanism, in which images are captured by scanning scenes row by row, resulting in RS distortion for dynamic scenes. To correct RS distortion, existing methods rely on fully supervised learning, which requires high-framerate global shutter (GS) images as ground truth for supervision. In this paper, we propose an enhanced Self-supervised learning framework for Dual reversed RS distortion Correction (SelfDRSC++). First, we introduce a lightweight DRSC network that incorporates a bidirectional correlation matching block to refine the joint optimization of optical flows and corrected RS features, thereby improving correction performance while reducing network parameters. Then, to train the DRSC network effectively, we propose a self-supervised learning strategy that enforces cycle consistency between the input and reconstructed dual reversed RS images. The RS reconstruction in SelfDRSC++ can be formulated as a specialized instance of video frame interpolation, where each row of a reconstructed RS image is interpolated from predicted GS images using RS distortion time maps. By achieving superior performance while simplifying the training process, SelfDRSC++ makes one-stage self-supervised training feasible. In addition to the start and end RS scanning times, SelfDRSC++ allows supervision of GS images at arbitrary intermediate scanning times, enabling the learned DRSC network to generate high-framerate GS videos. The code and trained models are available at https://github.com/shangwei5/SelfDRSC_plusplus.
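
The row-wise reconstruction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' module: the learned VFI network is replaced here by plain linear blending between the two GS frames that bracket each row's scan time, scan times are assumed to be normalized to [0, 1], and the names `time_maps` and `reconstruct_rs` are hypothetical.

```python
import numpy as np

def time_maps(height):
    """Per-row normalized scan times in [0, 1] for both scan directions."""
    rows = np.arange(height, dtype=np.float64)
    t2b = rows / (height - 1)   # top-to-bottom: top row is scanned first
    b2t = 1.0 - t2b             # bottom-to-top: bottom row is scanned first
    return t2b, b2t

def reconstruct_rs(gs_frames, gs_times, row_times):
    """Rebuild an RS image row by row from GS frames (linear-blend stand-in).

    gs_frames: (K, H, W) GS images, gs_times: (K,) increasing timestamps,
    row_times: (H,) scan time of each output row.
    """
    gs_frames = np.asarray(gs_frames, dtype=np.float64)
    gs_times = np.asarray(gs_times, dtype=np.float64)
    # Index of the GS frame at or just before each row's scan time.
    k = np.searchsorted(gs_times, row_times, side="right") - 1
    k = np.clip(k, 0, len(gs_times) - 2)
    # Blend weight of the later frame for each row.
    w = (row_times - gs_times[k]) / (gs_times[k + 1] - gs_times[k])
    rows = np.arange(gs_frames.shape[1])
    early = gs_frames[k, rows]        # (H, W): row r taken from frame k[r]
    late = gs_frames[k + 1, rows]     # (H, W): row r taken from frame k[r] + 1
    return (1.0 - w)[:, None] * early + w[:, None] * late

# Toy usage: three constant GS frames at the first-row, middle, and last-row times.
H, W = 5, 4
gs = np.stack([np.full((H, W), v) for v in (0.0, 1.0, 2.0)])
t2b, b2t = time_maps(H)
rs_t2b = reconstruct_rs(gs, [0.0, 0.5, 1.0], t2b)
rs_b2t = reconstruct_rs(gs, [0.0, 0.5, 1.0], b2t)
```

In this toy case the two reconstructions are vertical mirror images of each other, which reflects the reversed scan order. A learned VFI module would additionally compensate for motion between GS frames rather than blending intensities.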

Paper Structure (23 sections, 17 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Illustration of capturing dual RS images with reversed scanning directions, i.e., top-to-bottom ($\bm{I}_{t2b}$) and bottom-to-top ($\bm{I}_{b2t}$). In this work, we propose SelfDRSC++, a more effective self-supervised learning method for correcting RS distortion. Compared with the state-of-the-art supervised RS correction methods CVR [Fan_2022_CVPR] and IFED [zhong2022bringing], SelfDRSC++ generates high-framerate GS videos with finer textures and better temporal consistency.
  • Figure 2: Training framework of SelfDRSC++, which consists of two key modules: the DRSC network $\mathcal{F}$, which generates GS images $\{\hat{\bm{I}}_g^{(t_1)}, \hat{\bm{I}}_g^{(t_m)}, \hat{\bm{I}}_g^{(t_H)}\}$ from the input dual RS images $\bm{I}_{t2b}$ and $\bm{I}_{b2t}$, and the VFI-based RS reconstruction module $\mathcal{W}$, which reconstructs the dual reversed RS images. The self-supervised losses $\mathcal{L}_{se}$ and $\mathcal{L}_{sme}$ then enforce cycle consistency between the input and reconstructed RS images. Details of $\mathcal{F}$ and $\mathcal{W}$ are given in Figs. 3 and 4, respectively.
  • Figure 3: Architecture of the DRSC network $\mathcal{F}$. The network corrects dual RS images to GS images according to time displacements. It primarily consists of two encoders that extract correlation features $\{\bm{G}_{t2b}, \bm{G}_{b2t}\}$ and content features $\{\bm{X}_{t2b, d}, \bm{X}_{b2t, d}\}$ with $d \in \{4,3,2,1\}$, followed by a decoder in which each scale comprises a joint upsampling block ($\mathtt{UConv}$), a matching block, and an updating block. The output of the last decoder layer is integrated through multi-field refinement, Eq. (\ref{eq:drsc}), to form the final GS image.
  • Figure 4: VFI-based module $\mathcal{W}$ for reconstructing dual reversed RS images. Given GS inputs and the distorted time maps, $\mathcal{W}$ generates the corresponding RS image. $\mathcal{W}$ is kept frozen during training.
  • Figure 5: Visual comparison on RS-GOPRO. SelfDRSC++ yields less ghosting while preserving fine details. The evaluation metrics PSNR and SSIM tend to favor over-smoothed results, whereas LPIPS correlates more strongly with human visual perception.
  • ...and 4 more figures
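
The remark in the Figure 5 caption that PSNR tends to favor over-smoothed results can be checked with a small toy experiment. The sketch below is an illustration under stated assumptions, not the paper's evaluation protocol: the "image" is synthetic i.i.d. noise standing in for fine texture, and the two candidates are a sharp copy misaligned by one pixel and a 3-tap horizontal box blur. For i.i.d. pixels with variance $\sigma^2$, the shifted copy has expected MSE $2\sigma^2$, while the blur has expected MSE $\tfrac{2}{3}\sigma^2$, so the blurred image scores higher despite losing all fine detail. LPIPS is not computed here because it requires a pretrained network.

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
sharp = rng.random((128, 128))   # synthetic high-frequency texture in [0, 1)

# Candidate A: perfectly sharp, but misaligned by one pixel (circular shift).
shifted = np.roll(sharp, 1, axis=1)

# Candidate B: aligned, but over-smoothed by a 3-tap horizontal box blur.
blurred = (np.roll(sharp, -1, axis=1) + sharp + np.roll(sharp, 1, axis=1)) / 3.0

psnr_shifted = psnr(sharp, shifted)
psnr_blurred = psnr(sharp, blurred)
print(f"shifted: {psnr_shifted:.2f} dB, blurred: {psnr_blurred:.2f} dB")
```

Because pixel-wise metrics penalize small misalignments of sharp detail more than uniform smoothing, a perceptual metric is a useful complement when comparing RS correction results.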