NegVSR: Augmenting Negatives for Generalized Noise Modeling in Real-World Video Super-Resolution

Yexing Song, Meilin Wang, Zhijing Yang, Xiaoyu Xian, Yukai Shi

TL;DR

NegVSR addresses real-world video super-resolution under unknown degradation by modeling generalized noise with extracted real-world noise sequences and negative augmentation. It introduces NegMix, which mixes real-world noise sequences into LR frames and then applies a patch-based rotation. It also uses Augmented Positive and Augmented Negative guidance losses to enforce consistency and robustness to noise. The approach expands the degradation domain beyond classical kernels and outperforms prior methods on the VideoLQ and FLIR datasets, with ablation studies validating each component. The work highlights the importance of preserving the sequential structure of noise in VSR and proposes a practical framework for modeling out-of-distribution noise. It also notes that inference speed could be improved with lightweight architectures.
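The NegMix step summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function `negmix`, its parameters `noise_weight`, `rot_ratio` and `patch`, the additive mixing, and the use of 90-degree turns applied identically to every frame are all assumptions made for illustration.

```python
import numpy as np


def negmix(v_lr, n_sq, noise_weight=0.5, rot_ratio=0.5, patch=4, rng=None):
    """Hypothetical NegMix sketch (not the paper's implementation).

    v_lr: LR clip, shape (T, H, W, C), values in [0, 1].
    n_sq: zero-mean noise sequence with the same shape as v_lr.
    Returns the negative clip V_neg and the per-patch rotation codes ks
    (0 = untouched, 1-3 = number of counter-clockwise quarter turns).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w, _ = v_lr.shape
    assert n_sq.shape == v_lr.shape
    assert h % patch == 0 and w % patch == 0, "frame size must tile into patches"

    # Step 1 (assumed additive): inject the real-world noise sequence.
    v_noisy = np.clip(v_lr + noise_weight * n_sq, 0.0, 1.0)

    # Step 2: patch-based random central rotation. The same k is applied to
    # every frame so the noise keeps its sequential (temporal) structure.
    v_neg = v_noisy.copy()
    ks = np.zeros((h // patch, w // patch), dtype=int)
    for i in range(h // patch):
        for j in range(w // patch):
            if rng.random() < rot_ratio:
                k = int(rng.integers(1, 4))  # 90, 180 or 270 degrees
                ys = slice(i * patch, (i + 1) * patch)
                xs = slice(j * patch, (j + 1) * patch)
                # rot90 on a square tile turns it about the tile's own center.
                v_neg[:, ys, xs, :] = np.rot90(v_noisy[:, ys, xs, :], k, axes=(1, 2))
                ks[i, j] = k
    return v_neg, ks
```

Returning the rotation codes `ks` is a design choice of this sketch: it lets the rotation be undone later, for example when the output of the negative branch is compared with the positive one.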

Abstract

The capability of video super-resolution (VSR) to synthesize high-resolution (HR) video from ideal datasets has been demonstrated in many works. However, applying VSR models to real-world video with unknown and complex degradation remains challenging. First, the degradation models used in most VSR methods cannot effectively simulate real-world noise and blur. Instead, simple combinations of classical degradations are used for real-world noise modeling, which often leaves the VSR model vulnerable to out-of-distribution (OOD) noise. Second, many SR models focus on noise simulation and transfer; nevertheless, the sampled noise is monotonous and limited. To address these problems, we propose a negative augmentation strategy for generalized noise modeling in Video Super-Resolution (NegVSR). Specifically, we first propose sequential noise generation from real-world data to extract practical noise sequences. Then, the degradation domain is widely expanded through negative augmentation to build diverse yet challenging real-world noise sets. We further propose an augmented negative guidance loss to effectively learn robust features from the augmented negatives. Extensive experiments on real-world datasets (e.g., VideoLQ and FLIR) show that our method outperforms state-of-the-art methods by clear margins, especially in visual quality. The project page is available at: https://negvsr.github.io/.
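The sequential noise extraction step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `extract_noise_sequence`, the window size `win`, the `stride`, and scoring each window by its mean per-frame variance are all hypothetical. The sketch keeps one window location fixed across the clip so that the extracted noise retains its frame-to-frame structure, and it relies on the idea that flat, low-variance regions are dominated by noise rather than texture.

```python
import numpy as np


def extract_noise_sequence(v_od, win=8, stride=4):
    """Hypothetical sketch of sequential noise extraction.

    v_od: clip from an out-of-distribution noisy video, shape (T, H, W, C).
    Slides a win x win window over the frame grid, scores each location by
    the mean per-frame variance inside the window, and keeps the flattest
    (lowest-variance) location for all T frames.
    Returns a zero-mean noise sequence of shape (T, win, win, C) and the
    chosen top-left corner (y, x).
    """
    _, h, w, _ = v_od.shape
    best_score, best_yx = np.inf, (0, 0)
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            region = v_od[:, y:y + win, x:x + win, :]
            # Spatial variance per frame and channel, averaged over the sequence.
            score = region.var(axis=(1, 2)).mean()
            if score < best_score:
                best_score, best_yx = score, (y, x)
    y, x = best_yx
    region = v_od[:, y:y + win, x:x + win, :]
    # Remove each frame's (and channel's) mean so only the noise remains.
    n_sq = region - region.mean(axis=(1, 2), keepdims=True)
    return n_sq, best_yx
```

Because the same location is used in every frame, the returned array is a noise *sequence* rather than a set of independent noise patches.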

Paper Structure (11 sections, 16 equations, 5 figures, 4 tables, 1 algorithm)

Figures (5)

  • Figure 1: The overview of the proposed NegVSR. (a) Our approach first extracts a noise sequence $N_{sq}$ through a window sequence $C$ in an unsupervised manner; $C$ moves within the OOD video noise dataset $V_{od}$. It then mixes $N_{sq}$ with the LR video $V_{lr}$ to create a novel training input $V_{lr}^N$. (b) A patch-based random central rotation is applied to $V_{lr}^N$ to derive $V_{neg}$. (c) Both $V_{neg}$ and $V_{lr}$ are fed into the VSR model to generate $\widehat{Y}$ and $Y$, respectively. $\mathcal{L}_{Aug-P}$ enables the model to recover realistic pixels from $V_{lr}$, and $\mathcal{L}_{Aug-N}$ drives $Y$ to learn the robust features present in the negative output $\widehat{Y}$.
  • Figure 2: Two window sequences $C_1$ and $C_2$ originate from the same video of consecutive frames. Using this sliding-window sequence strategy, the sequential Noise-Prone Region (yellow box), which contains less texture and more noise, is selected by its low variance for noise augmentation.
  • Figure 3: A grid visualization of images mixed with the NegMix method while adjusting the noise weight (vertical axis) and rotation ratio (horizontal axis). We set $M$ to 0.5 and vary $P$ from 0 to 1 at intervals of 0.1 in our NegVSR setting. Zoom in for a better view.
  • Figure 4: The process of our Augmented Negative Guidance approach. We obtain the positive output $Y$ by passing $V_{hr}$ sequentially through the degradation model $D$ and the VSR model. We then inject the noise sequence $N_{sq}$ into the degraded video and apply negative augmentation to it. Finally, we encourage the model to learn robust features from the augmented noise and video through $\mathcal{L}_{Aug-N}$ and $\mathcal{L}_{Aug-P}$.
  • Figure 5: We conduct a visual comparison with recent state-of-the-art methods on real-world frames from the VideoLQ (rows 1 and 2) and FLIR (rows 3 and 4) test datasets, at an upsampling scale factor of 4.
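Figures 1 and 4 name the two guidance losses but do not give their exact form. One plausible reading is sketched below in NumPy, purely as an illustration: $\mathcal{L}_{Aug-P}$ as a mean absolute (L1) distance between $Y$ and $V_{hr}$, and $\mathcal{L}_{Aug-N}$ as an L1 distance between $Y$ and $\widehat{Y}$ after the patch rotation has been undone. The names `rotate_patches` and `guidance_losses`, the weight `lam`, and the assumption that a rotated LR patch of size `lr_patch` corresponds to a rotated HR patch of size `lr_patch * scale` are all hypothetical.

```python
import numpy as np


def rotate_patches(v, ks, patch, inverse=False):
    """Rotate each patch x patch tile of every frame by ks[i, j] quarter turns.

    v: clip of shape (T, H, W, C); ks: ints of shape (H // patch, W // patch).
    With inverse=True the rotations are undone.
    """
    out = v.copy()
    sign = -1 if inverse else 1
    for i in range(ks.shape[0]):
        for j in range(ks.shape[1]):
            ys = slice(i * patch, (i + 1) * patch)
            xs = slice(j * patch, (j + 1) * patch)
            out[:, ys, xs, :] = np.rot90(v[:, ys, xs, :], sign * int(ks[i, j]), axes=(1, 2))
    return out


def guidance_losses(y, y_neg, v_hr, ks, lr_patch, scale=4, lam=1.0):
    """Hypothetical L_Aug-P / L_Aug-N sketch using L1 (mean absolute) error.

    y:     VSR output for V_lr, shape (T, scale*H, scale*W, C).
    y_neg: VSR output for V_neg (the negative output), same shape.
    v_hr:  HR ground-truth clip, same shape.
    ks:    rotation codes used to build V_neg at LR patch size lr_patch.
    """
    # Positive term: recover realistic pixels from V_lr.
    l_aug_p = np.abs(y - v_hr).mean()
    # Negative term: align the negative output, then tie Y to it.
    y_neg_aligned = rotate_patches(y_neg, ks, lr_patch * scale, inverse=True)
    l_aug_n = np.abs(y - y_neg_aligned).mean()
    return l_aug_p + lam * l_aug_n, l_aug_p, l_aug_n
```

In a real training loop these terms would be computed on framework tensors so gradients can flow; how the original method treats gradients through each branch is not specified here.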