FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring

Geunhyuk Youk, Jihyong Oh, Munchurl Kim

TL;DR

FMA-Net++ tackles real-world joint video super-resolution and deblurring (VSRDB) under dynamically varying exposure by explicitly modeling motion-exposure coupling. It introduces a sequence-level backbone built from Hierarchical Refinement with Bidirectional Propagation (HRBP) blocks and an Exposure Time-aware Modulation (ETM) layer that conditions features on per-frame exposure, enabling an exposure-aware Flow-Guided Dynamic Filtering (FGDF) module to estimate degradation priors. The framework decouples degradation learning from restoration via two networks, Net^D and Net^R, guided by a pretrained Exposure Time-aware Feature Extractor (ETE). Two new benchmarks, REDS-ME and REDS-RE, assess performance under realistic exposure dynamics; on these, FMA-Net++ achieves state-of-the-art accuracy, temporal consistency, and efficiency, and it generalizes well to real-world videos. Ablation studies confirm the importance of hierarchical modeling, exposure-aware conditioning, and the degradation-prior-guided restoration pipeline for robust VSRDB under dynamic exposure.

Abstract

Real-world video restoration is plagued by complex degradations from motion coupled with dynamically varying exposure - a key challenge largely overlooked by prior works and a common artifact of auto-exposure or low-light capture. We present FMA-Net++, a framework for joint video super-resolution and deblurring that explicitly models this coupled effect of motion and dynamically varying exposure. FMA-Net++ adopts a sequence-level architecture built from Hierarchical Refinement with Bidirectional Propagation blocks, enabling parallel, long-range temporal modeling. Within each block, an Exposure Time-aware Modulation layer conditions features on per-frame exposure, which in turn drives an exposure-aware Flow-Guided Dynamic Filtering module to infer motion- and exposure-aware degradation kernels. FMA-Net++ decouples degradation learning from restoration: the former predicts exposure- and motion-aware priors to guide the latter, improving both accuracy and efficiency. To evaluate under realistic capture conditions, we introduce REDS-ME (multi-exposure) and REDS-RE (random-exposure) benchmarks. Trained solely on synthetic data, FMA-Net++ achieves state-of-the-art accuracy and temporal consistency on our new benchmarks and GoPro, outperforming recent methods in both restoration quality and inference speed, and generalizes well to challenging real-world videos.
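The coupling between motion and exposure described above can be illustrated with a toy simulation. The sketch below is not the paper's data pipeline or any part of FMA-Net++; it assumes a 1-D step edge moving at constant speed and approximates a captured frame by averaging sharp frames sampled while the shutter is open. All function names, the speed, and the exposure values are hypothetical and chosen only for illustration.

```python
def render_sharp(position, width=32):
    """Sharp 1-D 'frame': a step edge, 1.0 left of `position`, 0.0 from it on."""
    return [1.0 if x < position else 0.0 for x in range(width)]


def capture(start, speed, exposure, samples=64, width=32):
    """Average sharp frames sampled while the shutter is open.

    `speed` is in pixels per frame interval; `exposure` is the fraction of
    the frame interval during which the shutter is open (0 < exposure <= 1).
    """
    acc = [0.0] * width
    for k in range(samples):
        t = exposure * (k + 0.5) / samples  # time since the shutter opened
        frame = render_sharp(start + speed * t, width)
        acc = [a + f for a, f in zip(acc, frame)]
    return [a / samples for a in acc]


def blur_extent(frame, eps=1e-9):
    """Count pixels that are neither fully bright nor fully dark."""
    return sum(1 for v in frame if eps < v < 1.0 - eps)


speed = 8.0                    # hypothetical motion, pixels per frame interval
exposures = [0.25, 0.5, 1.0]   # hypothetical per-frame exposure fractions
extents = [blur_extent(capture(10.0, speed, e)) for e in exposures]
print(extents)
```

With the motion held fixed, the blurred region widens as the exposure fraction grows, so the same scene motion yields a different degradation in each frame when exposure varies over time. This is the kind of per-frame variation that an exposure-conditioned model has to account for.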

Paper Structure

This paper contains 36 sections, 17 equations, 15 figures, 9 tables.

Figures (15)

  • Figure 1: FMA-Net++ outperforms state-of-the-art methods in real-world qualitative results and quantitative benchmarks for VSRDB.
  • Figure 2: Conceptual illustration and overview of the FMA-Net++ framework.
  • Figure 3: Architecture of FMA-Net++ for joint video super-resolution and deblurring (VSRDB).
  • Figure 4: Details of an HRBP block. (a) Structure of the HRBP block at the (j+1)-th refinement step for the i-th frame. (b) Structure of Multi-Attention. FFN refers to the feed-forward network of the transformer (Vaswani et al., 2017; Dosovitskiy et al., 2020).
  • Figure 5: Qualitative comparisons of $\times 4$ VSRDB on REDS4-ME-$5\!:\!5$ and GoPro (Nah et al., 2017). Each scene contains severe motion blur with different characteristics. Best viewed zoomed in.
  • ...and 10 more figures