Super-Resolving Blurry Images with Events

Chi Zhang, Mingyuan Lin, Xiang Zhang, Chenxu Jiang, Lei Yu

TL;DR

This work tackles the problem of recovering sharp high-resolution (HR) images from motion-blurred low-resolution (LR) inputs by leveraging high-temporal-resolution event data. It introduces EBSR-Net, a one-stage architecture that fuses blurry frames and events via a Multi-scale Center-surround Event Representation (MCER), Symmetric Cross-Modal Attention (SCMA), and an Intermodal Residual Group (IRG) built from Residual Dense Swin Transformer blocks. The method reports substantial PSNR/SSIM improvements over state-of-the-art baselines on the GoPro and REDS datasets while keeping the model compact and the compute cost reasonable. The results indicate that cross-modal fusion with carefully designed event representations yields robust super-resolution of motion-blurred images under complex, non-uniform motion, with potential impact on vision tasks in dynamic environments.

Abstract

Super-resolution from motion-blurred images poses a significant challenge due to the combined effects of motion blur and low spatial resolution. To address this challenge, this paper introduces an Event-based Blurry Super Resolution Network (EBSR-Net), which leverages the high temporal resolution of events to mitigate motion blur and improve high-resolution image prediction. Specifically, we propose a multi-scale center-surround event representation to fully capture motion and texture information inherent in events. Additionally, we design a symmetric cross-modal attention module to fully exploit the complementarity between blurry images and events. Furthermore, we introduce an intermodal residual group composed of several residual dense Swin Transformer blocks, each incorporating multiple Swin Transformer layers and a residual connection, to extract global context and facilitate inter-block feature aggregation. Extensive experiments show that our method compares favorably against state-of-the-art approaches and achieves remarkable performance.
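
To make the idea of symmetric cross-modal attention more concrete, here is a minimal sketch of bidirectional dot-product cross-attention between two token sets. It is an illustration under stated assumptions, not the SCMA module from the paper: the function names (`cross_attend`, `symmetric_cross_attention`) are hypothetical, there are no learned query/key/value projections, and the residual addition is an assumed design choice.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], context: List[Vector]) -> List[Vector]:
    """Each query token attends over the tokens of the other modality.

    Assumption: keys and values are the raw context tokens (no learned
    projections), and the attended result is added back to the query.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Scaled dot-product similarity between this query and every context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Weighted sum of context tokens, one feature dimension at a time.
        attended = [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]
        # Residual connection (an assumed choice for this sketch).
        outputs.append([qi + ai for qi, ai in zip(q, attended)])
    return outputs


def symmetric_cross_attention(
    image_tokens: List[Vector], event_tokens: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Apply the same cross-attention in both directions."""
    image_out = cross_attend(image_tokens, event_tokens)  # image queries, event context
    event_out = cross_attend(event_tokens, image_tokens)  # event queries, image context
    return image_out, event_out
```

The point of the symmetry is that neither modality is treated as the sole guide: image features are refined by events, and event features are refined by the image, using the same operation in both directions.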

Paper Structure (10 sections, 11 equations, 4 figures, 2 tables)


Figures (4)

  • Figure 1: (a) illustrates the details of the proposed Multi-scale Center-surround Event Representation (MCER). Here, $\Delta t$ controls the exposure interval, determining the period utilized for quantizing the event representation.
  • Figure 2: (a) illustrates the overview of our proposed EBSR-Net. (b) and (c) provide details of the Symmetric Cross-Modal Attention (SCMA) module and the Residual Dense Swin Transformer Block (RDSTB), respectively. (d) presents the details of the Swin Transformer Layer (STL).
  • Figure 3: Qualitative comparisons of our EBSR-Net with the state-of-the-art methods on the GoPro and the REDS datasets. Six samples are arranged from left to right and top to bottom, with the first three samples from the GoPro dataset and the last three from the REDS dataset.
  • Figure 4: Qualitative ablations of each module of EBSR-Net.
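
The Figure 1 caption says that $\Delta t$ sets the period over which events are quantized. The sketch below shows one plausible reading of that idea: events are summed into signed count maps over nested time windows centered on a reference time, with the widest window spanning $\Delta t$. This is an assumption for illustration only, not the paper's MCER definition; the event tuple layout, the function names, and the linear spacing of window widths are all hypothetical.

```python
from typing import List, Tuple

# Assumed event layout: (timestamp, x, y, polarity), with polarity in {+1, -1}.
Event = Tuple[float, int, int, int]
Frame = List[List[int]]


def accumulate(events: List[Event], t_start: float, t_end: float,
               height: int, width: int) -> Frame:
    """Sum event polarities per pixel over the half-open window [t_start, t_end)."""
    frame = [[0] * width for _ in range(height)]
    for t, x, y, polarity in events:
        if t_start <= t < t_end:
            frame[y][x] += polarity
    return frame


def multiscale_center_windows(events: List[Event], t_center: float, delta_t: float,
                              num_scales: int, height: int, width: int) -> List[Frame]:
    """Build one count map per scale from nested windows around t_center.

    Scale s (1-based) uses a window of total width delta_t * s / num_scales,
    so the last scale covers the full interval of length delta_t.
    """
    frames = []
    for scale in range(1, num_scales + 1):
        half_width = delta_t * scale / (2 * num_scales)
        frames.append(
            accumulate(events, t_center - half_width, t_center + half_width,
                       height, width)
        )
    return frames


events = [(0.5, 0, 0, 1), (0.1, 1, 0, -1), (0.9, 1, 1, 1)]
frames = multiscale_center_windows(events, t_center=0.5, delta_t=1.0,
                                   num_scales=2, height=2, width=2)
# frames[0] covers [0.25, 0.75) and sees only the event at t = 0.5.
# frames[1] covers [0.0, 1.0) and sees all three events.
```

Narrow windows keep only events close to the reference time, while wide windows integrate motion over the whole interval, which is the general motivation for stacking several temporal scales.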