How Powerful Potential of Attention on Image Restoration?

Cong Wang, Jinshan Pan, Yeying Jin, Liyan Wang, Wei Wang, Gang Fu, Wenqi Ren, Xiaochun Cao

TL;DR

This paper empirically studies how far attention alone can go in image restoration when the feed-forward network (FFN) is removed from the Transformer architecture. It proposes Continuous Scaling Attention (CSAttn), which computes attention continuously in three stages without an FFN, together with a series of key components inside the attention. The authors report that some simple operations significantly affect model performance, and that CSAttn can outperform CNN-based and Transformer-based approaches on several image restoration tasks.

Abstract

Transformers have demonstrated their effectiveness in image restoration tasks. Existing Transformer architectures typically comprise two essential components: multi-head self-attention and a feed-forward network (FFN). The former captures long-range pixel dependencies, while the latter enables the model to learn complex patterns and relationships in the data. Previous studies have shown that FFNs act as key-value memories \cite{geva2020transformer}, which are vital in modern Transformer architectures. In this paper, we conduct an empirical study to explore the potential of attention mechanisms without an FFN, and we provide novel structures to demonstrate that removing the FFN is a viable option for image restoration. Specifically, we propose Continuous Scaling Attention (CSAttn), a method that computes attention continuously in three stages without using an FFN. To achieve competitive performance, we propose a series of key components within the attention. Our designs provide a closer look at the attention mechanism and reveal that some simple operations can significantly affect model performance. We apply CSAttn to several image restoration tasks and show that our model can outperform CNN-based and Transformer-based image restoration approaches.
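
To make the idea of an attention-only block concrete, the following is a minimal sketch in plain Python. It is not the authors' CSAttn design: the abstract does not specify the three stages or the key components, so the single-head attention, the residual connections, the random weights, and the names `self_attention` and `attention_only_block` are all assumptions chosen for illustration. It only shows that several attention stages can be chained without any FFN between them.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over tokens x (n x d)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose keys to d x n
    scores = matmul(q, k_t)                       # n x n similarity scores
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v), weights

def attention_only_block(x, stage_weights):
    """Hypothetical FFN-free block: attention stages chained with residual adds.

    Assumption: each stage is plain self-attention plus a skip connection.
    The paper's actual stages are not described in the abstract.
    """
    for wq, wk, wv in stage_weights:
        out, _ = self_attention(x, wq, wk, wv)
        x = [[a + b for a, b in zip(row_x, row_o)] for row_x, row_o in zip(x, out)]
    return x

def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned projection weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_tokens, dim = 4, 8                              # e.g. 4 pixels with 8 channels each
x = random_matrix(n_tokens, dim, rng)
# Three stages, each with its own (Wq, Wk, Wv); no FFN anywhere in the block.
stages = [tuple(random_matrix(dim, dim, rng) for _ in range(3)) for _ in range(3)]
y = attention_only_block(x, stages)
```

Here `y` has the same shape as `x` (4 tokens by 8 channels), so such a block could in principle be stacked like a standard Transformer block. Whether this simple chaining is competitive is exactly the kind of question the paper's empirical study addresses.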

Paper Structure

This paper contains 22 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: One kernel at $x_s$ (dotted kernel) or two kernels at $x_i$ and $x_j$ (left and right) lead to the same summed estimate at $x_s$. This shows a figure consisting of different types of lines. Elements of the figure described in the caption should be set in italics, in parentheses, as shown in this sample caption. The last sentence of a figure caption should generally end with a full stop, except when the caption is not a full sentence.
  • Figure 2: Centered, short example caption