DACESR: Degradation-Aware Conditional Embedding for Real-World Image Super-Resolution

Xiaoyan Lei; Wenlong Zhang; Biao Luo; Hui Liang; Weifeng Cao; Qiuting Lin

TL;DR

This paper revisits how well the Recognize Anything Model (RAM) handles degraded images, measured by text similarity, and proposes a Real Embedding Extractor (REE) that, trained with contrastive learning, achieves a significant recognition performance gain on degraded image content.

Abstract

Multimodal large models have shown excellent ability in addressing real-world image super-resolution by using language classes as conditioning information, yet their ability on degraded images remains limited. In this paper, we first revisit the capabilities of the Recognize Anything Model (RAM) on degraded images by calculating text similarity. We find that directly fine-tuning RAM with contrastive learning in the degraded space struggles to achieve acceptable results. To address this issue, we employ a degradation selection strategy and propose a Real Embedding Extractor (REE), which achieves a significant recognition performance gain on degraded image content through contrastive learning. Furthermore, we use a Conditional Feature Modulator (CFM) to inject the high-level information from REE into a powerful Mamba-based network, which can leverage effective pixel information to restore image textures and produce visually pleasing results. Extensive experiments demonstrate that REE can effectively help image super-resolution networks balance fidelity and perceptual quality, highlighting the great potential of Mamba in real-world applications. The source code of this work will be made publicly available at: https://github.com/nathan66666/DACESR.git

Paper Structure (28 sections, 8 equations, 8 figures, 6 tables)

Introduction
Related Work
Non-blind Image Super-Resolution
Real-World Image Super-Resolution
Conditional Network
Motivation
Re-exploring the Capabilities of Recognize Anything Model on Degraded Images
Enhancing the Representation Capability of Recognize Anything Model for Degraded Images
Methodology
Preliminaries
Overall of DACESR
Real Embedding Extractor
Conditional Feature Modulator
Mamba Network
Loss Function
...and 13 more sections

Figures (8)

Figure 1: The tag representations of RAM on clean images and images with varying levels of degradation. "Similarity" refers to the Jaccard similarity (Eq. (\ref{eq:sim})).
Figure 2: Comparison of RAM's text output accuracy under different types and intensities of degradation. (a) Blur: The x-axis represents isotropic Gaussian blur, where larger values indicate stronger blurring. (b) JPEG: The x-axis denotes JPEG compression levels, with lower values indicating higher compression. (c) Noise: The x-axis represents the intensity of additive Gaussian noise, where higher values correspond to increased noise levels. In (d), the classification is based on the text similarity values of RAM for different degraded outputs, which are evenly divided into four categories in descending order. Each category contains multiple types/levels of degradation.
Figure 3: The overview of DACESR.
Figure 4: The training pipeline of the Real Embedding Extractor (REE).
Figure 5: The LAM results of different model architectures across various types of degradation. LAM attribution indicates the significance of each pixel in the input LR image during the reconstruction process of the patch highlighted by a box. The Diffusion Index (DI) denotes the extent of pixel involvement. A higher DI indicates a broader range of utilized pixels.
...and 3 more figures
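
The caption of Figure 1 names Jaccard similarity as the measure used to compare RAM's tags on clean and degraded images. As a minimal sketch of that measure, the snippet below uses the standard set definition |A ∩ B| / |A ∪ B|; the paper's own equation is not reproduced in this listing and may differ in detail. The function name, the lowercase/strip normalization, the empty-set convention, and the example tags are all illustrative assumptions, not taken from the paper.

```python
def jaccard_similarity(tags_a, tags_b):
    """Standard Jaccard similarity |A ∩ B| / |A ∪ B| between two tag collections."""
    # Assumption: tags are compared case-insensitively with whitespace trimmed.
    set_a = {t.strip().lower() for t in tags_a}
    set_b = {t.strip().lower() for t in tags_b}
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty tag sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical tag lists standing in for a tagger's output on a clean image
# and on a degraded copy of the same image.
clean_tags = ["dog", "grass", "ball", "park"]
degraded_tags = ["dog", "grass", "field"]

score = jaccard_similarity(clean_tags, degraded_tags)
print(score)  # 2 shared tags out of 5 distinct tags
```

A score near 1 would mean the tags predicted for the degraded image largely match those for the clean image, while a score near 0 would mean the degradation changed most of the recognized content.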
