Restoration Adaptation for Semantic Segmentation on Low Quality Images

Kai Guan, Rongyuan Wu, Shuai Li, Wentao Zhu, Wenjun Zeng, Lei Zhang

TL;DR

We propose Restoration Adaptation for Semantic Segmentation (RASS), which integrates semantic image restoration into the segmentation process and enables high-quality semantic segmentation directly on low-quality (LQ) images.

Abstract

In real-world scenarios, the performance of semantic segmentation often deteriorates on low-quality (LQ) images, which may lack clear semantic structures and high-frequency details. Although image restoration offers a promising way to enhance degraded visual content, conventional real-world image restoration (Real-IR) models focus primarily on pixel-level fidelity and often fail to recover task-relevant semantic cues, which limits their effectiveness when applied directly to downstream vision tasks. Conversely, existing segmentation models trained on high-quality data lack robustness to real-world degradations. In this paper, we propose Restoration Adaptation for Semantic Segmentation (RASS), which integrates semantic image restoration into the segmentation process and enables high-quality semantic segmentation directly on LQ images. Specifically, we first propose a Semantic-Constrained Restoration (SCR) model, which injects segmentation priors into the restoration model by aligning its cross-attention maps with segmentation masks, encouraging semantically faithful image reconstruction. RASS then transfers this semantic restoration knowledge to segmentation through LoRA-based module merging and task-specific fine-tuning, improving the model's robustness to LQ images. To validate the framework, we construct a real-world LQ image segmentation dataset with high-quality annotations and conduct extensive experiments on both synthetic and real-world LQ benchmarks. The results show that SCR and RASS significantly outperform state-of-the-art methods in segmentation and restoration tasks. Code, models, and datasets will be available at https://github.com/Ka1Guan/RASS.git.
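The LoRA-based module merging step can be illustrated with a minimal plain-Python sketch. It assumes one reading of "merged and used to initialize": the learned SCR LoRA update is folded into the frozen base weight as W' = W + (alpha / rank) · up · down, and a fresh Seg LoRA with a zero-initialized up projection is attached on top, so stage-two fine-tuning starts exactly from the restoration-adapted weights. The helper names (`merge_lora`, `init_seg_lora`), the `alpha / rank` scaling, and the toy matrices are illustrative assumptions, not the authors' code.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w: Matrix, down: Matrix, up: Matrix, alpha: float, rank: int) -> Matrix:
    """Fold a LoRA update into a base weight: W' = W + (alpha / rank) * up @ down.

    Shapes: w is (out x in), down is (rank x in), up is (out x rank).
    """
    scale = alpha / rank
    delta = matmul(up, down)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def init_seg_lora(out_dim: int, in_dim: int, rank: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Hypothetical fresh LoRA pair for stage two (standard LoRA-style init).

    'down' is small random noise; 'up' is all zeros, so the adapter adds
    nothing until fine-tuning updates it.
    """
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
    up = [[0.0] * rank for _ in range(out_dim)]
    return down, up


# Stage 1 result (toy numbers): base weight plus a learned rank-1 "SCR" LoRA.
w_base = [[1.0, 0.0], [0.0, 1.0]]
scr_down, scr_up = [[1.0, 2.0]], [[1.0], [0.0]]
w_merged = merge_lora(w_base, scr_down, scr_up, alpha=2.0, rank=1)
print(w_merged)  # [[3.0, 4.0], [0.0, 1.0]]

# Stage 2: attach a zero-initialized "Seg" LoRA on top of the merged weight.
seg_down, seg_up = init_seg_lora(out_dim=2, in_dim=2, rank=1)
w_effective = merge_lora(w_merged, seg_down, seg_up, alpha=2.0, rank=1)
print(w_effective == w_merged)  # True: training starts from the restoration-adapted weights
```

Zero-initializing the up projection is a common LoRA convention; it guarantees the segmentation adapter does not perturb the merged restoration weights before the first gradient step.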

Paper Structure (19 sections, 6 equations, 8 figures, 10 tables)

Figures (8)

  • Figure 1: Our RASS framework integrates semantically guided restoration into the segmentation backbone for robust parsing of low-quality images. (a) Mask2Former [cheng2022masked], trained on high-quality data, fails to segment degraded regions (e.g., blurred objects such as the bag). (b) Fine-tuning on low-quality data recovers coarse structure but struggles with ambiguous targets due to inconsistent features. (c) Restoration as preprocessing can enhance image clarity, but generation without semantic guidance may lead to misclassification (e.g., the box). (d) RASS adaptively incorporates restoration into segmentation, better capturing degraded objects while avoiding the error propagation of disjoint pipelines.
  • Figure 2: Overview of the RASS training framework. A pre-trained SD model serves as the backbone. In the first stage, the SCR model is trained by injecting trainable SCR LoRA layers into the pre-trained diffusion network $\epsilon_\phi$. The LQ image is passed through the frozen VAE encoder $E_\phi$, the LoRA-finetuned diffusion network $\epsilon_\phi$, and the frozen VAE decoder $D_\phi$ to generate the restored HQ image $\hat{\bm{x}}_H$, while the text prompt $\mathcal{T}$ is processed by the frozen text encoder $T_\phi$ to obtain the text embedding $c_\mathcal{T}$. Restoration is supervised by a loss $\mathcal{L}_{\mathrm{res}}$ between $\hat{\bm{x}}_H$ and the ground truth $\bm{x}_H$, together with a Semantic-Constrained Loss $\mathcal{L}_{\mathrm{SCL}}$ computed from the cross-attention maps $\mathcal{A}$ in the SCR LoRA layers and the semantic masks $\mathcal{M}$. In the second stage, the learned SCR LoRA weights are merged and used to initialize new trainable Segmentation (Seg) LoRA layers to train the RASS model. Internal features $\mathcal{F}$ from $E_\phi$ and $\epsilon_\phi$ are fed into a trainable segmentation head $S_\phi$, and a segmentation loss $\mathcal{L}_{\mathrm{seg}}$ guides training. RASS thus transfers restoration knowledge to the segmentation task through LoRA-based module merging and task-specific fine-tuning, achieving robust segmentation of LQ images.
  • Figure 3: Semantic-Constrained Loss (SCL) computation. Prompt terms that are semantically aligned with a segmentation class inherit that class's mask (e.g., "church" inherits the mask of "building"), while unmatched terms (e.g., "spire") are excluded from the SCL.
  • Figure 4: Comparison of segmentation results across methods. Samples come from ADE20K with simulated degradation (first row) and from the RealLQ dataset (second row). All compared models are fine-tuned on LQ images.
  • Figure 5: Comparison of restoration results from different Real-IR methods on samples from the RealSR (left) and DIV2K (right) datasets. Red boxes mark zoomed-in areas; yellow boxes show the ground truth. Please zoom in for details.
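The loss computation in Figure 3 can be sketched in plain Python under three stated assumptions: (i) a hypothetical lookup table `ALIGNMENT` stands in for however prompt terms are matched to mask classes, (ii) each cross-attention map is normalized by its maximum, and (iii) the per-token penalty is a mean squared error against the inherited binary mask, averaged over matched tokens. The actual distance function and matching rule behind $\mathcal{L}_{\mathrm{SCL}}$ are not specified in this summary, so treat this as an illustration of the inherit-or-exclude logic only.

```python
from typing import Dict, List, Optional

Grid = List[List[float]]

# Hypothetical alignment table from prompt words to segmentation classes.
# The paper's actual matching rule is not given here; a lookup stands in for it.
ALIGNMENT: Dict[str, str] = {"church": "building", "tree": "tree", "sky": "sky"}


def normalize(attn: Grid) -> Grid:
    """Scale an attention map to [0, 1] by its maximum (assumed normalization)."""
    peak = max(max(row) for row in attn)
    if peak <= 0:
        return [[0.0 for _ in row] for row in attn]
    return [[v / peak for v in row] for row in attn]


def semantic_constraint_loss(
    attn_maps: Dict[str, Grid],  # token -> cross-attention map (H x W)
    masks: Dict[str, Grid],      # class name -> binary mask (H x W)
) -> Optional[float]:
    """Mean squared error between each matched token's attention and its inherited mask.

    Tokens with no aligned class (e.g. "spire") are skipped. Returns None
    if no token matches, so the caller can drop the term.
    """
    per_token = []
    for token, attn in attn_maps.items():
        cls = ALIGNMENT.get(token)
        if cls is None or cls not in masks:
            continue  # unmatched term: excluded from the loss
        a, m = normalize(attn), masks[cls]
        diffs = [(av - mv) ** 2 for ar, mr in zip(a, m) for av, mv in zip(ar, mr)]
        per_token.append(sum(diffs) / len(diffs))
    return sum(per_token) / len(per_token) if per_token else None


masks = {"building": [[1.0, 1.0], [0.0, 0.0]]}
attn = {"church": [[0.8, 0.4], [0.0, 0.0]], "spire": [[1.0, 1.0], [1.0, 1.0]]}
print(semantic_constraint_loss(attn, masks))  # 0.0625
```

Here "church" inherits the "building" mask and contributes a penalty for its weak attention on the top-right pixel, while "spire" has no aligned class and is ignored, mirroring the behavior described in the caption.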