
Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Zhenliang Ni, Xinghao Chen, Yingjie Zhai, Yehui Tang, Yunhe Wang

TL;DR

The paper tackles semantic segmentation under computational constraints and introduces CGRSeg, a framework that combines pyramid context extraction with spatial feature reconstruction, both built on a Rectangular Self-Calibration Module (RCM). It adds a lightweight Dynamic Prototype Guided (DPG) head to embed explicit class information, enabling stronger foreground discrimination while preserving efficiency. The key innovations are the rectangular self-calibration attention (RCA) inside the RCM, which models axial context and calibrates the attention shape via large-kernel strip convolutions, and the DPG head, whose dynamic prototypes weight pixel features by class embedding. Empirically, CGRSeg achieves state-of-the-art results on ADE20K, COCO-Stuff, and Pascal Context with substantially lower FLOPs (e.g., $43.6\%$ mIoU on ADE20K at $4.0$ GFLOPs), demonstrating a practical path to high-quality, efficient semantic segmentation.

Abstract

Semantic segmentation is an important task for numerous applications but it is still quite challenging to achieve advanced performance with limited computational costs. In this paper, we present CGRSeg, an efficient yet competitive segmentation framework based on context-guided spatial feature reconstruction. A Rectangular Self-Calibration Module is carefully designed for spatial feature reconstruction and pyramid context extraction. It captures the axial global context in both horizontal and vertical directions to explicitly model rectangular key areas. A shape self-calibration function is designed to make the key areas closer to foreground objects. Besides, a lightweight Dynamic Prototype Guided head is proposed to improve the classification of foreground objects by explicit class embedding. Our CGRSeg is extensively evaluated on ADE20K, COCO-Stuff, and Pascal Context benchmarks, and achieves state-of-the-art semantic performance. Specifically, it achieves $43.6\%$ mIoU on ADE20K with only $4.0$ GFLOPs, which is $0.9\%$ and $2.5\%$ mIoU better than SeaFormer and SegNeXt but with about $38.0\%$ fewer GFLOPs. Code is available at https://github.com/nizhenliang/CGRSeg.
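
The headline comparison in the abstract can be unpacked with a little arithmetic. The sketch below uses only the numbers quoted above. It assumes that the "$0.9\%$" and "$2.5\%$" gains are absolute mIoU points and that "about $38.0\%$ fewer GFLOPs" is measured relative to the baseline cost. The baseline values it prints are implied by those figures, not quoted from the paper, and the variable names are illustrative.

```python
# Figures quoted in the abstract (ADE20K).
cgrseg_miou = 43.6          # percent mIoU
cgrseg_gflops = 4.0
gain_over_seaformer = 0.9   # assumed to be absolute mIoU points
gain_over_segnext = 2.5     # assumed to be absolute mIoU points
flops_reduction = 0.38      # "about 38.0% fewer GFLOPs"

# Implied baseline accuracies (derived from the quoted gains, not quoted themselves).
seaformer_miou = round(cgrseg_miou - gain_over_seaformer, 1)
segnext_miou = round(cgrseg_miou - gain_over_segnext, 1)

# Implied baseline cost: 4.0 GFLOPs is 38% below it, so divide by (1 - 0.38).
baseline_gflops = cgrseg_gflops / (1.0 - flops_reduction)

print(f"Implied SeaFormer mIoU: {seaformer_miou}")
print(f"Implied SegNeXt mIoU:   {segnext_miou}")
print(f"Implied baseline cost:  {baseline_gflops:.2f} GFLOPs")
```

Under these assumptions the compared models sit at roughly 6.5 GFLOPs, so the claim is higher accuracy at a little under two thirds of the compute.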

Paper Structure (12 sections, 7 equations, 5 figures, 10 tables)

This paper contains 12 sections, 7 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Performance vs. FLOPs and Throughput on ADE20K. Our model achieves a better trade-off between accuracy and computational cost than prior methods. Moreover, our model outperforms other models in throughput with higher accuracy.
  • Figure 2: The overall architecture of CGRSeg. The Rectangular Self-Calibration Module (RCM) is designed for spatial feature reconstruction and pyramid context extraction. The rectangular self-calibration attention (RCA) explicitly models the rectangular region and calibrates the attention shape. The Dynamic Prototype Guided (DPG) head improves the classification of the foreground objects via explicit class embedding.
  • Figure 3: The shape change of the highlighted region is caused by the rectangular self-calibration attention. By optimizing the weights of the two strip convolutions during training, the attention region is calibrated closer to the foreground object.
  • Figure 4: Dynamic Prototype Guided Head.
  • Figure 5: Qualitative Comparison of CGRSeg-T on the ADE20K dataset.
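
The captions for Figures 2 and 3 describe attention that "explicitly models the rectangular region". A minimal sketch of why axial context produces rectangular regions follows, in plain Python on a single-channel toy feature map. It assumes average pooling along each axis followed by broadcast addition; the text above only states that axial context is captured in both directions, so the pooling operator, the function name `axial_context`, and the toy input are illustrative assumptions. The learned strip convolutions that calibrate the shape are deliberately omitted.

```python
def axial_context(feature):
    """Combine per-row and per-column averages of a 2-D map (list of H rows of W floats)."""
    h = len(feature)
    w = len(feature[0])
    # Pool along each row: one context value per row (an H x 1 vector).
    row_ctx = [sum(row) / w for row in feature]
    # Pool along each column: one context value per column (a 1 x W vector).
    col_ctx = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    # Broadcast addition: each position receives its row context plus its column context.
    return [[row_ctx[i] + col_ctx[j] for j in range(w)] for i in range(h)]


# Toy map with a 2 x 3 block of foreground activations.
feature = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
attn = axial_context(feature)

# Positions with the highest response form an axis-aligned rectangle.
peak = max(max(row) for row in attn)
peak_positions = [(i, j) for i, row in enumerate(attn) for j, v in enumerate(row) if v == peak]
print(peak_positions)
```

Because every response is the sum of one row term and one column term, the strongest responses always lie on the intersection of the strongest rows and columns, which is a rectangle. In this toy case it coincides with the foreground block; for irregular objects it would not, which is the gap the shape self-calibration described in Figure 3 is meant to close.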