
A feature refinement module for light-weight semantic segmentation network

Zhiyan Wang, Xin Guo, Song Wang, Peixiao Zheng, Lin Qi

TL;DR

This paper addresses the accuracy-versus-cost dilemma of light-weight semantic segmentation by introducing a Feature Refinement Module (FRM) that aggregates multi-stage backbone features and uses a transformer-style disentangled non-local block to capture global context. The FRM is integrated into an encoder–decoder architecture with an FPN-based decoder, and the network is trained with a hybrid loss that combines cross-entropy with a contrastive term to learn discriminative embeddings. Experiments on Cityscapes and Bdd100K show that the FRM achieves competitive or superior mIoU at substantially lower computational cost (e.g., 80.4% mIoU at 214.82 GFLOPs on the Cityscapes test set), and ablations show gains over traditional context modules. The method offers a practical path toward accurate, real-time semantic segmentation on resource-constrained devices.
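The summary names the hybrid loss but not its exact form, so the following is only a minimal sketch of what a "cross-entropy plus contrastive term" objective can look like. It assumes a supervised, InfoNCE-style pixel contrastive term computed over a small set of sampled pixel embeddings; the weight `lam`, the temperature `tau`, and all function names are hypothetical and not taken from the paper.

```python
# Minimal sketch under stated assumptions; not the paper's loss implementation.
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for a single pixel, using the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def supervised_contrastive(embeddings, labels, tau=0.1):
    """Assumed pixel-wise supervised contrastive (InfoNCE-style) term.

    Each sampled pixel embedding is an anchor; same-class pixels are its
    positives, and every other sampled pixel enters the denominator.
    Anchors without a same-class partner are skipped.
    """
    z = [l2_normalize(e) for e in embeddings]
    total, n_anchors = 0.0, 0
    for a in range(len(z)):
        others = [k for k in range(len(z)) if k != a]
        positives = [k for k in others if labels[k] == labels[a]]
        if not positives:
            continue
        # Cosine similarity scaled by the (hypothetical) temperature tau.
        sims = {k: sum(p * q for p, q in zip(z[a], z[k])) / tau for k in others}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


def hybrid_loss(logits, targets, embeddings, labels, lam=0.1, tau=0.1):
    """Mean pixel cross-entropy plus lam * contrastive term (lam, tau are hypothetical)."""
    ce = sum(cross_entropy(row, t) for row, t in zip(logits, targets)) / len(targets)
    return ce + lam * supervised_contrastive(embeddings, labels, tau)


if __name__ == "__main__":
    logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]  # two pixels, three classes
    targets = [0, 1]
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # four sampled pixel embeddings
    print(hybrid_loss(logits, targets, emb, [0, 0, 1, 1]))
```

The contrastive term pulls same-class embeddings together and pushes other classes apart, which is one way to obtain the "discriminative embeddings" the summary mentions; setting `lam=0` recovers plain cross-entropy.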

Abstract

Low computational complexity and high segmentation accuracy are both essential for real-world semantic segmentation. However, to speed up inference, most existing approaches design light-weight networks with very few parameters, which considerably degrades accuracy because the networks' representational capacity is reduced. To address this problem, this paper proposes a novel semantic segmentation method that improves the ability of light-weight networks to capture semantic information. Specifically, a feature refinement module (FRM) is proposed that extracts semantics from the multi-stage feature maps produced by the backbone and captures non-local contextual information with a transformer block. Experiments on the Cityscapes and Bdd100K datasets show that the proposed method achieves a favorable trade-off between accuracy and computational cost; on the Cityscapes test set in particular, it reaches 80.4% mIoU while requiring only 214.82 GFLOPs.
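As a rough illustration of the two FRM ingredients named in the abstract, the plain-Python sketch below pools every backbone stage to the coarsest resolution, concatenates the channels, and applies a disentangled non-local attention step (a whitened pairwise term plus a query-independent unary term). The fusion by pooling and concatenation, the identity query/key/value projections, the residual connection, the `unary_w` vector, and all function names are assumptions made for brevity; the paper's actual module (its Figure 2) may differ.

```python
# Minimal sketch under stated assumptions; not the paper's FRM implementation.
import math
from typing import List

Vec = List[float]
FeatureMap = List[List[Vec]]  # indexed as [row][col] -> channel vector


def avg_pool(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Average-pool an H x W x C map to out_h x out_w (H, W assumed divisible)."""
    sh, sw = len(fmap) // out_h, len(fmap[0]) // out_w
    channels = len(fmap[0][0])
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = [0.0] * channels
            for di in range(sh):
                for dj in range(sw):
                    for c, x in enumerate(fmap[i * sh + di][j * sw + dj]):
                        acc[c] += x
            row.append([a / (sh * sw) for a in acc])
        out.append(row)
    return out


def aggregate_stages(stages: List[FeatureMap]) -> List[Vec]:
    """Assumed fusion: pool every stage to the coarsest resolution, concatenate channels.

    Returns N = H * W position vectors ("tokens") for the attention step.
    """
    out_h = min(len(s) for s in stages)
    out_w = min(len(s[0]) for s in stages)
    pooled = [avg_pool(s, out_h, out_w) for s in stages]
    return [[x for p in pooled for x in p[i][j]] for i in range(out_h) for j in range(out_w)]


def softmax(xs: Vec) -> Vec:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    total = sum(e)
    return [v / total for v in e]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def disentangled_nonlocal(tokens: List[Vec], unary_w: Vec) -> List[Vec]:
    """DNL-style attention with identity query/key/value projections (a simplification).

    pairwise_ij = softmax_j((q_i - mean_q) . (k_j - mean_k))   # whitened term
    unary_j     = softmax_j(unary_w . k_j)                     # query-independent term
    out_i       = x_i + sum_j (pairwise_ij + unary_j) * v_j    # residual connection
    """
    n, channels = len(tokens), len(tokens[0])
    mean = [sum(col) / n for col in zip(*tokens)]
    centered = [[x - m for x, m in zip(t, mean)] for t in tokens]
    unary = softmax([dot(unary_w, t) for t in tokens])
    out = []
    for i in range(n):
        pairwise = softmax([dot(centered[i], cj) for cj in centered])
        weights = [p + u for p, u in zip(pairwise, unary)]
        context = [sum(w * tokens[j][c] for j, w in enumerate(weights)) for c in range(channels)]
        out.append([x + y for x, y in zip(tokens[i], context)])
    return out


if __name__ == "__main__":
    # Toy "backbone" stages of shape 8x8x2, 4x4x3 and 2x2x4 (hypothetical sizes).
    stages = [
        [[[float(i + j)] * c for j in range(s)] for i in range(s)]
        for s, c in [(8, 2), (4, 3), (2, 4)]
    ]
    tokens = aggregate_stages(stages)  # 2x2 = 4 tokens with 2 + 3 + 4 = 9 channels
    refined = disentangled_nonlocal(tokens, unary_w=[0.1] * 9)
    print(len(refined), len(refined[0]))  # 4 9
```

Pooling to the coarsest stage is chosen here because the pairwise term compares every token with every other, so its cost grows quadratically with the number of positions; running attention on the smallest map keeps that cost low, in line with the light-weight goal stated in the abstract.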

Paper Structure

This paper contains 10 sections, 6 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: The structure of our proposed approach.
  • Figure 2: Illustration of Feature Refinement Module.