An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

Zijun Gao; Qi Wang; Taiyuan Mei; Xiaohan Cheng; Yun Zi; Haowei Yang

TL;DR

The paper addresses the substantial information loss that SegNet incurs during down-sampling, which degrades semantic segmentation accuracy. It proposes an enhanced encoder-decoder network that uses multiple residual connections to preserve detail across scales, together with a balanced cross-entropy loss whose factor $\beta$ improves convergence under class imbalance. The key contributions are multi-residual feature fusion, a $\beta$-weighted cross-entropy loss, and an improvement in mean IoU (mIoU) over SegNet on the PASCAL VOC 2012 dataset. In practice, the authors expect more accurate, scalable AI-driven image analysis with less manual inspection across sectors.

Abstract

The traditional SegNet architecture commonly suffers significant information loss during the sampling process, which reduces its accuracy in image semantic segmentation tasks. To address this, we introduce an encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve fine details across image scales more effectively, thus minimizing the information loss inherent to down-sampling. Additionally, to speed up the convergence of network training and mitigate sample imbalance, we devise a modified cross-entropy loss function that incorporates a balancing factor. This modification rebalances the contributions of positive and negative samples, thus improving the efficiency of model training. Experimental evaluations show a substantial reduction in information loss and improved accuracy in semantic segmentation. Notably, the proposed network architecture achieves a substantial improvement in the finely annotated mean Intersection over Union (mIoU) on the PASCAL VOC 2012 dataset compared to the conventional SegNet. The proposed network structure not only reduces operational costs by decreasing the need for manual inspection but also supports wider deployment of AI-driven image analysis across different sectors.
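
The abstract does not state the exact form of the modified loss, so the following is only a minimal sketch of one common way a balancing factor can enter a cross-entropy loss. It assumes a binary (positive versus negative) setting in which $\beta$ weights the positive-sample term and $1-\beta$ weights the negative-sample term. The function name `balanced_cross_entropy`, its parameters, and the clipping constant `eps` are hypothetical and are not taken from the paper.

```python
import math


def balanced_cross_entropy(probs, labels, beta, eps=1e-7):
    """Mean beta-balanced binary cross-entropy (illustrative assumption only).

    probs  : predicted probabilities of the positive class, one per pixel
    labels : ground-truth labels, 1 for positive and 0 for negative
    beta   : balancing factor in [0, 1]; positives are weighted by beta,
             negatives by (1 - beta)
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")

    total = 0.0
    for p, y in zip(probs, labels):
        # Clip to avoid log(0) for saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total -= beta * math.log(p)
        else:
            total -= (1.0 - beta) * math.log(1.0 - p)
    return total / len(probs)
```

Under this assumed form, `beta = 0.5` gives half of the ordinary binary cross-entropy, and raising `beta` above 0.5 makes errors on positive samples cost more than errors on negative ones, which is the usual motivation when positives are rare.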

Paper Structure (12 sections, 3 equations, 4 figures, 2 tables)

Introduction
Algorithm and Model
SegNet Model
Model Establishment
Model Training
Improved Cross-Entropy Loss Function
Experimental Results and Analysis
Experimental Environment
Evaluation Metrics
Experimental Results and Analysis
Dataset Overview and Parameter Settings
Conclusion

Figures (4)

Figure 1: SegNet Model Architecture
Figure 2: SegNet Model Architecture
Figure 3: Training process
Figure 4: Mathematical Concept of IoU
