Semantic Segmentation for Real-World and Synthetic Vehicle's Forward-Facing Camera Images

Tuan T. Nguyen; Phan Le; Yasir Hassan; Mina Sartipi

TL;DR

This work tackles semantic segmentation of real-world and synthetic forward-facing vehicle camera images under domain shift across weather conditions. It presents a framework built on the HRNet backbone, refined with Object-Contextual Representations (OCR) and Hierarchical Multi-scale Attention (HMA), and uses Domain-Based Batch Normalization (DBN) to align distributions between Cityscapes and CARLA data. The approach achieves a mean intersection-over-union (mIoU) of 81.259 on the validation set of the challenge dataset, which suggests effective cross-domain generalization. The study highlights the value of combining high-resolution feature networks with context-aware refinements and domain-specific normalization for multi-domain semantic segmentation in autonomous driving, with potential improvements from SegFix left to future work.

Abstract

In this paper, we present our submission to the 5th Annual Smoky Mountains Computational Sciences Data Challenge, Challenge 3. It is a solution to the semantic segmentation problem for both real-world and synthetic images from a vehicle's forward-facing camera. We concentrate on building a robust model that performs well across domains covering different outdoor conditions such as sunny, snowy, and rainy weather. Our method is developed along two main directions: model development and domain adaptation. For model development, we use the High-Resolution Network (HRNet) as the baseline. The baseline's output is then processed by two coarse-to-fine models, Object-Contextual Representations (OCR) and Hierarchical Multi-scale Attention (HMA), to obtain more robust features. For domain adaptation, we implement Domain-Based Batch Normalization (DBN) to reduce the distribution shift between diverse domains. Our proposed method yields 81.259 mean intersection-over-union (mIoU) on the validation set. This paper studies the effectiveness of employing real-world and synthetic data to handle domain adaptation in semantic segmentation.
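
As a reminder of how the mIoU metric quoted above is usually computed, the following is a minimal sketch and not the challenge's official evaluation script. It assumes flattened integer label maps, an `ignore_index` of 255 (a common Cityscapes convention, assumed here), and averaging only over classes that actually occur; the function name `mean_iou` is hypothetical. It returns a fraction in [0, 1], which can be multiplied by 100 for a percentage-style score.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over classes present in pred or target.

    pred, target: flat sequences of integer class ids of equal length.
    ignore_index: label value skipped during scoring (assumed convention).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six pixels, three classes, last pixel ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoUs are 1/2, 2/3, and 1, so the mean is roughly 0.72. Real evaluations typically accumulate the intersection and union counts over the whole validation set before dividing.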

Paper Structure (9 sections, 1 equation, 9 figures)

Introduction
Related Work
Methodology
High-Resolution Representations Network (HRNet)
Object-Contextual Representations
Hierarchical Multi-scale Attention Model (HMA)
Domain-Based Batch Normalization
Experiment Setup and Result
Conclusion

Figures (9)

Figure 1: Illustration of the proposed domain-based formulation, motivated by zhuang2020rethinking. During training, data from different distributions are standardized with DBN to produce robust, generalized features. This figure is best viewed in color.
Figure 2: Visualization of semantic segmentation failures related to inference scale, from tao2020hierarchical. In the first row, the thin posts are poorly segmented in the low-resolution image (0.5x) but better predicted in the high-resolution image (2.0x). In the second row, the large object (road) is predicted worse at the higher resolution (2.0x).
Figure 3: Method architecture. DBN is implemented by replacing the BN layers inside all blocks with DBN layers. The blocks are detailed in the figures labeled fig:hrnet (HRNet block), fig:ocr2 (OCR block), and fig:hma2 (HMA block).
Figure 4: Visualization of the high-resolution network architecture, from sun2019high. There are four stages. The 1st stage consists of high-resolution convolutions. The 2nd (3rd, 4th) stage repeats two-resolution (three-resolution, four-resolution) blocks.
Figure 5: Visualization of how the final representation is generated in HRNet, from sun2019high. After the low-resolution features are rescaled, all features are concatenated to form the final feature.
...and 4 more figures
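
The captions of Figures 1 and 3 describe DBN as a drop-in replacement for BN layers that standardizes data from different distributions. The sketch below illustrates that general idea in NumPy and is not the authors' implementation: the class name `DomainBatchNorm`, the choice to keep separate running statistics per domain while sharing the affine parameters, and the momentum value are all assumptions made for illustration.

```python
import numpy as np


class DomainBatchNorm:
    """Toy batch norm that keeps separate statistics per domain (hypothetical)."""

    def __init__(self, num_features, num_domains, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # One row of running statistics per domain (e.g. real vs. synthetic).
        self.running_mean = np.zeros((num_domains, num_features))
        self.running_var = np.ones((num_domains, num_features))
        # Affine parameters shared across domains (an assumption of this sketch).
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)

    def __call__(self, x, domain, training=True):
        # x has shape (batch, channels, height, width); all samples share one domain.
        if training:
            mean = x.mean(axis=(0, 2, 3))
            var = x.var(axis=(0, 2, 3))
            m = self.momentum
            self.running_mean[domain] = (1 - m) * self.running_mean[domain] + m * mean
            self.running_var[domain] = (1 - m) * self.running_var[domain] + m * var
        else:
            mean = self.running_mean[domain]
            var = self.running_var[domain]
        shape = (1, -1, 1, 1)  # broadcast per-channel values over N, H, W
        x_hat = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + self.eps)
        return self.gamma.reshape(shape) * x_hat + self.beta.reshape(shape)


# Two synthetic "domains" with very different feature statistics.
rng = np.random.default_rng(0)
dbn = DomainBatchNorm(num_features=3, num_domains=2)
real = rng.normal(5.0, 2.0, size=(4, 3, 8, 8))
synth = rng.normal(-1.0, 0.5, size=(4, 3, 8, 8))
out_real = dbn(real, domain=0)
out_synth = dbn(synth, domain=1)
print(out_real.mean(axis=(0, 2, 3)), out_synth.mean(axis=(0, 2, 3)))
```

After normalization, both batches have approximately zero mean and unit variance per channel even though their inputs differ, which is the alignment effect the captions refer to. In a deep learning framework the same pattern would typically be built from one standard batch-norm layer per domain, selected by a domain index.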
