S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery

Qingyuan Yang; Guanzhou Chen; Xiaoliang Tan; Tong Wang; Jiaqi Wang; Xiaodong Zhang

TL;DR

This work tackles the joint estimation of disparity and semantic labels for satellite epipolar imagery. It proposes S3Net, a single-branch multitask network that unifies semantic segmentation and stereo matching through a Disparity-Classification Spatial Feature Extraction Module (DCSFEM), a 4D cost volume, and Self-Fuse (SFM) and Mutual-Fuse (MFM) modules, and outputs both a disparity map and pixel-level semantic labels. The 4D cost volume of size $H \times W \times D \times C$ encodes both disparity and semantic cues, and the fused features let each task improve the other. On the US3D dataset, S3Net improves semantic segmentation mIoU from $61.38$ to $67.39$, reduces D1-Error from $10.051$ to $9.579$, and reduces EPE from $1.439$ to $1.403$, indicating stronger cross-task learning for satellite 3D reconstruction. The approach gives remote sensing applications a compact, integrated framework with potential extensions to multiview stereo and multi-sensor data, and the code is publicly available.
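To make the $H \times W \times D \times C$ layout concrete, the following is a minimal NumPy sketch of a concatenation-style cost volume, a construction commonly used in stereo networks. It is not taken from the S3Net code: the function name `build_concat_cost_volume`, the channels-last layout, the restriction to non-negative disparities, and the choice of concatenation (rather than correlation or another operator) are all assumptions made for illustration.

```python
import numpy as np


def build_concat_cost_volume(left_feat, right_feat, max_disp):
    """Hypothetical helper: stack left/right features for each candidate disparity.

    left_feat, right_feat: arrays of shape (H, W, F).
    Returns an array of shape (H, W, max_disp, 2 * F), i.e. H x W x D x C
    with C = 2 * F in this sketch.
    """
    h, w, f = left_feat.shape
    if max_disp > w:
        raise ValueError("max_disp must not exceed the image width")
    volume = np.zeros((h, w, max_disp, 2 * f), dtype=left_feat.dtype)
    for d in range(max_disp):
        # Assumption: left pixel x is paired with right pixel x - d.
        # Columns x < d have no counterpart and are left as zeros.
        volume[:, d:, d, :f] = left_feat[:, d:, :]
        volume[:, d:, d, f:] = right_feat[:, : w - d, :]
    return volume


# Usage with random stand-in features (not real network outputs).
rng = np.random.default_rng(0)
left = rng.standard_normal((4, 8, 3)).astype(np.float32)
right = rng.standard_normal((4, 8, 3)).astype(np.float32)
cost_volume = build_concat_cost_volume(left, right, max_disp=5)
print(cost_volume.shape)
```

Real satellite stereo pairs can contain both positive and negative disparities, so a practical version would shift in both directions; the sketch ignores that for brevity.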

Abstract

Stereo matching and semantic segmentation are significant tasks in binocular satellite 3D reconstruction. However, previous studies primarily view these as independent parallel tasks, lacking an integrated multitask learning framework. This work introduces a solution, the Single-branch Semantic Stereo Network (S3Net), which combines semantic segmentation and stereo matching using Self-Fuse and Mutual-Fuse modules. Unlike preceding methods that utilize semantic or disparity information independently, our method identifies and leverages the intrinsic link between these two tasks, leading to more accurate semantic understanding and disparity estimation. Comparative testing on the US3D dataset demonstrates the effectiveness of S3Net. Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and reduces the D1-Error and average endpoint error (EPE) in disparity estimation from 10.051 to 9.579 and from 1.439 to 1.403, respectively, surpassing existing competitive methods. Our code is available at: https://github.com/CVEO/S3Net.
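For readers unfamiliar with the three reported metrics, the sketch below shows how they are commonly computed. The function names are hypothetical, and the 3-pixel threshold for D1-Error is an assumption based on common stereo benchmarks; the exact definitions and validity masks used in the paper's evaluation are not stated in this section.

```python
import numpy as np


def epe(pred_disp, gt_disp, valid):
    """Average endpoint error: mean absolute disparity error over valid pixels."""
    return float(np.abs(pred_disp - gt_disp)[valid].mean())


def d1_error(pred_disp, gt_disp, valid, thresh=3.0):
    """Percentage of valid pixels whose absolute error exceeds `thresh` (assumed 3 px)."""
    err = np.abs(pred_disp - gt_disp)[valid]
    return float((err > thresh).mean() * 100.0)


def mean_iou(pred_labels, gt_labels, num_classes):
    """Mean intersection-over-union across classes present in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        p = pred_labels == c
        g = gt_labels == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))


# Toy 2x2 example with made-up values.
gt_disp = np.array([[10.0, 20.0], [30.0, 40.0]])
pred_disp = np.array([[11.0, 20.0], [26.0, 40.0]])
valid = np.ones_like(gt_disp, dtype=bool)
gt_labels = np.array([[0, 0], [1, 1]])
pred_labels = np.array([[0, 1], [1, 1]])

print(epe(pred_disp, gt_disp, valid))         # 1.25
print(d1_error(pred_disp, gt_disp, valid))    # 25.0
print(mean_iou(pred_labels, gt_labels, 2))    # about 0.583
```

Lower is better for EPE and D1-Error, while higher is better for mIoU, which is why the abstract reports a reduction for the first two and an increase for the third.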

Paper Structure (15 sections, 3 figures, 3 tables)

Introduction
Methodology
Disparity-Classification Spatial Feature Extraction Module (DCSFEM)
Cost Volume
Self-Fuse Module (SFM)
Mutual-Fuse Module (MFM)
Experiments
Experimental settings
Ablation Study
Comparative Analysis with Other Methods
Compared Methods
Stereo Matching task
Semantic Segmentation task
Conclusion
Acknowledgement

Figures (3)

Figure 1: Framework of the Single-branch Semantic Stereo Network (S$^3$Net).
Figure 2: The comparison of S$^3$Net with other methods in disparity estimation tasks on the US3D dataset.
Figure 3: The comparison of S$^3$Net with other methods in semantic segmentation tasks on the US3D dataset.
