S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery
Qingyuan Yang, Guanzhou Chen, Xiaoliang Tan, Tong Wang, Jiaqi Wang, Xiaodong Zhang
TL;DR
This work tackles the joint estimation of disparity and semantic labels from satellite epipolar imagery. It proposes S3Net, a single-branch multitask network that unifies semantic segmentation and stereo matching through a Disparity-Classification Spatial Feature Extraction Module (DCSFEM), a 4D cost volume, and Self-Fuse (SFM) and Mutual-Fuse (MFM) modules, and that outputs both a disparity map and pixel-level semantic labels. The 4D cost volume, of size $H \times W \times D \times C$, encodes both disparity and semantic cues, and the fused features let each task improve the other. On the US3D dataset, S3Net raises semantic segmentation mIoU from $61.38$ to $67.39$, lowers D1-Error from $10.051$ to $9.579$, and lowers EPE from $1.439$ to $1.403$, indicating that cross-task learning benefits satellite 3D reconstruction. The result is a compact, integrated framework for remote sensing, with potential extensions to multiview stereo and multi-sensor data. The code is publicly available.
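To make the $H \times W \times D \times C$ shape concrete, the following is a minimal NumPy sketch of one common way to build a 4D cost volume from left and right feature maps. It is an illustration only: the function name `build_cost_volume`, the absolute-difference matching cost, and the restriction to non-negative integer disparities are all assumptions made here, not details taken from S3Net, whose actual construction may differ.

```python
import numpy as np


def build_cost_volume(left_feat, right_feat, max_disp):
    """Build an (H, W, D, C) cost volume from two (H, W, C) feature maps.

    Hypothetical helper for illustration. Assumes rectified (epipolar)
    inputs, candidate disparities d = 0 .. max_disp - 1, and an
    absolute-difference cost per feature channel.
    """
    h, w, c = left_feat.shape
    volume = np.zeros((h, w, max_disp, c), dtype=left_feat.dtype)
    for d in range(max_disp):
        # A left pixel at column x is compared with the right pixel at
        # column x - d. Columns x < d have no counterpart and stay zero.
        volume[:, d:, d, :] = np.abs(
            left_feat[:, d:, :] - right_feat[:, : w - d, :]
        )
    return volume
```

For a right feature map that is an exact copy of the left one shifted by two pixels, the slice `volume[:, 2:, 2, :]` is zero everywhere, which is the signal a later disparity regression stage would pick up.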
Abstract
Stereo matching and semantic segmentation are significant tasks in binocular satellite 3D reconstruction. However, previous studies primarily treat them as independent parallel tasks and lack an integrated multitask learning framework. This work introduces a solution, the Single-branch Semantic Stereo Network (S3Net), which combines semantic segmentation and stereo matching using Self-Fuse and Mutual-Fuse modules. Unlike preceding methods that utilize semantic or disparity information independently, our method identifies and leverages the intrinsic link between these two tasks, leading to a more accurate understanding of semantic information and more accurate disparity estimation. Comparative testing on the US3D dataset demonstrates the effectiveness of S3Net. Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and reduces the D1-Error and average endpoint error (EPE) in disparity estimation from 10.051 to 9.579 and from 1.439 to 1.403, respectively, surpassing existing competitive methods. Our code is available at: https://github.com/CVEO/S3Net.
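For readers unfamiliar with the three reported metrics, the sketch below shows how they are commonly computed. The function names, the validity mask argument, and the 3-pixel threshold for D1-Error are assumptions for illustration; the exact evaluation protocol used for the US3D numbers above is not specified here and may differ (for example, in how invalid pixels or absent classes are handled).

```python
import numpy as np


def epe(pred_disp, gt_disp, valid):
    """Average endpoint error: mean absolute disparity error over valid pixels."""
    return float(np.abs(pred_disp - gt_disp)[valid].mean())


def d1_error(pred_disp, gt_disp, valid, thresh=3.0):
    """Percentage of valid pixels whose disparity error exceeds `thresh`.

    The 3-pixel default is an assumed convention, not taken from the source.
    """
    err = np.abs(pred_disp - gt_disp)[valid]
    return float((err > thresh).mean() * 100.0)


def mean_iou(pred_labels, gt_labels, num_classes):
    """Mean intersection-over-union in percent, skipping classes absent
    from both the prediction and the ground truth."""
    ious = []
    for k in range(num_classes):
        pred_k = pred_labels == k
        gt_k = gt_labels == k
        union = np.logical_or(pred_k, gt_k).sum()
        if union > 0:
            ious.append(np.logical_and(pred_k, gt_k).sum() / union)
    return float(np.mean(ious) * 100.0)
```

Under these definitions, higher mIoU is better, while lower D1-Error and EPE are better, which matches the direction of the improvements reported above.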
