Ghost-Stereo: GhostNet-based Cost Volume Enhancement and Aggregation for Stereo Matching Networks

Xingguang Jiang, Xiaofeng Bian, Chenggang Guo

TL;DR

Ghost-Stereo tackles the challenge of achieving real-time, accurate stereo matching by replacing the heavy 3D convolution-based cost-volume aggregation with GhostNet-inspired modules. The Ghost-CVE module injects multi-scale contextual cues into the cost volume, while the Ghost-CVA module uses lightweight Ghost3D bottlenecks to perform efficient cost aggregation. Together with a GhostNet-based UNet feature extractor and top-k softmax disparity regression, the approach delivers competitive accuracy with significantly fewer parameters and faster inference on benchmarks like SceneFlow and KITTI, and shows strong cross-domain generalization. This work demonstrates that carefully designed lightweight context- and geometry-fusion mechanisms can substantially reduce computational load without sacrificing performance in stereo matching.
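
To make the top-k softmax disparity regression mentioned above concrete, here is a minimal per-pixel sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `topk_soft_argmax`, the default `k=2`, and the convention that a higher score means a better match are all hypothetical choices, since this summary does not give the paper's exact formulation.

```python
import math


def topk_soft_argmax(scores, k=2):
    """Regress a sub-pixel disparity for one pixel.

    scores[d] is the aggregated matching score at disparity candidate d
    (assumption: higher means a better match). Only the k best candidates
    take part in the softmax, so distant low-score candidates cannot pull
    the estimate away from the dominant peak.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of candidates")

    # Indices of the k highest-scoring disparity candidates.
    top = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)[:k]

    # Numerically stable softmax restricted to those k candidates.
    peak = max(scores[d] for d in top)
    weights = [math.exp(scores[d] - peak) for d in top]
    total = sum(weights)

    # Expected disparity under the truncated distribution.
    return sum(d * w for d, w in zip(top, weights)) / total


# Two equally strong candidates at d=2 and d=3 give the midpoint.
print(topk_soft_argmax([0.0, 1.0, 5.0, 5.0, 0.5], k=2))  # 2.5
```

Setting `k` to the number of candidates recovers the ordinary soft-argmax over the full disparity range, while `k=1` reduces to a hard argmax.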

Abstract

Depth estimation based on stereo matching is a classic yet still popular computer vision problem with a wide range of real-world applications. Current stereo matching methods generally adopt a deep Siamese neural network architecture and have achieved impressive performance by constructing feature matching cost volumes and using 3D convolutions for cost aggregation. However, most existing methods suffer from a large number of parameters and slow running time due to the sequential use of 3D convolutions. In this paper, we propose Ghost-Stereo, a novel end-to-end stereo matching network. The feature extraction part of the network uses GhostNet to form a U-shaped structure. The core of Ghost-Stereo is a GhostNet feature-based cost volume enhancement (Ghost-CVE) module and a GhostNet-inspired lightweight cost volume aggregation (Ghost-CVA) module. In the Ghost-CVE part, cost volumes are constructed and fused with the GhostNet-based features to enhance spatial context awareness. In the Ghost-CVA part, a lightweight 3D convolution bottleneck block based on GhostNet is proposed to reduce the computational complexity of this module. By combining it with the context and geometry fusion module, a classical hourglass-shaped cost volume aggregation structure is constructed. Ghost-Stereo achieves performance comparable to state-of-the-art real-time methods on several public benchmarks and shows better generalization ability.
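
As a rough illustration of why a Ghost-style block can lighten 3D cost aggregation, the sketch below compares the weight count of a plain 3D convolution with that of a Ghost-style 3D layer. It follows the general GhostNet idea (a primary convolution produces a fraction of the output channels, and cheap depthwise operations generate the rest). The layer layout, the function names, and the parameters `ratio` and `cheap_kernel` are assumptions made for illustration; this summary does not specify the exact Ghost3D configuration used in the paper, and bias terms are ignored.

```python
def conv3d_params(c_in, c_out, kernel):
    """Weights of a plain 3D convolution with a cubic kernel (no bias)."""
    return c_in * c_out * kernel ** 3


def ghost3d_params(c_in, c_out, kernel, ratio=2, cheap_kernel=3):
    """Weights of a hypothetical Ghost-style 3D layer (no bias).

    A primary 3D convolution produces c_out / ratio intrinsic channels.
    Each intrinsic channel then yields (ratio - 1) extra "ghost" channels
    through a depthwise 3D convolution with a cubic kernel.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * kernel ** 3
    cheap = intrinsic * (ratio - 1) * cheap_kernel ** 3
    return primary + cheap


plain = conv3d_params(32, 32, 3)
ghost = ghost3d_params(32, 32, 3, ratio=2, cheap_kernel=3)
print(plain, ghost)  # 27648 14256
```

With these example sizes the Ghost-style layer needs roughly half the weights of the plain convolution, which is the kind of saving the lightweight bottleneck aims for; the actual savings in Ghost-Stereo depend on the paper's real layer settings.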

Paper Structure

This paper contains 15 sections, 4 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The proposed Ghost-Stereo stereo matching network structure.
  • Figure 2: Ghost3D-Bottleneck. Left: Ghost3D-Bottleneck with stride=1; right: Ghost3D-Bottleneck with stride=2.
  • Figure 3: Comparison between SE module (left) and the proposed SE3D module (right).
  • Figure 4: Comparison between Ghost module (left) and the proposed Ghost3D module (right).
  • Figure 5: Visualization of disparity prediction error map on the KITTI 2015 test set.
  • ...and 1 more figure