A Lightweight Target-Driven Network of Stereo Matching for Inland Waterways

Jing Su, Yiqing Zhou, Yu Zhang, Chao Wang, Yi Wei

TL;DR

A lightweight target-driven stereo matching neural network, named LTNet, is proposed that achieves competitive results, with only 3.7M parameters, and knowledge distillation is utilized to enhance the generalization capability of lightweight models on the USVInland dataset.

Abstract

Stereo matching for inland waterways is one of the key technologies for the autonomous navigation of Unmanned Surface Vehicles (USVs). It involves dividing the stereo images into reference images and target images for pixel-level matching. However, the inland waterway environment poses challenges such as blurred textures, large spatial scales, and the limited computational resources of the USV platform, so efficient matching requires the participation of geometric features from the target image. Based on this target-driven concept, we propose a lightweight target-driven stereo matching neural network, named LTNet. Specifically, a lightweight and efficient 4D cost volume, named the Geometry Target Volume (GTV), is designed to fully utilize the geometric information of target features by employing the shifted target features as the filtered feature volume. Subsequently, to address the substantial texture interference and object occlusions present in the waterway environment, a Left-Right Consistency Refinement (LRR) module is proposed. The LRR module uses the pixel-level differences between the left and right disparities to introduce soft constraints, thereby enhancing the accuracy of predictions during the intermediate stages of the network. Moreover, knowledge distillation is utilized to enhance the generalization capability of lightweight models on the USVInland dataset. Furthermore, a new large-scale benchmark, named Spring, is used to validate the applicability of LTNet across various scenarios. In experiments on these two datasets, LTNet achieves competitive results with only 3.7M parameters. The code is available at https://github.com/Open-YiQingZhou/LTNet.
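
The idea of "shifted target features" mentioned in the abstract can be made concrete with a small sketch. The code below is not the authors' GTV implementation. It only illustrates the shifting step under stated assumptions: the function name `shifted_target_volume`, the `(C, H, W)` feature layout, and the zero padding are all hypothetical, and the filtering step described in the abstract is omitted because the abstract does not specify it.

```python
import numpy as np


def shifted_target_volume(target_feat: np.ndarray, max_disp: int) -> np.ndarray:
    """Stack horizontally shifted copies of the target-image features.

    target_feat: array of shape (C, H, W) (assumed layout).
    Returns a 4D volume of shape (C, max_disp, H, W). Slice d holds the
    target features shifted right by d pixels, so that column x of the
    reference (left) image lines up with column x - d of the target image.
    Columns without a valid source pixel stay zero (assumed padding).
    """
    c, h, w = target_feat.shape
    volume = np.zeros((c, max_disp, h, w), dtype=target_feat.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = target_feat
        else:
            # volume[..., x] = target_feat[..., x - d] for every x >= d
            volume[:, d, :, d:] = target_feat[:, :, :-d]
    return volume


# Tiny usage example: 2 channels, 1 row, 4 columns, 3 disparity candidates.
feat = np.arange(8, dtype=np.float32).reshape(2, 1, 4)
vol = shifted_target_volume(feat, max_disp=3)
```

In this sketch the volume has one slice per candidate disparity, which is what makes it 4D (channels, disparity, height, width). A real network would build it from learned features at reduced resolution.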

Paper Structure

This paper contains 22 sections, 8 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of 1-pixel error rate and parameter count between our LTNet and other stereo networks on the USVInland dataset. Marked models are those trained using distillation. LTNet achieves high predictive accuracy while maintaining a small model size.
  • Figure 2: Rectified stereo matching results for 3D perception of inland waterway scenes using our LTNet, together with the corresponding results obtained with knowledge distillation. Marked models are those trained using distillation. Please note that our research does not focus on the depth of the water surface. In our demonstrations, water areas have been masked out in red and navy blue.
  • Figure 3: Architecture overview of the proposed LTNet. A stereo pair is fed into a feature extractor with shared weights, and 4D Geometry Target Volumes (GTVs) are constructed for the left and right branches respectively. These cost volumes are fed into a 3D CNN with shared weights and intermediate supervision to obtain the aggregated costs for both sides. After the costs are regressed to left and right disparity maps, the Left-Right Consistency Refinement (LRR) module performs a left-right consistency check and refinement. Finally, the disparity is upsampled to produce the predicted results. For the USVInland dataset, we further enhance the matching performance of lightweight models through knowledge distillation training.
  • Figure 4: Results of disparity estimation for a USVInland test image. Note that in our demonstrations, water areas have been masked out in red and navy blue.
  • Figure 5: Comparison of knowledge distillation training results on USVInland test images. Marked models are those trained using distillation. Our teacher labels are the output disparities of CroCo-Stereo [weinzaepfel2023croco]. Note that water areas in the labels and results have been masked out in navy blue.
  • ...and 1 more figure
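
The left-right consistency check that the Figure 3 caption attributes to the LRR module can be illustrated with a minimal NumPy sketch. This is a generic consistency check, not the authors' LRR module: the function names, the nearest-neighbour sampling, and the exponential weight with its `sigma` parameter are assumptions made purely for illustration of how a per-pixel disparity difference could be turned into a soft constraint.

```python
import numpy as np


def left_right_difference(disp_left: np.ndarray, disp_right: np.ndarray) -> np.ndarray:
    """Per-pixel |d_L(x, y) - d_R(x - d_L(x, y), y)|.

    Both inputs have shape (H, W). The right disparity is sampled with
    nearest-neighbour rounding (an assumption). Pixels whose match falls
    outside the right image are assigned np.inf.
    """
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)   # column index per pixel
    ys = np.arange(h)[:, None].repeat(w, axis=1)   # row index per pixel
    x_right = np.rint(xs - disp_left).astype(int)  # matched column in right view
    valid = (x_right >= 0) & (x_right < w)
    sampled = disp_right[ys, np.clip(x_right, 0, w - 1)]
    diff = np.abs(disp_left - sampled)
    return np.where(valid, diff, np.inf)


def consistency_weight(diff: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Hypothetical soft weight in [0, 1]: 1 for consistent pixels, near 0 otherwise."""
    return np.exp(-diff / sigma)


# Usage: a constant disparity of 2 pixels, with one inconsistent right column.
disp_l = np.full((2, 6), 2.0)
disp_r = np.full((2, 6), 2.0)
disp_r[:, 3] = 5.0
diff = left_right_difference(disp_l, disp_r)
weights = consistency_weight(diff)
```

A hard check would threshold `diff` into a binary mask. The weight keeps the signal continuous, which is one plausible reading of "soft constraints" in the abstract.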