Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices
Baiyu Pan, Jichao Jiao, Jianxing Pang, Jun Cheng
TL;DR
This work tackles real-time stereo matching on edge devices by addressing the speed–accuracy trade-off with a Distill-Then-Prune framework. It presents a lightweight, implementation-friendly network that replaces 3D convolutions and iterative cost-volume construction with a channel-to-disparity approach. It then compresses this network further with knowledge distillation from a strong teacher and structured pruning (DepGraph), yielding a compact, accurate model. Extensive ablations on SceneFlow and KITTI show that teacher-only, L1-based distillation provides the most effective supervision and that Setting 3 of the module design gives the best efficiency–accuracy balance. The resulting DTPnet achieves competitive or state-of-the-art accuracy among lightweight stereo methods while running in real time on edge platforms, and qualitative results show robust disparity estimates in challenging scenes. The framework is general and can be applied to existing stereo architectures, enabling practical deployment in robotics and autonomous systems.
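The TL;DR does not spell out how the channel-to-disparity head or the teacher-only L1 loss is computed. The sketch below illustrates one plausible reading under stated assumptions. First, it assumes the final feature map carries one channel per candidate disparity and converts those channels to a disparity value with a soft-argmax. This is a common regression choice in stereo networks but is not confirmed as the authors' design. Second, it reads "teacher-only" as supervising the student with an L1 loss against the teacher's disparity map alone, with no ground-truth term. The function names `channel_to_disparity` and `teacher_only_l1` are hypothetical, and per-pixel values are plain Python lists rather than tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def channel_to_disparity(logits):
    """Hypothetical channel-to-disparity head for one pixel.

    Assumption: channel d scores candidate disparity d, and the output is the
    soft-argmax sum_d d * p(d). No 3D convolution or cost volume is involved.
    """
    probs = softmax(logits)
    return sum(d * p for d, p in enumerate(probs))


def teacher_only_l1(student_disp, teacher_disp):
    """Assumed teacher-only distillation loss: mean |student - teacher|.

    The teacher's disparity map is the only target; no ground truth is used.
    """
    if len(student_disp) != len(teacher_disp):
        raise ValueError("disparity maps must have the same number of pixels")
    diffs = [abs(s - t) for s, t in zip(student_disp, teacher_disp)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Two pixels, D = 4 candidate disparities each.
    student_logits = [[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0]]
    teacher_disp = [1.0, 3.0]
    student_disp = [channel_to_disparity(l) for l in student_logits]
    # Small loss, since each pixel's distribution peaks at the teacher's value.
    print(teacher_only_l1(student_disp, teacher_disp))
```

In this reading, disparity regression needs only a per-pixel softmax over channels, and the student never sees ground-truth labels during distillation.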
Abstract
In recent years, numerous real-time stereo matching methods have been introduced, but they often lack accuracy. These methods attempt to improve accuracy by introducing new modules or integrating traditional methods; however, the improvements are only modest. In this paper, we propose a novel strategy that combines knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy. As a result, we obtain a model that maintains real-time performance while delivering high accuracy on edge devices. Our method involves three key steps. First, we review state-of-the-art methods and design our lightweight model by comparing the contributions of the modules in existing efficient models and removing the redundant ones. Next, we use an efficient model as the teacher to distill knowledge into the lightweight model. Finally, we systematically prune the lightweight model to obtain the final model. Through extensive experiments on two widely used benchmarks, SceneFlow and KITTI, we analyze the effectiveness of each module with ablation studies and present our state-of-the-art results.
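The final step prunes the distilled model, and the TL;DR names DepGraph as the structured-pruning method. DepGraph itself is not reproduced here. The sketch below swaps in simple L1-norm channel pruning on two consecutive convolution layers. It shows the coupling any structured pruning method must respect: dropping an output channel of one layer also removes the matching input channel of the next. Weights are nested Python lists, and `prune_conv_pair` and its weight layout are hypothetical.

```python
def l1_norm(weights):
    """Sum of absolute values of a flat list of weights."""
    return sum(abs(w) for w in weights)


def prune_conv_pair(conv1, conv2, keep):
    """Hypothetical L1-norm structured pruning of two coupled conv layers.

    conv1[c]    : flattened weights of output filter c of the first layer.
    conv2[o][c] : flattened kernel linking input channel c to output o
                  of the second layer.
    Keeps the `keep` filters of conv1 with the largest L1 norm and drops the
    matching input channels of conv2 so the two layers stay shape-consistent.
    """
    if not 0 < keep <= len(conv1):
        raise ValueError("keep must be in 1..len(conv1)")
    # Rank conv1 output channels by importance (L1 norm of each filter).
    ranked = sorted(range(len(conv1)), key=lambda c: l1_norm(conv1[c]), reverse=True)
    kept = sorted(ranked[:keep])  # restore the original channel order
    # Remove the pruned filters from conv1 ...
    new_conv1 = [conv1[c] for c in kept]
    # ... and the now-missing input channels from every filter of conv2.
    new_conv2 = [[out_filter[c] for c in kept] for out_filter in conv2]
    return new_conv1, new_conv2, kept


if __name__ == "__main__":
    conv1 = [[0.1, -0.1], [2.0, 1.0], [0.5, -0.5], [-3.0, 0.5]]  # 4 output filters
    conv2 = [[[1.0], [2.0], [3.0], [4.0]],  # 2 output filters, 1x1 kernels,
             [[5.0], [6.0], [7.0], [8.0]]]  # each reading 4 input channels
    c1, c2, kept = prune_conv_pair(conv1, conv2, keep=2)
    print(kept)  # [1, 3]
```

A dependency-graph approach such as DepGraph generalizes this bookkeeping to every group of coupled layers in a network, including residual additions and concatenations, rather than handling one hand-picked pair.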
