DensePercept-NCSSD: Vision Mamba towards Real-time Dense Visual Perception with Non-Causal State Space Duality
Tushar Anand, Advik Sinha, Abhijit Das
TL;DR
This work tackles the real-time, high-accuracy estimation of dense optical flow and stereo disparity by introducing DensePercept-NCSSD, a two-branch architecture built on a non-causal Mamba block and a non-causal state-space duality (SSD). By replacing the quadratic attention of transformers with linear, non-causal SSM-based computation and a pyramid-based matching scheme, the model achieves a favorable speed-accuracy-memory balance. The authors provide extensive experiments on optical flow and disparity across KITTI, VKITTI, Sintel, and Sceneflow, reporting state-of-the-art or competitive EPE, D1, FPS, and SOMER metrics while maintaining real-time capabilities. The proposed approach promises practical impact for real-time robotic perception and autonomous systems by delivering unified dense perception with reduced computational overhead. Overall, DensePercept-NCSSD demonstrates that non-causal SSD-based Mamba blocks can bridge speed, accuracy, and memory requirements in joint flow and disparity tasks.
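To make the "quadratic attention versus linear, non-causal computation" contrast concrete, here is a minimal sketch of the general idea. It is not the authors' NCSSD block. It only assumes that each token i carries a read vector C[i], a write vector B[i], and a feature row X[i] (all hypothetical names), and it uses plain matrix associativity: (C Bᵀ) X equals C (Bᵀ X). The first form builds an N × N token-to-token matrix. The second summarizes all tokens into one small shared state that every token reads, with no causal mask.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def quadratic_mixing(C, B, X):
    """Attention-style mixing: materialize the full (N, N) interaction matrix."""
    scores = matmul(C, transpose(B))   # (N, N): cost grows with N squared
    return matmul(scores, X)           # (N, D)


def linear_noncausal_mixing(C, B, X):
    """Same output via associativity: one global (S, D) state shared by all tokens."""
    state = matmul(transpose(B), X)    # (S, D): a single pass over the N tokens
    return matmul(C, state)            # (N, D): each token reads the global state


# Toy inputs (illustrative values only): N=3 tokens, S=2 state dims, D=1 feature.
C = [[1, 0], [0, 1], [1, 1]]
B = [[1, 2], [3, 4], [5, 6]]
X = [[1], [2], [3]]

assert quadratic_mixing(C, B, X) == linear_noncausal_mixing(C, B, X)
```

Because the shared state has a fixed size S × D, the second form scales linearly with the number of tokens, and because no mask is applied, every output depends on every input position, which is what "non-causal" refers to here.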
Abstract
In this work, we propose an accurate, real-time optical flow and disparity estimation model that fuses pairwise input images in the proposed non-causal selective state space for dense perception tasks. The model is built on a non-causal Mamba block that is fast and efficient and handles the constraints of real-time applications. It reduces inference time while maintaining high accuracy and low GPU usage for optical flow and disparity map generation. The results, analysis, and validation in real-life scenarios show that the model can be used for unified, real-time, and accurate 3D dense perception tasks. The code, along with the models, can be found at https://github.com/vimstereo/DensePerceptNCSSD
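For readers unfamiliar with the accuracy metrics named in the TL;DR, the sketch below shows how EPE and D1 are commonly computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, inputs are flat lists rather than images, a ground-truth disparity of 0 is assumed to mark an invalid pixel, and the D1 thresholds (more than 3 px and more than 5% of the ground truth) follow the usual KITTI 2015 convention.

```python
import math


def end_point_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth (u, v) flow vectors."""
    dists = [
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow)
    ]
    return sum(dists) / len(dists)


def d1_outlier_rate(pred_disp, gt_disp, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels whose disparity error exceeds both thresholds."""
    # Assumption: ground-truth disparity <= 0 means "no measurement" and is skipped.
    valid = [(p, g) for p, g in zip(pred_disp, gt_disp) if g > 0]
    outliers = sum(
        1
        for p, g in valid
        if abs(p - g) > abs_thresh and abs(p - g) > rel_thresh * g
    )
    return 100.0 * outliers / len(valid)


# Toy data (illustrative values only).
epe = end_point_error([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
d1 = d1_outlier_rate([10.0, 50.0, 100.0, 20.0, 7.0], [10.0, 54.0, 104.0, 20.0, 0.0])
print(f"EPE = {epe:.2f} px, D1 = {d1:.1f}%")
```

Lower is better for both metrics. A pixel counts as a D1 outlier only when its error is large in both absolute and relative terms, so a 4 px error on a 104 px disparity is not penalized while the same error on a 54 px disparity is.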
