ViM-Disparity: Bridging the Gap of Speed, Accuracy and Memory for Disparity Map Generation
Maheswar Bora, Tushar Anand, Saurabh Atreya, Aritra Mukherjee, Abhijit Das
TL;DR
ViM-Disparity targets the gap between speed, accuracy, and memory in real-time disparity map generation (DMG) by adopting a Vision Mamba (ViM) architecture that combines efficient state-space memory with attention-inspired feature processing. It employs a six-layer Mamba encoder, joint left-right feature concatenation, and ViM blocks to enable fast, accurate DMG with reduced memory overhead. It also introduces a novel metric, SOMER, defined as $\mathrm{SOMER} = \frac{\mathrm{FPS}}{\mathrm{EPE} \times \log(M)}$, where FPS is the inference speed in frames per second, EPE is the end-point error, and $M$ is the memory footprint, to jointly assess speed, accuracy, and memory. The approach achieves competitive EPE and D1 while delivering superior FPS and memory efficiency across the KITTI, Sintel, SceneFlow, and VKITTI2 benchmarks, which makes it a practical candidate for edge-device deployment. Overall, ViM-Disparity provides both a DMG method and a unified benchmarking framework that emphasizes real-time applicability without substantial sacrifices in disparity accuracy.
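As an illustration of how the SOMER formula above combines the three quantities, the following minimal sketch evaluates it for two hypothetical models. The function name `somer`, the use of the natural logarithm, and the memory values (treated here as megabytes) are assumptions made for this example; the text above does not fix the log base or the memory unit, and the numbers are not results from the paper.

```python
import math


def somer(fps: float, epe: float, memory: float) -> float:
    """Evaluate SOMER = FPS / (EPE * log(M)).

    Assumptions (not stated in the source text): natural logarithm,
    and `memory` given in a unit where M > 1 so that log(M) > 0.
    """
    if fps <= 0 or epe <= 0:
        raise ValueError("fps and epe must be positive")
    if memory <= 1:
        raise ValueError("memory must exceed 1 so that log(M) is positive")
    return fps / (epe * math.log(memory))


# Hypothetical models, for illustration only.
fast_model = somer(fps=60.0, epe=1.2, memory=500.0)
slow_model = somer(fps=20.0, epe=1.0, memory=2000.0)

# Higher is better: the faster, lighter model wins despite a worse EPE.
print(f"fast: {fast_model:.3f}, slow: {slow_model:.3f}")
```

Under this reading, a higher score is better. Because memory enters through a logarithm, doubling the memory footprint lowers the score far less than doubling the EPE or halving the FPS would.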
Abstract
In this work, we propose a Visual Mamba (ViM) based architecture to resolve the existing trade-off between real-time inference, accuracy, and low computational overhead in disparity map generation (DMG). Moreover, we propose a performance measure that jointly evaluates the inference speed, computational overhead, and accuracy of a DMG model. The code and corresponding models are available at: https://github.com/MBora/ViM-Disparity.
