ViM-Disparity: Bridging the Gap of Speed, Accuracy and Memory for Disparity Map Generation

Maheswar Bora, Tushar Anand, Saurabh Atreya, Aritra Mukherjee, Abhijit Das

TL;DR

ViM-Disparity targets the gap in real-time disparity map generation (DMG) by adopting a Vision Mamba (ViM) architecture that fuses efficient state-space memory with attention-inspired feature processing. It employs a six-layer Mamba encoder, joint left-right feature concatenation, and ViM blocks to enable fast, accurate DMG with reduced memory overhead. It is complemented by a novel SOMER metric, defined as $SOMER = \frac{FPS}{EPE \times \log(M)}$, which jointly assesses speed (FPS), accuracy (EPE), and memory (M). The approach demonstrates competitive EPE and D1 while delivering superior FPS and memory efficiency across the KITTI, Sintel, Sceneflow, and VKITTI2 benchmarks, establishing a practical pathway to edge-device deployment. Overall, ViM-Disparity provides both a robust DMG method and a unified benchmarking framework that emphasizes real-time applicability without substantial sacrifices in disparity accuracy.
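As a rough illustration of how the SOMER formula above combines the three quantities, the following minimal sketch evaluates $\frac{FPS}{EPE \times \log(M)}$ directly. The function name `somer`, the use of the natural logarithm, and the unit of `memory` are assumptions made for this example; the summary does not state the log base or the memory unit, and the numbers in the usage lines are hypothetical, not results from the paper.

```python
import math


def somer(fps: float, epe: float, memory: float) -> float:
    """Evaluate SOMER = FPS / (EPE * log(M)).

    Assumptions (not stated in the source): natural logarithm, and
    `memory` given in a unit for which M > 1 so that log(M) is positive.
    """
    if fps < 0:
        raise ValueError("fps must be non-negative")
    if epe <= 0:
        raise ValueError("epe must be positive")
    if memory <= 1:
        raise ValueError("memory must be greater than 1 so log(M) > 0")
    return fps / (epe * math.log(memory))


if __name__ == "__main__":
    # Hypothetical models, for illustration only.
    fast_small = somer(fps=40.0, epe=1.2, memory=500.0)
    slow_large = somer(fps=10.0, epe=1.0, memory=4000.0)
    print(f"fast/small model: {fast_small:.3f}")
    print(f"slow/large model: {slow_large:.3f}")
```

Under these assumptions the score rises with higher FPS and falls with higher EPE or memory, and the logarithm dampens the influence of memory relative to the other two terms.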

Abstract

In this work, we propose a Visual Mamba (ViM) based architecture to resolve the existing trade-off among real-time inference, accuracy, and low computational overhead in disparity map generation (DMG). We also propose a performance measure that jointly evaluates the inference speed, computational overhead, and accuracy of a DMG model. The code and corresponding models are available at: https://github.com/MBora/ViM-Disparity.

Paper Structure

This paper contains 8 sections, 5 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: (a) Overview of the proposed architecture; (b) the unrolling process across six Mamba encoders [gu2023mamba]; (c) the internal structure of a Mamba encoder.
  • Figure 2: Different datasets used and their corresponding disparity heat maps. The heat maps are plotted in a rainbow spectrum with red being the lowest (farther away) and blue being the highest (nearer) disparity value.