Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor
Andrea Conti, Matteo Poggi, Valerio Cambareri, Stefano Mattoccia
TL;DR
Depth on Demand (DoD) targets streaming dense depth by coupling a high-FPS RGB stream with a low-FPS, sparse active depth sensor, decoupling their frame rates through the ratio $\tau = f_{\mathrm{D}}/f_{\mathrm{RGB}}$, where $f_{\mathrm{D}}$ and $f_{\mathrm{RGB}}$ are the depth and RGB frame rates. The method deploys a three-stage pipeline (Multi-Modal Encoding, Iterative Multi-Modal Integration, and Depth Decoding) that combines geometry cues, monocular context, and sparse depth updates through epipolar-aware features and iterative fusion to predict dense depth maps aligned with the RGB frames. Across indoor and outdoor benchmarks, DoD outperforms depth completion and traditional MVS baselines, achieving denser reconstructions with a lower memory footprint and faster runtimes, and it generalizes well to unseen datasets (e.g., Waymo). The work shows practical impact for robotics and automotive perception by enabling energy-efficient depth sensing with high temporal density, suitable for safety-critical applications. Moving objects remain a challenge, leaving room for further robustness improvements.
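To make the frame-rate decoupling concrete, the following Python sketch computes $\tau$ and, for each RGB frame, the index of the most recent depth frame. It is not the authors' code. It assumes integer frame rates in Hz and two idealized, perfectly synchronized sensors that both start at $t = 0$; the function names are hypothetical.

```python
from fractions import Fraction


def frame_rate_ratio(f_depth_hz: int, f_rgb_hz: int) -> Fraction:
    """tau = f_D / f_RGB, kept as an exact fraction."""
    return Fraction(f_depth_hz, f_rgb_hz)


def latest_depth_frames(f_depth_hz: int, f_rgb_hz: int, num_rgb_frames: int):
    """For each RGB frame i, find the most recent depth frame k.

    Idealized model (an assumption, not from the paper): RGB frame i is
    captured at i / f_RGB and depth frame k at k / f_D. The latest depth
    frame satisfies k / f_D <= i / f_RGB, i.e. k = floor(i * tau).
    Returns (rgb_index, depth_index, fresh) tuples, where `fresh` is True
    when the depth frame is captured at the same instant as the RGB frame.
    """
    if not 0 < f_depth_hz <= f_rgb_hz:
        raise ValueError("expected 0 < f_depth_hz <= f_rgb_hz")
    schedule = []
    for i in range(num_rgb_frames):
        k = (i * f_depth_hz) // f_rgb_hz  # integer floor, no float rounding
        fresh = k * f_rgb_hz == i * f_depth_hz  # timestamps coincide exactly
        schedule.append((i, k, fresh))
    return schedule


if __name__ == "__main__":
    print(frame_rate_ratio(5, 30))
    for row in latest_depth_frames(5, 30, 8):
        print(row)
```

With $f_{\mathrm{RGB}} = 30$ Hz and $f_{\mathrm{D}} = 5$ Hz, $\tau = 1/6$. In this model only every sixth RGB frame has a simultaneous depth capture, and the other five can only rely on an older sparse frame. How such a measurement is fused with the current image is the job of the three-stage pipeline, which this sketch does not model.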
Abstract
High frame rate and accurate depth estimation plays an important role in several tasks crucial to robotics and automotive perception. To date, this can be achieved through ToF and LiDAR devices for indoor and outdoor applications, respectively. However, their applicability is limited by low frame rate, energy consumption, and spatial sparsity. Depth on Demand (DoD) allows for accurate temporal and spatial depth densification by exploiting a high frame rate RGB sensor coupled with a potentially lower frame rate, sparse active depth sensor. Our proposal jointly enables lower energy consumption and denser shape reconstruction by significantly reducing the streaming requirements on the depth sensor, thanks to its three core stages: i) multi-modal encoding, ii) iterative multi-modal integration, and iii) depth decoding. We present extended evidence assessing the effectiveness of DoD on indoor and outdoor video datasets, covering both environment scanning and automotive perception use cases.
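The reduced streaming requirement follows from lowering both the number of depth points per frame and the depth frame rate. As a rough, hypothetical illustration (the resolutions and rates below are not taken from the paper, and sample count is only a proxy for bandwidth and energy), the per-second sample budget is points per frame times $f_{\mathrm{D}}$:

```python
def depth_points_per_second(points_per_frame: int, f_depth_hz: int) -> int:
    """Depth samples the sensor must stream per second: N_pts * f_D."""
    return points_per_frame * f_depth_hz


if __name__ == "__main__":
    # Hypothetical configurations, chosen only for illustration.
    dense = depth_points_per_second(640 * 480, 30)  # full-resolution depth at 30 Hz
    sparse = depth_points_per_second(500, 5)  # 500 sparse points at 5 Hz
    print(dense, sparse, dense / sparse)
```

With these made-up numbers the dense stream carries 9,216,000 samples per second against 2,500 for the sparse one, roughly 3,700 times more.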
