Learning Monocular Depth from Events via Egomotion Compensation
Haitao Meng, Chonghao Zhong, Sheng Tang, Lian JunJia, Wenwei Lin, Zhenshan Bing, Yi Chang, Gang Chen, Alois Knoll
TL;DR
This work tackles monocular depth estimation from event cameras with a physics-informed framework that uses egomotion compensation to evaluate depth hypotheses. Its key innovations are Focus Cost Discrimination (FCD), which quantifies focus quality at edges from gradient-based features, and Inter-Hypotheses Cost Aggregation (IHCA), which refines depth costs via trend analysis and multi-scale consistency. By modeling a depth-dependent motion warp $\mathcal{M}(d)$ and forming the Image of Warped Events $I(\mathcal{M}(d))$ for each hypothesis $d$, the method produces metric-scale depth without relying on scale-ambiguous supervision. Experiments on MVSEC and EventCitySim show state-of-the-art or competitive performance and robustness to velocity noise, highlighting the practical value of combining physical motion priors with learned cost aggregation for event-based depth estimation.
Abstract
Event cameras are neuromorphically inspired sensors that sparsely and asynchronously report brightness changes. Their high temporal resolution, high dynamic range, and low power consumption make them well suited to challenging monocular depth estimation settings (e.g., high-speed or low-light conditions). However, existing methods primarily feed event streams into black-box learning systems without incorporating prior physical principles; as a result, they become over-parameterized and fail to fully exploit the rich temporal information inherent in event camera data. To address this limitation, we incorporate physical motion principles into an interpretable monocular depth estimation framework, in which the likelihood of each depth hypothesis is explicitly determined by the effect of motion compensation. To this end, we propose a Focus Cost Discrimination (FCD) module that measures the clarity of edges as an essential indicator of focus level and integrates spatial surroundings to facilitate cost estimation. Furthermore, we analyze the noise patterns within our framework and improve it with the newly introduced Inter-Hypotheses Cost Aggregation (IHCA) module, in which the cost volume is refined through cost trend prediction and multi-scale cost consistency constraints. Extensive experiments on real-world and synthetic datasets show that the proposed framework outperforms cutting-edge methods by up to 10% in absolute relative error, demonstrating superior prediction accuracy.
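The idea of scoring depth hypotheses by motion compensation can be illustrated with a small NumPy sketch. It is not the paper's implementation: it assumes pure lateral camera translation, a single depth shared by all events, and a known focal length, and it uses the variance of the Image of Warped Events as a simple stand-in for the learned FCD module. All function and parameter names (`warp_events`, `image_of_warped_events`, `focus_score`, `best_depth`, `focal`) are hypothetical.

```python
import numpy as np

def warp_events(xy, t, depth, velocity, focal, t_ref=0.0):
    """Warp event pixel coordinates to time t_ref under one depth hypothesis.

    Assumption: the camera translates parallel to the image plane, so a static
    point at `depth` (m) moves in the image at -focal * velocity / depth (px/s).
    """
    flow = -focal * np.asarray(velocity, dtype=float) / depth
    return xy - (t[:, None] - t_ref) * flow

def image_of_warped_events(xy, shape):
    """Accumulate warped events into a count image (nearest-pixel voting)."""
    h, w = shape
    cols = np.rint(xy[:, 0]).astype(int)
    rows = np.rint(xy[:, 1]).astype(int)
    keep = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    iwe = np.zeros(shape)
    np.add.at(iwe, (rows[keep], cols[keep]), 1.0)  # handles repeated indices
    return iwe

def focus_score(iwe):
    """Variance of the count image: sharper edges give a higher score.
    This is a classical focus measure, used here in place of the learned FCD."""
    return float(np.var(iwe))

def best_depth(xy, t, hypotheses, velocity, focal, shape):
    """Score every depth hypothesis and return the best-focused one."""
    scores = [
        focus_score(image_of_warped_events(
            warp_events(xy, t, d, velocity, focal), shape))
        for d in hypotheses
    ]
    return hypotheses[int(np.argmax(scores))], scores
```

With the correct depth, events triggered by the same edge collapse onto the same pixels and the count image becomes sharp; with a wrong depth they smear along the motion direction. In the framework described above, the per-hypothesis score would come from the learned FCD module and would then be refined across hypotheses by IHCA rather than selected by a plain argmax.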
