
The Fourth Monocular Depth Estimation Challenge

Anton Obukhov, Matteo Poggi, Fabio Tosi, Ripudaman Singh Arora, Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden, Shuaihang Wang, Zhenxin Ma, Weijie Chen, Baobei Xu, Fengyu Sun, Di Xie, Jiang Zhu, Mykola Lavreniuk, Haining Guan, Qun Wu, Yupei Zeng, Chao Lu, Huanran Wang, Guangyuan Zhou, Haotian Zhang, Jianxiong Wang, Qiang Rao, Chunjie Wang, Xiao Liu, Zhiqiang Lou, Hualie Jiang, Yihao Chen, Rui Xu, Minglang Tan, Zihan Qin, Yifan Mao, Jiayang Liu, Jialei Xu, Yifan Yang, Wenbo Zhao, Junjun Jiang, Xianming Liu, Mingshuai Zhao, Anlong Ming, Wu Chen, Feng Xue, Mengying Yu, Shida Gao, Xiangfeng Wang, Gbenga Omotara, Ramy Farag, Jacket Demby, Seyed Mohamad Ali Tousi, Guilherme N DeSouza, Tuan-Anh Yang, Minh-Quang Nguyen, Thien-Phuc Tran, Albert Luginov, Muhammad Shahzad

TL;DR

The paper reports the fourth Monocular Depth Estimation Challenge (MDEC 4), which emphasizes zero-shot generalization to the SYNS-Patches dataset and revises the evaluation to use least-squares alignment with two degrees of freedom (scale and shift), so that disparity and affine-invariant predictions can be scored. It adds foundation-model-based depth estimators (Depth Anything v2 and Marigold) as baselines and analyzes 24 submissions that outperformed them, 10 of which were accompanied by reports describing their approach; the winners raised the best 3D F-Score from 22.58% to 23.05%. Results show that foundation-model-driven methods produce sharper depth maps and perform better in both indoor and outdoor scenes, yet persistent challenges remain at depth discontinuities and on non-Lambertian surfaces. The study highlights the growing impact of foundation models on MDE, the value of diverse data, and the need for future work on metric-depth accuracy and alternative scene representations to move beyond incremental gains.
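The 3D F-Score compares point clouds rather than per-pixel values. The sketch below assumes a common definition (the exact threshold and sampling used by the challenge are not given in this section): each depth map is back-projected with pinhole intrinsics, precision is the fraction of predicted points lying within a distance `tau` of some ground-truth point, recall is the converse, and the score is their harmonic mean. The intrinsics, `tau`, and the function names `backproject` and `f_score` are hypothetical and chosen for illustration only.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an HxW depth map to an (N, 3) point cloud with a pinhole model (hypothetical intrinsics)."""
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index, both HxW
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[depth.reshape(-1) > 0]  # keep only pixels with valid (positive) depth

def f_score(pred_pts, gt_pts, tau):
    """Harmonic mean of precision and recall at distance threshold tau (assumed definition)."""
    # Brute-force pairwise distances: O(N*M) memory, suitable for small clouds only.
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    precision = np.mean(d.min(axis=1) < tau)  # predicted points near some GT point
    recall = np.mean(d.min(axis=0) < tau)     # GT points near some predicted point
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```

Multiplying the returned value by 100 gives a percentage on the same scale as the reported 22.58% and 23.05%. For realistic cloud sizes, a KD-tree nearest-neighbour query would replace the dense distance matrix.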

Abstract

This paper presents the results of the fourth edition of the Monocular Depth Estimation Challenge (MDEC), which focuses on zero-shot generalization to the SYNS-Patches benchmark, a dataset featuring challenging environments in both natural and indoor settings. In this edition, we revised the evaluation protocol to use least-squares alignment with two degrees of freedom to support disparity and affine-invariant predictions. We also revised the baselines and included popular off-the-shelf methods: Depth Anything v2 and Marigold. The challenge received a total of 24 submissions that outperformed the baselines on the test set; 10 of these included a report describing their approach, with most leading methods relying on affine-invariant predictions. The challenge winners improved the 3D F-Score over the previous edition's best result, raising it from 22.58% to 23.05%.
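Least-squares alignment with two degrees of freedom fits a scale s and shift t that minimize the sum over valid pixels of (s·pred + t − target)², which makes an affine-invariant prediction comparable to metric ground truth. The NumPy sketch below illustrates the idea and is not the challenge's official evaluation code; the function names, the validity mask (finite, strictly positive ground truth), and the choice to align disparity predictions against inverted ground-truth depth before converting back to depth are our assumptions.

```python
import numpy as np

def align_scale_shift(pred, target, mask=None):
    """Least-squares fit of (s, t) minimising sum((s * pred + t - target)**2) over valid pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if mask is None:
        # Assumption: valid ground truth is finite and strictly positive.
        mask = np.isfinite(target) & (target > 0)
    p, y = pred[mask], target[mask]
    # Design matrix [p, 1]: one column per degree of freedom (scale, shift).
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * pred + t, float(s), float(t)

def aligned_depth_from_disparity(pred_disp, gt_depth, eps=1e-6):
    """Align a disparity prediction to inverted GT depth, then convert back to depth (assumed protocol)."""
    gt_depth = np.asarray(gt_depth, dtype=np.float64)
    mask = np.isfinite(gt_depth) & (gt_depth > 0)
    gt_disp = np.zeros_like(gt_depth)
    gt_disp[mask] = 1.0 / gt_depth[mask]
    aligned_disp, _, _ = align_scale_shift(pred_disp, gt_disp, mask)
    # Clamp so that non-positive aligned disparities cannot yield negative or infinite depth.
    return 1.0 / np.clip(aligned_disp, eps, None)
```

An affine-invariant depth model would call `align_scale_shift` directly on depth, while a disparity model would go through `aligned_depth_from_disparity`, so both kinds of output reach the same depth-based metrics after alignment.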

Paper Structure

This paper contains 11 sections.