DMODE: Differential Monocular Object Distance Estimation Module without Class Specific Information

Pedram Agand; Michael Chang; Mo Chen

TL;DR

DMODE addresses monocular object distance estimation without relying on object class information by fusing temporal changes in an object's apparent size with camera ego-motion. The detector-agnostic framework takes a three-frame sequence, extracts latent features with a ResNet-18 backbone, and uses two heads that predict the Cartesian coordinates $\phi=(x,y,z)$ and the distance $d$, trained with a BerHu objective. The theoretical analysis generalizes distance estimation to 3D for $q+1$ frames and covers special cases (e.g., constant velocity), while the network architecture enforces consistency with the analytic distance relations. The method remains robust across detection sources (ground truth, TrackRCNN, EagerMOT) and on unseen classes. Empirically, DMODE achieves competitive or superior performance on KITTI MOTS in multi-class scenarios, enabling transferable, low-cost 3D perception for autonomous driving without class-specific cues or intrinsic camera calibration. Because it infers 3D structure, $d=\sqrt{x^2+y^2+z^2}$, from size dynamics and ego-motion with minimal supervision, the approach suits deployments where detector variability and scale-ambiguous monocular cues make class-aware methods impractical.
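
The two-head training objective can be illustrated with a minimal sketch. It assumes the standard reverse-Huber (BerHu) form with the batch-adaptive threshold c = 0.2 * max|e| used by Laina et al. (2016) for depth regression; the threshold DMODE actually uses, the helper names `berhu` and `two_head_loss`, and the explicit consistency term weighted by `w_consistency` are illustrative assumptions, not the authors' code. The consistency term ties the distance head to $d=\sqrt{x^2+y^2+z^2}$ computed from the coordinate head.

```python
import math


def berhu(pred, target, c_ratio=0.2):
    """Mean reverse-Huber (BerHu) loss over paired scalars.

    |e| for |e| <= c and (e^2 + c^2) / (2c) beyond, so both pieces equal c at
    |e| = c. The threshold c = c_ratio * max|e| is an assumed choice.
    """
    residuals = [abs(p - t) for p, t in zip(pred, target)]
    c = c_ratio * max(residuals)
    if c == 0.0:  # every residual is zero: perfect prediction
        return 0.0
    losses = [e if e <= c else (e * e + c * c) / (2.0 * c) for e in residuals]
    return sum(losses) / len(losses)


def two_head_loss(phi_pred, d_pred, phi_gt, d_gt, w_consistency=1.0):
    """Hypothetical objective for an (x, y, z) head plus a distance head.

    phi_*: lists of (x, y, z) tuples; d_*: lists of distances in the same units.
    The last term penalises disagreement between the distance head and the
    Euclidean norm of the predicted coordinates.
    """
    flat_pred = [v for p in phi_pred for v in p]
    flat_gt = [v for p in phi_gt for v in p]
    d_from_phi = [math.sqrt(x * x + y * y + z * z) for x, y, z in phi_pred]
    return (berhu(flat_pred, flat_gt)
            + berhu(d_pred, d_gt)
            + w_consistency * berhu(d_pred, d_from_phi))


if __name__ == "__main__":
    # One residual above the threshold (L2 branch), one below it (L1 branch).
    print(berhu([0.0, 0.0], [1.0, 0.1]))  # approximately (2.6 + 0.1) / 2 = 1.35
    # Coordinates (3, 4, 0) imply d = 5, so a distance head predicting 6 is penalised twice.
    print(two_head_loss([(3.0, 4.0, 0.0)], [6.0], [(3.0, 4.0, 0.0)], [5.0]))
```

BerHu keeps a constant-magnitude gradient for small residuals (the L1 branch) while weighting large residuals quadratically (the L2 branch), which is why it is a common choice for regressing depth-like targets.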

Abstract

Utilizing a single camera to measure object distances is a cost-effective alternative to stereo vision and LiDAR. Although monocular distance estimation has been explored in the literature, most existing techniques rely on object class knowledge to achieve high performance. Without this contextual information, monocular distance estimation becomes more challenging because reference points and object-specific cues are missing. However, these cues can themselves be misleading for objects whose sizes vary widely or in adversarial situations, and handling such cases is a challenging aspect of object-agnostic distance estimation. In this paper, we propose DMODE, a class-agnostic method for monocular distance estimation that does not require object class knowledge. DMODE estimates an object's distance by fusing the change in its apparent size over time with the camera's motion, which makes it adaptable to different object detectors and to unknown objects. We evaluate our model on the KITTI MOTS dataset using ground-truth bounding box annotations and the outputs of TrackRCNN and EagerMOT. The object's location is determined from the change in bounding box size and the camera position, without using the object's detection source or class attributes. Our approach outperforms conventional methods in multi-class object distance estimation scenarios.
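
The core idea, that size change plus ego-motion fixes distance without knowing the object's class, can be checked in a simplified 1D setting. The sketch below assumes a pinhole camera, a static object, and camera motion straight toward the object, so apparent size times distance equals an unknown constant K (focal length times physical size). Eliminating K between two frames gives $d_2 = \Delta\, h_1/(h_2 - h_1)$, where $h_1, h_2$ are the apparent sizes and $\Delta$ is the distance travelled. The helpers `distance_two_frames` and `distance_multi_frame` are hypothetical, and the least-squares extension to $q+1$ frames illustrates the principle rather than reproducing the paper's 3D derivation.

```python
def distance_two_frames(h_prev, h_curr, step):
    """Current distance of a static object from two apparent sizes.

    Assumes size * distance = K (pinhole model) and that the camera moved
    `step` metres straight toward the object between the two frames.
    """
    if h_curr == h_prev:
        raise ValueError("apparent size did not change; distance is unobservable")
    # h_prev * (d_curr + step) = h_curr * d_curr, solved for d_curr
    return step * h_prev / (h_curr - h_prev)


def distance_multi_frame(sizes, displacements):
    """Least-squares distance at the last of q+1 frames (static object, 1D).

    sizes[i]: apparent size at frame i (e.g. box height in pixels).
    displacements[i]: camera travel toward the object since frame 0 (metres).
    Model: sizes[i] * (d0 - displacements[i]) = K, with K solved for rather
    than looked up from a class-specific object height.
    """
    n = len(sizes)
    sh = sum(sizes)
    sh2 = sum(h * h for h in sizes)
    shs = sum(h * s for h, s in zip(sizes, displacements))
    sh2s = sum(h * h * s for h, s in zip(sizes, displacements))
    # Normal equations of h_i * d0 - K = h_i * s_i in the unknowns (d0, K)
    det = n * sh2 - sh * sh
    if abs(det) < 1e-12:
        raise ValueError("apparent sizes must vary across frames")
    d0 = (n * sh2s - sh * shs) / det
    return d0 - displacements[-1]


if __name__ == "__main__":
    K = 1000.0                         # focal length x true height; hidden from the estimators
    true_d = [20.0, 19.0, 18.0]        # object distance at each of three frames
    sizes = [K / d for d in true_d]
    disp = [20.0 - d for d in true_d]  # ego-motion: 1 m per frame toward the object
    print(distance_two_frames(sizes[1], sizes[2], disp[2] - disp[1]))  # close to 18.0
    print(distance_multi_frame(sizes, disp))                           # close to 18.0
```

K cancels, so neither the object's physical size nor the focal length is needed. The estimate becomes ill-conditioned when the apparent size barely changes between frames (small ego-motion or very distant objects), since the denominator $h_2 - h_1$ approaches zero.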

Paper Structure (18 sections, 16 equations, 3 figures, 3 tables)

Introduction
Related work
  Monocular depth estimation
  Monocular 3D object detection
  Monocular object distance estimation
Problem statement
Method
  Theoretical analysis
  Network architecture
  Learning rules
Results
  Model setup
  Comparison
  Ablation studies
  Dataset robustness testing
...and 3 more sections

Figures (3)

Figure 1: Simplified 1D DMODE: a mathematical viewpoint
Figure 2: DMODE framework: workflow of data and models. Only the blue elements are trainable; the green dots indicate stacked data.
Figure 3: Error bars with respect to distance, velocity, and acceleration. The red and black lines represent AbsRel and MRE, respectively.

Theorems & Definitions (1)

proof
