oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving
Abdul Hannan Khan, Syed Tahseen Raza Rizvi, Dheeraj Varma Chittari Macharavtu, Andreas Dengel
TL;DR
This work argues that time-to-contact (TTC) is a more robust motion cue for autonomous driving than depth or velocity alone, and that per-pixel TTC estimation is inefficient because it must be paired with a separate object detector. It introduces oTTC, an object-centric TTC predictor that extends 2D object detectors with a TTC attribute branch: the branch predicts motion-in-depth (MiD) per object from a single image, and TTC is computed from that prediction. Ground-truth TTC is generated from existing object tracks and depth cues on the KITTI, nuScenes, and SHIFT datasets, enabling benchmarking against pixel-wise TTC baselines and monocular 3D detectors. The results show that oTTC achieves higher MiD-based TTC accuracy than state-of-the-art approaches, with favorable binary risk performance on real datasets and qualitative explanations that highlight motion blur as a useful cue. Overall, oTTC offers a computationally efficient, interpretable, single-image solution for per-object motion risk in autonomous driving, with practical implications for perception and planning pipelines.
Abstract
Autonomous driving systems require quick and robust perception of the nearby environment to carry out their routines effectively. To avoid collisions and drive safely, they rely heavily on object detection. However, 2D object detections alone are insufficient; more information, such as relative velocity and distance, is required for safer planning. Monocular 3D object detectors try to solve this problem by directly predicting 3D bounding boxes and object velocities from a camera image. Recent research estimates time-to-contact in a per-pixel manner and suggests that it is a more effective measure than velocity and depth combined. However, per-pixel time-to-contact requires object detection to serve its purpose effectively, which increases overall computational requirements because two different models need to run. To address this issue, we propose per-object time-to-contact estimation by extending object detection models to additionally predict the time-to-contact attribute for each object. We compare our proposed approach with existing time-to-contact methods and provide benchmarking results on well-known datasets. Our proposed approach achieves higher precision than prior art while using a single image.
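The relation between an object's motion-in-depth and its time-to-contact can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes motion-in-depth is the ratio of an object's depth in the next frame to its depth in the current frame, and that the object closes in at constant speed. The function names `motion_in_depth` and `ttc_from_mid` and the parameter `frame_dt` are hypothetical, chosen only for this illustration.

```python
import math


def motion_in_depth(depth_now: float, depth_next: float) -> float:
    """Assumed definition: depth in the next frame divided by depth now."""
    if depth_now <= 0 or depth_next <= 0:
        raise ValueError("depths must be positive")
    return depth_next / depth_now


def ttc_from_mid(mid: float, frame_dt: float) -> float:
    """Time-to-contact in seconds, measured from the current frame.

    Assumes constant closing speed, so
    TTC = depth_now / ((depth_now - depth_next) / frame_dt)
        = frame_dt / (1 - mid).
    Returns infinity when the object is not approaching (mid >= 1).
    """
    if frame_dt <= 0:
        raise ValueError("frame_dt must be positive")
    if mid >= 1.0:
        return math.inf
    return frame_dt / (1.0 - mid)


# Illustrative numbers only: an object moves from 20 m to 19 m in 0.1 s.
mid = motion_in_depth(20.0, 19.0)
ttc = ttc_from_mid(mid, frame_dt=0.1)
```

With these illustrative numbers the object closes at 10 m/s from 20 m, so the computed TTC is about 2 seconds. A receding or stationary object has a ratio of at least 1 and is mapped to an infinite TTC in this sketch.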
