oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving

Abdul Hannan Khan, Syed Tahseen Raza Rizvi, Dheeraj Varma Chittari Macharavtu, Andreas Dengel

TL;DR

This work argues that time-to-contact (TTC) is a more robust motion cue for autonomous driving than depth or velocity alone, and that per-pixel TTC estimation is inefficient because it must be paired with a separate object detector. It introduces oTTC, an object-centric TTC predictor that extends 2D object detectors with a TTC attribute branch: the model predicts motion-in-depth (MiD) per object from a single image, and TTC is computed from that prediction. Ground-truth TTC is generated from existing object tracks and depth cues in the KITTI, NuScenes, and Shift datasets, enabling benchmarking against pixel-wise TTC baselines and monocular 3D detectors. The results show that oTTC achieves higher MiD-based TTC accuracy than state-of-the-art approaches, with favorable binary risk performance on real datasets. Qualitative explanations of its predictions highlight motion blur as a useful cue. Overall, oTTC offers a computationally efficient, interpretable, single-image solution for per-object motion risk in autonomous driving, with practical implications for perception and planning pipelines.
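The step from motion-in-depth to TTC can be illustrated with a minimal sketch. It assumes that MiD is the ratio of an object's depth in the current frame to its depth in the previous frame, and that the relative velocity stays constant between frames; both are common conventions in the TTC literature and are not confirmed by this summary. The function names, the frame interval `dt`, and the risk threshold are hypothetical and chosen only for illustration.

```python
import math


def motion_in_depth(depth_prev: float, depth_curr: float) -> float:
    """Assumed MiD definition: current depth divided by previous depth.

    Values below 1 mean the object is approaching the camera.
    """
    if depth_prev <= 0 or depth_curr <= 0:
        raise ValueError("depths must be positive")
    return depth_curr / depth_prev


def ttc_from_mid(mid: float, dt: float) -> float:
    """TTC in seconds, measured from the previous frame.

    Under constant relative velocity, depth reaches zero at
    t = dt / (1 - MiD). Objects that are static or receding
    (MiD >= 1) never make contact, so infinity is returned.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if mid >= 1.0:
        return math.inf
    return dt / (1.0 - mid)


def is_risky(ttc: float, threshold_s: float) -> bool:
    """Hypothetical binary risk label: contact expected within threshold_s."""
    return ttc < threshold_s


if __name__ == "__main__":
    # An object tracked at 20.0 m, then 19.5 m, with frames 0.1 s apart.
    mid = motion_in_depth(20.0, 19.5)
    ttc = ttc_from_mid(mid, dt=0.1)
    print(f"MiD = {mid:.3f}, TTC = {ttc:.2f} s, risky = {is_risky(ttc, 5.0)}")
```

In this example the object closes 0.5 m in 0.1 s, so the TTC comes out at roughly 4 s. The same arithmetic applied to tracked objects with known depth in consecutive frames is one plausible way to derive ground-truth labels of the kind the summary describes.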

Abstract

Autonomous driving systems require a quick and robust perception of the nearby environment to carry out their routines effectively. To avoid collisions and drive safely, autonomous driving systems rely heavily on object detection. However, 2D object detections alone are insufficient; more information, such as relative velocity and distance, is required for safer planning. Monocular 3D object detectors try to solve this problem by directly predicting 3D bounding boxes and object velocities given a camera image. Recent research estimates time-to-contact in a per-pixel manner and suggests that it is a more effective measure than velocity and depth combined. However, per-pixel time-to-contact requires object detection to serve its purpose effectively and hence increases overall computational requirements, as two different models need to run. To address this issue, we propose per-object time-to-contact estimation by extending object detection models to additionally predict the time-to-contact attribute for each object. We compare our proposed approach with existing time-to-contact methods and provide benchmarking results on well-known datasets. Our proposed approach achieves higher precision compared to prior art while using a single image.

Paper Structure (23 sections, 8 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: An autonomous driving pipeline including perception, prediction, and planning submodules; inspired by planning-oriented autonomous driving [hu2023planning].
  • Figure 2: Monocular depth predictions (b), per-pixel time-to-contact predictions (c), and oTTC predictions (d). (b) The depth prediction shows how far the objects are from the camera, with hotter meaning closer to the camera. (c) The per-pixel TTC predictions show the relative motion of pixels from the ego vehicle's perspective, where temperature indicates how fast objects are moving toward the ego vehicle. The depth prediction (b) is based on PixelFormer [agarwal2023attention] and the per-pixel time-to-contact prediction (c) is based on Binary TTC [badki2021binary]. (d) The output of our proposed oTTC model, where normal temperature shows the background, while hot and cold show objects moving toward and away from the ego vehicle, respectively.
  • Figure 3: Simplified architecture of our oTTC model. It uses an HRNetW32 [wang2020deep] backbone to extract features. Some details have been omitted for better visibility and to convey the general idea of the architecture.
  • Figure 4: Qualitative comparison and explanations for oTTC predictions across a range of scenarios. The integrated gradients method [sundararajan2017axiomatic] was used to generate the explanations. The input and output images are cropped for better visibility. The blue outlines in GT and oTTC indicate cars, while the yellow outlines indicate pedestrians.