On-Road Object Importance Estimation: A New Dataset and A Model with Multi-Fold Top-Down Guidance

Zhixiong Nan, Yilong Chen, Tianfei Zhou, Tao Xiang

TL;DR

This paper proposes the first on-road object importance estimation model that fuses multi-fold top-down guidance factors with bottom-up features, and it outperforms state-of-the-art methods by large margins.

Abstract

This paper addresses the problem of on-road object importance estimation, which takes video sequences captured from the driver's perspective as input. Although this problem is significant for safer and smarter driving systems, it remains underexplored. On the one hand, publicly available large-scale datasets are scarce in the community. To address this gap, this paper contributes a new large-scale dataset named Traffic Object Importance (TOI). On the other hand, existing methods often consider only bottom-up features or single-fold guidance, which limits their ability to handle highly dynamic and diverse traffic scenarios. Unlike existing methods, this paper proposes a model that integrates multi-fold top-down guidance with bottom-up features. Specifically, three kinds of top-down guidance factors (i.e., driver intention, semantic context, and traffic rules) are integrated into our model. These factors are important for object importance estimation, but none of the existing methods considers them simultaneously. To our knowledge, this paper proposes the first on-road object importance estimation model that fuses multi-fold top-down guidance factors with bottom-up features. Extensive experiments demonstrate that our model outperforms state-of-the-art methods by large margins, achieving a 23.1% improvement in Average Precision (AP) over the recently proposed Goal model.
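The results above are reported in Average Precision (AP). As a rough illustration of what that metric measures, the sketch below computes the standard non-interpolated AP for binary importance labels, assuming each object receives a predicted importance score and a ground-truth label (1 = important, 0 = unimportant). The function name `average_precision` is hypothetical, and this summary does not describe the paper's exact evaluation protocol (e.g., how objects are pooled across frames, or whether interpolation is used), so treat this as a generic definition rather than the authors' evaluation code.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for binary labels (1 = important, 0 = unimportant).

    Hypothetical helper: objects are ranked by descending score, and AP is the
    mean of the precision values measured at the rank of each positive object.
    Ties keep their input order (a simplifying assumption).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    num_pos = sum(1 for y in labels if y == 1)
    if num_pos == 0:
        raise ValueError("AP is undefined when there are no positive labels")

    # sorted() is stable, so equal scores keep their original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])

    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / num_pos


if __name__ == "__main__":
    # Two important objects, ranked 1st and 3rd: AP = (1/1 + 2/3) / 2 = 5/6.
    print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

AP rewards rankings that place important objects ahead of unimportant ones: a perfect ranking yields 1.0, while pushing the important objects to the bottom of the list lowers the score even if the classifier's threshold is unchanged.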

Paper Structure

This paper contains 23 sections, 16 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: The crucial factors considered by human drivers when estimating on-road object importance.
  • Figure 2: Overview of the multi-fold top-down guidance-aware model.
  • Figure 3: Visualization of object-lane interaction weighting.
  • Figure 4: Qualitative comparison with baselines (i.e., Goal, Ohn-Bar, and Zhang). Red boxes represent important objects and green boxes denote unimportant objects.
  • Figure 5: Failure examples. The top row shows the ground truth (GT) and the bottom row shows the estimated object importance.