Potential Field as Scene Affordance for Behavior Change-Based Visual Risk Object Identification
Pang-Yuan Pao, Shu-Wei Lu, Ze-Yan Lu, Yi-Ting Chen
TL;DR
This work addresses three weaknesses of behavior change-based visual risk object identification (Visual-ROI) for intelligent driving: spatial inaccuracy, temporal inconsistency, and computationally inefficient causal inference. It introduces a Bird's Eye View (BEV) representation in which potential fields encode scene affordance: infrastructure and dynamic objects exert repulsive forces $F_r$, and a target point $T_p$ exerts an attractive force $F_a$, giving a total force $F = F_a + F_r$. Performing causal reasoning in BEV rather than in the perspective image space also speeds up inference. The framework, PF+BCP, comprises four components: BEV semantic segmentation, target point prediction, potential field rendering, and a behavior change-based Visual-ROI predictor. Experiments and ablations on RiskBench and nuScenes show substantial gains in OT-F1, wMOTA, and runtime efficiency. The results support integrating scene affordance via potential fields for robust hazard identification in real-world driving systems, while also pointing to limitations tied to BEV segmentation quality and fixed force constants that future work can address.
Abstract
We study behavior change-based visual risk object identification (Visual-ROI), a critical framework designed to detect potential hazards for intelligent driving systems. Existing methods often show significant limitations in spatial accuracy and temporal consistency, stemming from an incomplete understanding of scene affordance. For example, these methods frequently misidentify vehicles that do not impact the ego vehicle as risk objects. Furthermore, existing behavior change-based methods are inefficient because they implement causal inference in the perspective image space. We propose a new framework with a Bird's Eye View (BEV) representation to overcome these challenges. Specifically, we use potential fields as scene affordance, with repulsive forces derived from road infrastructure and traffic participants and attractive forces sourced from target destinations. We compute the potential fields by assigning different energy levels according to the semantic labels obtained from BEV semantic segmentation. We conduct thorough experiments and ablation studies, comparing the proposed method with various state-of-the-art algorithms on both synthetic and real-world datasets. On the RiskBench dataset, spatial accuracy and temporal consistency improve by 20.3% and 11.6%, respectively. Additionally, computational efficiency improves by 88%. On the nuScenes dataset, spatial accuracy and temporal consistency improve by 5.4% and 7.2%, respectively.
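
To make the potential field idea concrete, the following is a minimal sketch of how a field could be rendered from a BEV label grid and turned into the total force $F = F_a + F_r$. It is not the authors' implementation. The label ids, the energy level per label, the gain `K_ATTRACT`, the quadratic form of the attractive potential, and the use of finite differences for the gradient are all assumptions made for illustration; the text above states only that energy levels are assigned per semantic label and that the attractive force points toward a target.

```python
import numpy as np

# Hypothetical label ids and energy levels; the source does not list them.
ROAD, SIDEWALK, VEHICLE, PEDESTRIAN = 0, 1, 2, 3
REPULSIVE_ENERGY = {ROAD: 0.0, SIDEWALK: 5.0, VEHICLE: 10.0, PEDESTRIAN: 10.0}
K_ATTRACT = 0.5  # assumed gain of the attractive potential


def repulsive_potential(bev_labels):
    """Assign an energy level to every BEV cell from its semantic label."""
    u_r = np.zeros(bev_labels.shape, dtype=float)
    for label, energy in REPULSIVE_ENERGY.items():
        u_r[bev_labels == label] = energy
    return u_r


def attractive_potential(shape, target, k=K_ATTRACT):
    """Quadratic well centred on the target cell (row, col); an assumed form."""
    rows, cols = np.indices(shape)
    dist_sq = (rows - target[0]) ** 2 + (cols - target[1]) ** 2
    return 0.5 * k * dist_sq


def force_field(potential):
    """Force is the negative gradient of the potential (finite differences)."""
    d_row, d_col = np.gradient(potential)
    return -d_row, -d_col


def total_force(bev_labels, target):
    """Return (F_row, F_col) with F = F_a + F_r evaluated at every cell."""
    fa_row, fa_col = force_field(attractive_potential(bev_labels.shape, target))
    fr_row, fr_col = force_field(repulsive_potential(bev_labels))
    return fa_row + fr_row, fa_col + fr_col
```

As a usage example under the same assumptions, take a 5x5 grid that is all `ROAD` except for a `VEHICLE` at cell (2, 3), with the target at (2, 2). Calling `total_force(bev, (2, 2))` then gives a column force at (2, 2) that points away from the vehicle. Cells below the target get a row force that points back up toward it.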
