PoseDriver: A Unified Approach to Multi-Category Skeleton Detection for Autonomous Driving

Yasamin Borhani, Taylor Mordan, Yihan Wang, Reyhaneh Hosseininejad, Javad Khoramdel, Alexandre Alahi

Abstract

Object skeletons offer a concise representation of structural information, capturing essential aspects of posture and orientation that are crucial for autonomous driving applications. However, a unified architecture that simultaneously handles multiple instances and categories using only the input image remains elusive. In this paper, we introduce PoseDriver, a unified framework for bottom-up multi-category skeleton detection tailored to common objects in driving scenarios. We model each category as a distinct task to systematically address the challenges of multi-task learning. Specifically, we propose a novel approach for lane detection based on skeleton representations, achieving state-of-the-art performance on the OpenLane dataset. Moreover, we present a new dataset for bicycle skeleton detection and assess the transferability of our framework to novel categories. Experimental results validate the effectiveness of the proposed approach.

Paper Structure

This paper contains 23 sections, 1 equation, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Skeleton detection provides a detailed yet lightweight representation of the environment for autonomous driving. Our goal is to jointly detect skeletons of dynamic road users (cars, humans, and animals) and of the static road configuration (lanes), yielding a comprehensive representation of the environment around the car that enables better understanding and safer navigation.
  • Figure 2: Comparison of the association schemes used by skeleton-based lane detection methods. (a) FoloLane [qu2021focus] associates keypoints in a locally iterative manner. (b) GANet [wang2022keypoint] regresses each keypoint to its starting point. (c) RCLane [xu2022rclane] models lanes as relay chains. (d) PoseDriver jointly estimates keypoint location and association using intensity and association fields.
  • Figure 3: Overview of our framework. The network is designed to detect skeletons for pedestrians, animals, cars, bicycles, and lanes. A feature fusion stage is incorporated after the backbone and task-specific transformer blocks to enhance overall performance. Dashed and dotted paths in the schematic represent different architectural variants (e.g., configurations without the transformer or FPN). The $\oplus$ nodes indicate the merging of two inputs. Where only one input is connected to the output, the node is drawn solely to illustrate the different variants of the architecture and should not be interpreted as a summation or any other arithmetic operation.
  • Figure 4: Qualitative results of our method (fourth column) on the CULane test set, compared with CLRNet [zheng2022clrnet] (third column). The first row shows a night scene, the second row contains curved lane lines, and the third and fourth rows show crowded scenes with severe occlusion. The red arrows point to the regions where our method generates clearly better predictions.
  • Figure 5: Qualitative results of our method on the OpenLane validation set, including crowded scenes with occlusion, extreme weather, dazzling sunshine, converging lanes, night-time driving scenarios, and curved lanes. Missing annotations in the ground truth are marked as red lines in the images.
  • ...and 2 more figures