HABIT: Human Action Benchmark for Interactive Traffic in CARLA

Mohan Ramesh, Mark Azer, Fabian B. Flohr

TL;DR

HABIT tackles the realism gap in autonomous driving evaluation by integrating real-world pedestrian motions into CARLA through a SMPL-based motion reconstruction and retargeting pipeline, yielding 4,730 traffic-compatible sequences. The benchmark supports structured scenario generation and introduces AIS-based injury risk and FPBR as safety metrics to capture nuanced planner behavior beyond binary collisions. Evaluations of InterFuser, TransFuser, and BEVDriver reveal planner weaknesses, including over-conservatism, underreaction, and bias that are not exposed by traditional CARLA Leaderboard metrics. The work demonstrates that realistic pedestrian dynamics significantly alter planner performance and provides open-source tools and data to foster reproducible, pedestrian-aware autonomous system research.

Abstract

Current autonomous driving (AD) simulations are critically limited by their inadequate representation of realistic and diverse human behavior, which is essential for ensuring safety and reliability. Existing benchmarks often simplify pedestrian interactions, failing to capture complex, dynamic intentions and varied responses critical for robust system deployment. To overcome this, we introduce HABIT (Human Action Benchmark for Interactive Traffic), a high-fidelity simulation benchmark. HABIT integrates real-world human motion, sourced from mocap and videos, into CARLA (Car Learning to Act, a full autonomous driving simulator) via a modular, extensible, and physically consistent motion retargeting pipeline. From an initial pool of approximately 30,000 retargeted motions, we curate 4,730 traffic-compatible pedestrian motions, standardized in SMPL format for physically consistent trajectories. HABIT seamlessly integrates with CARLA's Leaderboard, enabling automated scenario generation and rigorous agent evaluation. Our safety metrics, including Abbreviated Injury Scale (AIS) and False Positive Braking Rate (FPBR), reveal critical failure modes in state-of-the-art AD agents missed by prior evaluations. Evaluating three state-of-the-art autonomous driving agents, InterFuser, TransFuser, and BEVDriver, demonstrates how HABIT exposes planner weaknesses that remain hidden in scripted simulations. Despite achieving zero or near-zero collisions per kilometer on the CARLA Leaderboard, the autonomous agents perform notably worse on HABIT, with up to 7.43 collisions/km and a 12.94% AIS 3+ injury risk, and they brake unnecessarily in up to 33% of cases. All components are publicly released to support reproducible, pedestrian-aware AI research.

Paper Structure

This paper contains 23 sections, 8 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: HABIT Benchmark: Integrating realistic pedestrian behavior into CARLA for high-fidelity autonomous driving evaluation. Left: retargeting real human motion into simulation. Center: interactive traffic scenarios with diverse agents. Right: testing perception, prediction, and planning systems under realistic pedestrian dynamics.
  • Figure 2: Overview of the proposed motion data processing pipeline. The dotted box presents HABIT's extensibility and scalability using our video-based motion extraction.
  • Figure 3: Global root rotation over time: original (solid) vs. reconstructed (dashed).
  • Figure 4: Canonical poses of the SMPL [9417684] and CARLA [carla_issue7621] skeletons. Note the differences in joint definitions and limb angles.
  • Figure 5: Overview of the HABIT benchmark pipeline, in which retargeted motions are placed and controlled in CARLA scenes.
  • ...and 12 more figures