HABIT: Human Action Benchmark for Interactive Traffic in CARLA

Mohan Ramesh; Mark Azer; Fabian B. Flohr

TL;DR

HABIT tackles the realism gap in autonomous driving evaluation by integrating real-world pedestrian motions into CARLA through a SMPL-based motion reconstruction and retargeting pipeline, yielding 4,730 traffic-compatible sequences. The benchmark supports structured scenario generation and introduces two safety metrics, injury risk based on the Abbreviated Injury Scale (AIS) and the False Positive Braking Rate (FPBR), which capture nuanced planner behavior beyond binary collision counts. Evaluations of InterFuser, TransFuser, and BEVDriver reveal planner weaknesses that traditional CARLA Leaderboard metrics do not expose, including over-conservatism, underreaction, and bias. The work shows that realistic pedestrian dynamics substantially change planner performance, and it provides open-source tools and data to support reproducible, pedestrian-aware autonomous driving research.

Abstract

Current autonomous driving (AD) simulations are critically limited by their inadequate representation of realistic and diverse human behavior, which is essential for ensuring safety and reliability. Existing benchmarks often simplify pedestrian interactions and fail to capture the complex, dynamic intentions and varied responses that matter for robust system deployment. To overcome this, we introduce HABIT (Human Action Benchmark for Interactive Traffic), a high-fidelity simulation benchmark. HABIT integrates real-world human motion, sourced from motion capture (mocap) and videos, into CARLA (Car Learning to Act), an open-source autonomous driving simulator, via a modular, extensible, and physically consistent motion retargeting pipeline. From an initial pool of approximately 30,000 retargeted motions, we curate 4,730 traffic-compatible pedestrian motions, standardized in SMPL format for physically consistent trajectories. HABIT integrates seamlessly with the CARLA Leaderboard, enabling automated scenario generation and rigorous agent evaluation. Our safety metrics, including injury risk based on the Abbreviated Injury Scale (AIS) and the False Positive Braking Rate (FPBR), reveal critical failure modes in state-of-the-art AD agents that prior evaluations miss. Evaluating three state-of-the-art AD agents, InterFuser, TransFuser, and BEVDriver, demonstrates how HABIT exposes planner weaknesses that remain hidden in scripted simulations. Despite achieving near-zero collisions per kilometer on the CARLA Leaderboard, the agents perform notably worse on HABIT, with up to 7.43 collisions/km and a 12.94% AIS 3+ injury risk, and they brake unnecessarily in up to 33% of cases. All components are publicly released to support reproducible, pedestrian-aware AI research.
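To make the two rate-style figures quoted above (collisions per kilometer and the share of unnecessary braking) concrete, the following is a minimal sketch of how such numbers could be aggregated over a set of evaluated scenarios. It is not HABIT's implementation: the `ScenarioResult` record, its field names, and the per-scenario ground-truth label `braking_required` are hypothetical. FPBR is assumed here to follow the standard false-positive-rate form FP / (FP + TN) over scenarios that did not require braking, and the paper's exact definition may differ. AIS-based injury risk is omitted because it depends on an injury-risk model not described in this section.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScenarioResult:
    """Hypothetical per-scenario record; field names are illustrative, not HABIT's API."""
    distance_km: float       # distance driven by the ego agent in this scenario
    collisions: int          # pedestrian collisions recorded in this scenario
    braking_required: bool   # assumed ground-truth label: did the pedestrian actually conflict with the ego path?
    agent_braked: bool       # whether the agent performed a braking manoeuvre


def collisions_per_km(results: Sequence[ScenarioResult]) -> float:
    """Total collisions divided by total distance driven across all scenarios."""
    total_km = sum(r.distance_km for r in results)
    if total_km <= 0:
        raise ValueError("total driven distance must be positive")
    return sum(r.collisions for r in results) / total_km


def false_positive_braking_rate(results: Sequence[ScenarioResult]) -> float:
    """Share of scenarios that did not require braking in which the agent braked anyway.

    Assumed definition: FP / (FP + TN), computed over scenarios labeled as not requiring braking.
    """
    negatives = [r for r in results if not r.braking_required]
    if not negatives:
        return 0.0  # no scenario could produce a false-positive brake
    false_positives = sum(1 for r in negatives if r.agent_braked)
    return false_positives / len(negatives)


if __name__ == "__main__":
    # Toy, made-up scenario outcomes for illustration only (not HABIT results).
    demo = [
        ScenarioResult(0.5, 1, True, False),   # required braking, agent did not brake, collision
        ScenarioResult(0.5, 0, False, True),   # no conflict, agent braked anyway (false positive)
        ScenarioResult(1.0, 0, False, False),  # no conflict, agent kept driving
        ScenarioResult(0.5, 0, False, False),  # no conflict, agent kept driving
    ]
    print(f"collisions/km: {collisions_per_km(demo):.2f}")
    print(f"FPBR: {false_positive_braking_rate(demo):.2%}")
```

Normalizing collisions by distance rather than by scenario count keeps the figure comparable across routes of different lengths. Restricting FPBR to scenarios that did not require braking keeps over-conservatism from being masked by correct braking in genuine conflicts.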
