Bench4Merge: A Comprehensive Benchmark for Merging in Realistic Dense Traffic with Micro-Interactive Vehicles
Zhengming Wang, Junli Wang, Pengfei Li, Zhaohan Li, Chunyang Liu, Bo Zhang, Peng Li, Yilun Chen
TL;DR
Bench4Merge addresses the challenge of evaluating motion planning in dense merging by introducing a closed-loop benchmark built from real-world initial scenarios, micro-interactive vehicle models learned from large-scale datasets, and an LLM-based evaluator that scores merging sequences holistically. The framework captures rich micro-level interactions via an attention-based imitation learner and uses scenario classification to ensure diverse test cases. Empirical results show strong alignment with human judgments, reveal limitations of existing methods, and demonstrate that the new evaluation approach provides more nuanced insights than traditional metrics. The work offers an open-source platform to advance development of robust merging strategies and suggests future work toward 3D environment rendering and multimodal end-to-end driving systems.
Abstract
While the capabilities of autonomous driving have advanced rapidly, merging into dense traffic remains a significant challenge. Many motion planning methods have been proposed for this scenario, but they are hard to evaluate. Most existing closed-loop simulators rely on rule-based controls for other vehicles, which results in a lack of diversity and randomness and thus fails to accurately assess motion planning capabilities in highly interactive scenarios. Moreover, traditional evaluation metrics are insufficient for comprehensively evaluating the performance of merging in dense traffic. In response, we propose a closed-loop evaluation benchmark for assessing motion planning capabilities in merging scenarios. In our approach, the other vehicles are driven by models trained on large-scale datasets and exhibit micro-behavioral characteristics that significantly enhance the complexity and diversity of the scenarios. Additionally, we have restructured the evaluation mechanism by leveraging Large Language Models (LLMs) to assess each autonomous vehicle merging onto the main lane. Extensive experiments and test-vehicle deployment demonstrate the advantages of this benchmark. Through this benchmark, we have obtained an evaluation of existing methods and identified common issues. The simulation environment and evaluation process can be accessed at https://github.com/WZM5853/Bench4Merge.
