Bench4Merge: A Comprehensive Benchmark for Merging in Realistic Dense Traffic with Micro-Interactive Vehicles

Zhengming Wang, Junli Wang, Pengfei Li, Zhaohan Li, Chunyang Liu, Bo Zhang, Peng Li, Yilun Chen

TL;DR

Bench4Merge addresses the challenge of evaluating motion planning in dense merging by introducing a closed-loop benchmark built from real-world initial scenarios, micro-interactive vehicle models learned from large-scale datasets, and an LLM-based evaluator that scores merging sequences holistically. The framework captures rich micro-level interactions via an attention-based imitation learner and uses scenario classification to ensure diverse test cases. Empirical results show strong alignment with human judgments, reveal limitations of existing methods, and demonstrate that the new evaluation approach provides more nuanced insights than traditional metrics. The work offers an open-source platform to advance development of robust merging strategies and suggests future work toward 3D environment rendering and multimodal end-to-end driving systems.

Abstract

While the capabilities of autonomous driving have advanced rapidly, merging into dense traffic remains a significant challenge. Many motion planning methods have been proposed for this scenario, but they are hard to evaluate. Most existing closed-loop simulators rely on rule-based controls for other vehicles, which results in a lack of diversity and randomness, so they fail to accurately assess motion planning capabilities in highly interactive scenarios. Moreover, traditional evaluation metrics are insufficient for comprehensively evaluating merging performance in dense traffic. In response, we propose a closed-loop evaluation benchmark for assessing motion planning capabilities in merging scenarios. In our approach, the other vehicles are controlled by models trained on large-scale datasets and exhibit micro-behavioral characteristics that significantly enhance the complexity and diversity of the scenarios. Additionally, we have restructured the evaluation mechanism by leveraging Large Language Models (LLMs) to assess each autonomous vehicle merging onto the main lane. Extensive experiments and test-vehicle deployment have demonstrated the advantages of this benchmark. Using this benchmark, we evaluated existing methods and identified common issues. The simulation environment and evaluation process can be accessed at https://github.com/WZM5853/Bench4Merge.

Paper Structure

This paper contains 12 sections, 7 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: A closed-loop benchmark includes three components: Generation, Update, and Evaluation. “Real-Initialization” indicates that the initial scenario is extracted from real data; “Style” represents different vehicle styles; “X Interaction” and “Y Interaction” represent the vehicle's lateral and longitudinal displacements, respectively; “Safe.”, “Effi.”, and “Comf.” denote the three evaluation metrics of safety, efficiency, and comfort; and “Mode” signifies the vehicle's mode, which is one of hurry, medium, or relax.
  • Figure 2: Overview of our architecture. Bench4Merge consists of three main parts: Scenario-level Generation, Micro-Controllable Model for main-road vehicles, and LLM-Based Evaluation. In this context, Merging Policy refers to the planning method of the merging vehicle being evaluated.
  • Figure 3: The distribution of average speed and average distance of other vehicles in each initial environment. We divide the environments into three categories.
  • Figure 4: Analysis of the datasets
  • Figure 5: Every sample includes the state of the target vehicle and other vehicles, as well as map information. The other vehicles are all those that fall within the leading and interaction range of the target vehicle.
  • ...and 3 more figures