Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning

Yuehao Song, Shaoyu Chen, Hao Gao, Yifan Zhu, Weixiang Yue, Jialv Zou, Bo Jiang, Zihao Lu, Yu Wang, Qian Zhang, Xinggang Wang

Abstract

Vision-language models (VLMs) enhance the planning capability of end-to-end (E2E) driving policies by leveraging high-level semantic reasoning. However, existing approaches often overlook the dual-system consistency between the VLM's high-level decisions and the E2E policy's low-level planning. As a result, the generated trajectories may misalign with the intended driving decisions, weakening the system's top-down guidance and decision-following ability. To address this issue, we propose Senna-2, an advanced VLM-E2E driving policy that explicitly aligns the two systems for consistent decision-making and planning. Our method follows a consistency-oriented three-stage training paradigm. In the first stage, we conduct driving pre-training to achieve preliminary decision-making and planning, with a decision adapter transmitting VLM decisions to the E2E policy in the form of implicit embeddings. In the second stage, we align the VLM and the E2E policy in an open-loop setting. In the third stage, we perform closed-loop alignment via bottom-up Hierarchical Reinforcement Learning in 3DGS environments to reinforce safety and efficiency. Extensive experiments demonstrate that Senna-2 achieves superior dual-system consistency (19.3% F1 score improvement) and significantly enhances driving safety in both open-loop (5.7% FDE reduction) and closed-loop settings (30.6% AF-CR reduction).

Paper Structure (44 sections, 16 equations, 8 figures, 11 tables)

Figures (8)

  • Figure 1: Consistency gap between the VLM and the E2E planner. (a) The E2E planner may misalign with the VLM decision (①), e.g., ② wrong direction or ③ mismatched speed change. (b) The existing method Senna shows scattered speed distributions that are inconsistent with the speed decisions. (c) With consistency-oriented training, Senna-2 produces more distinct and decision-aligned speed distributions, reflecting improved dual-system consistency and decision-following ability. Relative speed ratio: the ratio between the planned speed at the 3rd second and the initial speed, reflecting the tendency of speed change in the planned trajectory.
  • Figure 2: Overall model architecture of Senna-2. Text and visual inputs are processed by the VLM to produce high-level driving decisions, which are converted by the Decision Adapter into VLM condition embeddings. The E2E planner then fuses the VLM condition with its own E2E features to generate a trajectory consistent with the high-level decisions.
  • Figure 3: Consistency-oriented training recipe. We perform three training stages: driving pre-training, open-loop alignment, and closed-loop alignment with Hierarchical Reinforcement Learning (HRL).
  • Figure 4: Closed-loop speed control in an empty-road scenario. We visualize (a) driving trajectories, (b) speed curves, and (c) mileage curves under different VLM decisions. Our method exhibits strong decision-following ability: the low-level planning follows the high-level decision for speed control. Normal denotes using the VLM-predicted decision, while accelerate, keep, and decelerate denote using a fixed decision during the whole rollout.
  • Figure 5: Closed-loop qualitative comparisons between Senna and Senna-2. Senna suffers from (a) planning misalignment in a stop scenario and (b) decision misalignment in a cut-in scenario. In contrast, Senna-2 maintains both consistent decision-making and accurate trajectory planning across these challenging situations, demonstrating improved safety and reliability in closed-loop driving.
  • ...and 3 more figures
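
The relative speed ratio defined in the Figure 1 caption (planned speed at the 3rd second divided by the initial speed) can be illustrated with a minimal sketch. This is not the authors' code. The function name `relative_speed_ratio`, the waypoint layout (one (x, y) point every `dt` seconds, starting at t = 0), the backward finite difference used to estimate the planned speed, and the `min_speed` guard for a near-stationary ego vehicle are all assumptions made for illustration.

```python
import math


def relative_speed_ratio(waypoints, dt, initial_speed, t_eval=3.0, min_speed=1e-3):
    """Ratio of the planned speed at t_eval to the initial speed.

    Assumed layout (hypothetical, not from the paper): waypoints[i] is the
    planned (x, y) position in metres at time i * dt seconds.
    """
    k = round(t_eval / dt)  # index of the waypoint at t_eval
    if k < 1 or k >= len(waypoints):
        raise ValueError("trajectory does not cover t_eval")
    if initial_speed < min_speed:
        # The ratio is undefined when the vehicle starts from (near) rest.
        return None
    (x0, y0), (x1, y1) = waypoints[k - 1], waypoints[k]
    # Backward finite difference: distance covered in the last step before t_eval.
    planned_speed = math.hypot(x1 - x0, y1 - y0) / dt
    return planned_speed / initial_speed


# Usage: a straight trajectory at a constant 5 m/s, sampled every 0.5 s up to 3 s.
dt = 0.5
constant = [(5.0 * i * dt, 0.0) for i in range(7)]
ratio = relative_speed_ratio(constant, dt, initial_speed=5.0)
print(ratio)
```

Under these assumptions, a ratio above 1 indicates a plan that speeds up over the 3-second horizon, a ratio near 1 indicates a plan that holds speed, and a ratio below 1 indicates a plan that slows down, which is the tendency the distributions in Figure 1 are meant to show.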