Align2Act: Instruction-Tuned Models for Human-Aligned Autonomous Driving

Kanishkha Jaisankar, Sunidhi Tandel

TL;DR

Align2Act addresses human-aligned motion planning for autonomous driving by recasting planning as conditional language generation. The approach fine-tunes an instruction-tuned LLM, LLaMA-2-7B with LoRA, to jointly generate a trajectory $T$ and a structured reasoning trace $I_c$: inputs $X_t = \{O, S_t, S_y, I\}$ are mapped to an output $Y_t = \{I_c, T\}$, with decoding governed by $Y_{\log} = F(X_t)$ and top-$p$ sampling. The core contribution, Align2ActChain, decomposes decision-making into four interpretable stages (Preliminary Planning, Collision Prediction, Traffic Context Assessment, and Final Action Integration), providing post-hoc traceability while enforcing safety constraints. On the nuPlan benchmark, the method reaches an open-loop score (OLS) of $85.17$ on Test14-random, with closed-loop scores of $70.31$ (non-reactive) and $66.96$ (reactive); ablations confirm the importance of structured reasoning and scenario diversity. The work notes limitations in latency, real-time robustness, and generalization, and outlines future work on visual inputs, latency reduction, and broader benchmarks such as val14 to bridge generative reasoning and real-world autonomous control.
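The top-$p$ decoding step mentioned above can be illustrated with a minimal, standard-library sketch of nucleus sampling. It assumes that $Y_{\log}$ denotes the model's per-token logits, which this summary does not state explicitly; the function names (`softmax`, `top_p_filter`, `sample_top_p`) and the default `p=0.9` are illustrative choices, not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, p):
    """Keep the smallest set of most-likely tokens whose total mass reaches p.

    Returns {token_id: renormalized probability}.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for token_id in ranked:
        nucleus.append(token_id)
        mass += probs[token_id]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in nucleus}


def sample_top_p(logits, p=0.9, rng=random):
    """Draw one token id from the nucleus of softmax(logits).

    p=0.9 is a hypothetical default, not a value reported in the paper.
    """
    nucleus = top_p_filter(softmax(logits), p)
    ids = list(nucleus)
    weights = [nucleus[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

Lower values of `p` restrict sampling to the few most likely tokens, which tends to make generated trajectories and reasoning traces more deterministic; values near 1 approach sampling from the full distribution.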

Abstract

Motion planning in complex scenarios is a core challenge in autonomous driving. Conventional methods apply predefined rules or learn from driving data to generate trajectories, while recent approaches leverage large language models (LLMs) for decision-making. However, it remains unclear whether LLMs truly capture human driving logic. We propose Align2Act, a motion planning framework that transforms instruction-tuned LLMs into interpretable planners aligned with human behavior. We derive structured driving instructions based on human reasoning patterns (e.g., anticipate hazards, yield at intersections) and traffic rules (e.g., stop at red lights, maintain lane boundaries). Our Align2ActChain module guides step-by-step reasoning to produce both an interpretable rationale and a safe trajectory. By fine-tuning LLaMA-2-7B with LoRA on one million scenarios from the nuPlan dataset, our method achieves an open-loop score of 85.17 and closed-loop scores of 70.31 (non-reactive) and 66.96 (reactive) on Test14-random. Unlike prior work focused on synthetic or open-loop settings, we demonstrate improved planning quality and human-likeness on the real-world nuPlan closed-loop benchmark. Ablation studies confirm that structured reasoning significantly improves performance over baseline LLM planners.
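For readers unfamiliar with LoRA, the standard low-rank update it applies to a frozen pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ can be written as follows. This is the general LoRA formulation rather than anything specific to Align2Act; the rank $r$ and scaling factor $\alpha$ used by the authors are not given in this section.

$$
W = W_0 + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}, \quad r \ll \min(d, k)
$$

Only $A$ and $B$ are trained, so the number of trainable parameters per adapted matrix is $r(d + k)$ instead of $dk$.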

Paper Structure

This paper contains 19 sections, 11 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The Align2Act framework: (D)ata and (S)cenario inputs are processed through (R)ules and (L)LM reasoning to generate (M)otion plans via the (IC) Align2ActChain. (P)lanner and (E)xplanation modules ensure interpretability, while (I)nstructions and (G)oal alignment refine human-like behavior.
  • Figure 2: End-to-end project workflow: from data acquisition and simulation setup to training, fine-tuning, and benchmarking with LLM-based planners.
  • Figure 3: Visualization of Align2Act's reasoning process for a lane-following scenario. The model's decision-making chain is shown: (1) Preliminary action selection (blue trajectory), (2) Collision risk assessment (red bounding boxes for critical agents), (3) Traffic rule compliance (green light status and speed monitoring), and (4) Final action execution (bold trajectory with corrective steering). The ego vehicle (yellow) maintains safe distances while adhering to lane boundaries.