iMotion-LLM: Instruction-Conditioned Trajectory Generation
Abdulwahab Felemban, Nussair Hroub, Jian Ding, Eslam Abdelrahman, Xiaoqian Shen, Abduallah Mohamed, Mohamed Elhoseiny
TL;DR
iMotion-LLM integrates a large language model with trajectory prediction modules to enable instruction-conditioned trajectory generation for autonomous driving. It introduces two datasets (InstructWaymo and Open-Vocabulary InstructNuPlan) and the Instruction Following Recall (IFR) metric, which together evaluate instruction adherence alongside trajectory quality and safety. The method maps scene features into the LLM input space and uses an LLM-grounded conditioning pipeline to produce interpretable execution plans and safety justifications, achieving strong IFR and safety performance while enabling text-guided scenario generation. Ablation studies, comparisons with language-conditioned baselines, and closed-loop, safety-focused evaluations support the framework’s use for offline safety testing, simulation, and reasoning about driving behavior under natural language instructions.
Abstract
We introduce iMotion-LLM, a large language model (LLM) integrated with trajectory prediction modules for interactive motion generation. Unlike conventional approaches, it generates feasible, safety-aligned trajectories based on textual instructions, enabling adaptable and context-aware driving behavior. It combines an encoder-decoder multimodal trajectory prediction model with a pre-trained LLM fine-tuned using LoRA, projecting scene features into the LLM input space and mapping special tokens to a trajectory decoder for text-based interaction and interpretable driving. To support this framework, we introduce two datasets: 1) InstructWaymo, an extension of the Waymo Open Motion Dataset with direction-based motion instructions, and 2) Open-Vocabulary InstructNuPlan, which features safety-aligned instruction-caption pairs and corresponding safe trajectory scenarios. Our experiments validate that instruction conditioning enables trajectory generation that follows the intended condition. iMotion-LLM demonstrates strong contextual comprehension, achieving 84% average accuracy in direction feasibility detection and 96% average accuracy in safety evaluation of open-vocabulary instructions. This work lays the foundation for text-guided motion generation in autonomous driving, supporting simulated data generation, model interpretability, and robust safety alignment testing for trajectory generation models. Our code, pre-trained model, and datasets are available at: https://vision-cair.github.io/iMotion-LLM/.
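To make the idea of checking whether a generated trajectory "follows the intended condition" concrete, the sketch below scores direction-based instructions in the spirit of the Instruction Following Recall (IFR) metric. It is an illustration under stated assumptions, not the paper's definition: the category names, the thresholds `min_dist` and `turn_deg`, the agent-centric frame (agent initially facing +x), and the functions `direction_category` and `instruction_following_recall` are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def direction_category(
    traj: Sequence[Point], min_dist: float = 2.0, turn_deg: float = 30.0
) -> str:
    """Assign a coarse direction label to a 2D trajectory.

    Assumes an agent-centric frame in which the agent initially faces +x.
    The thresholds are illustrative and not taken from the paper.
    """
    (x0, y0), (x1, y1) = traj[0], traj[-1]
    # Very small net displacement is treated as not moving.
    if math.hypot(x1 - x0, y1 - y0) < min_dist:
        return "stationary"
    # Heading of the final segment, measured relative to the +x axis.
    (xa, ya), (xb, yb) = traj[-2], traj[-1]
    heading = math.degrees(math.atan2(yb - ya, xb - xa))
    if heading > turn_deg:
        return "left"
    if heading < -turn_deg:
        return "right"
    return "straight"


def instruction_following_recall(
    instructions: Sequence[str], trajectories: Sequence[Sequence[Point]]
) -> float:
    """Fraction of trajectories whose direction label matches the instruction."""
    if not instructions or len(instructions) != len(trajectories):
        raise ValueError("need equally sized, non-empty inputs")
    hits = sum(
        direction_category(traj) == instr
        for instr, traj in zip(instructions, trajectories)
    )
    return hits / len(instructions)


straight = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
left = [(0.0, 0.0), (5.0, 0.0), (8.0, 4.0)]
right = [(0.0, 0.0), (5.0, 0.0), (8.0, -4.0)]

# The last instruction asks for a left turn but the trajectory goes straight.
score = instruction_following_recall(
    ["straight", "left", "right", "left"], [straight, left, right, straight]
)
print(score)
```

In this simplified form, each instruction is paired with a single generated trajectory. A multimodal predictor produces several candidate trajectories per scene, so a fuller version would need to decide how matches are aggregated across modes.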
