FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation
Jiaping Xiao, Cheng Wen Tsao, Yuhang Zhang, Mir Feroskhan
TL;DR
This work addresses robust global path planning for autonomous drones using foundation models. It introduces FM-Planner, a three-stage framework that uses LLMs for semantic reasoning and vision-language models (VLMs) for perception, and a LoRA-fine-tuned, vision-augmented LLM to produce real-time trajectories. Benchmarking eight LLMs and five VLMs in simulation, together with physical UAV experiments, shows that LLMs, especially when enhanced with a vision encoder, offer robust spatial reasoning and obstacle awareness, while VLMs alone struggle to produce reliable global plans. The results demonstrate the practical feasibility of perception-informed drone navigation and provide guidance for deploying foundation-model-driven autonomous flight in real-world scenarios. Metrics such as $ESS = \frac{SR}{ACT}$ quantify the trade-off between success rate (SR) and completion time (ACT) in planning.
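To make the ratio $ESS = \frac{SR}{ACT}$ concrete, the following is a minimal sketch of how such a score could be computed and used to rank planners. It assumes SR is a fraction in [0, 1] and ACT is a completion time in seconds; those units, the function name `ess`, the planner names, and all numbers are hypothetical illustrations and are not results from the paper.

```python
def ess(success_rate: float, completion_time: float) -> float:
    """Return ESS = SR / ACT (units are an assumption: SR in [0, 1], ACT in seconds)."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must lie in [0, 1]")
    if completion_time <= 0.0:
        raise ValueError("completion_time must be positive")
    return success_rate / completion_time


# Hypothetical (SR, ACT) pairs, invented purely for illustration.
planners = {
    "planner_a": (0.9, 30.0),  # more reliable, slower
    "planner_b": (0.8, 20.0),  # slightly less reliable, faster
}

# Score every planner and pick the one with the highest ESS.
scores = {name: ess(sr, act) for name, (sr, act) in planners.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: ESS = {score:.4f}")
print("highest ESS:", best)
```

Under this definition a planner with a lower success rate can still score higher if it completes its plans sufficiently faster, which is the trade-off the ratio is meant to capture.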
Abstract
Path planning is a critical component in autonomous drone operations, enabling safe and efficient navigation through complex environments. Recent advances in foundation models, particularly large language models (LLMs) and vision-language models (VLMs), have opened new opportunities for enhanced perception and intelligent decision-making in robotics. However, their practical applicability and effectiveness in global path planning remain relatively unexplored. This paper proposes foundation model-guided path planners (FM-Planner) and presents a comprehensive benchmarking study and practical validation for drone path planning. Specifically, we first systematically evaluate eight representative LLM and VLM approaches using standardized simulation scenarios. To enable effective real-time navigation, we then design an integrated LLM-Vision planner that combines semantic reasoning with visual perception. Furthermore, we deploy and validate the proposed path planner through real-world experiments under multiple configurations. Our findings provide valuable insights into the strengths, limitations, and feasibility of deploying foundation models in real-world drone applications and offer practical implementations for autonomous flight. Project site: https://github.com/NTU-ICG/FM-Planner.
