FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation

Jiaping Xiao, Cheng Wen Tsao, Yuhang Zhang, Mir Feroskhan

TL;DR

This work addresses the challenge of robust global path planning for autonomous drones by leveraging foundation models. It introduces FM-Planner, a three-stage framework that uses LLMs for semantic reasoning and vision-language models (VLMs) for perception, with a LoRA-finetuned, vision-augmented LLM producing real-time trajectories. Through broad benchmarking of eight LLMs and five VLMs in simulation, plus physical UAV experiments, the study finds that LLMs, especially when enhanced with a vision encoder, offer robust spatial reasoning and obstacle awareness, while VLMs alone struggle to produce reliable global plans. The results demonstrate practical feasibility for perception-informed drone navigation and provide guidance for deploying foundation-model-driven autonomous flight in real-world scenarios. Key metric definitions, such as $ESS = \frac{SR}{ACT}$, where $SR$ is the success rate and $ACT$ the completion time, quantify the trade-off between success rate and completion time in planning.
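
As a rough illustration of how the ratio $ESS = \frac{SR}{ACT}$ ranks planners, the sketch below computes it for two made-up planners. The function name `ess`, the units (success rate as a fraction in [0, 1], completion time in seconds), and all sample numbers are assumptions for illustration and are not taken from the paper.

```python
def ess(success_rate: float, completion_time: float) -> float:
    """Return SR / ACT for one planner.

    Assumed units (not specified in the summary): success_rate is a
    fraction in [0, 1]; completion_time is a positive duration in seconds.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must lie in [0, 1]")
    if completion_time <= 0.0:
        raise ValueError("completion_time must be positive")
    return success_rate / completion_time


# Hypothetical planners: (success rate, completion time in seconds).
planners = {
    "planner_a": (0.75, 3.0),
    "planner_b": (0.5, 4.0),
}

# A higher score means more successful runs per unit of completion time.
scores = {name: ess(sr, act) for name, (sr, act) in planners.items()}
best = max(scores, key=scores.get)
```

Because the score is a ratio, a planner can rank higher either by succeeding more often or by finishing faster, which is the trade-off the metric is meant to capture.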

Abstract

Path planning is a critical component in autonomous drone operations, enabling safe and efficient navigation through complex environments. Recent advances in foundation models, particularly large language models (LLMs) and vision-language models (VLMs), have opened new opportunities for enhanced perception and intelligent decision-making in robotics. However, their practical applicability and effectiveness in global path planning remain relatively unexplored. This paper proposes foundation model-guided path planners (FM-Planner) and presents a comprehensive benchmarking study and practical validation for drone path planning. Specifically, we first systematically evaluate eight representative LLM and VLM approaches using standardized simulation scenarios. To enable effective real-time navigation, we then design an integrated LLM-Vision planner that combines semantic reasoning with visual perception. Furthermore, we deploy and validate the proposed path planner through real-world experiments under multiple configurations. Our findings provide valuable insights into the strengths, limitations, and feasibility of deploying foundation models in real-world drone applications, and offer practical implementations for autonomous flight. Project site: https://github.com/NTU-ICG/FM-Planner.

Paper Structure

This paper contains 25 sections, 7 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: The real-world autonomous flight with a foundation model-guided path planner. The obstacle positions are recognized by a vision encoder and fed into LLM with prompts.
  • Figure 2: The framework of the foundation model guided path planners with prompt and task descriptions. (a) The LLM-guided path planner with purely textual inputs; (b) The VLM-guided path planner with textual and visual inputs. The path processing module remains the same to extract the readable waypoint list.
  • Figure 3: System architecture of the LLM-Vision-guided path planner framework. A user-provided instruction is combined with real-time visual context from the YOLOv8 vision encoder and fed into a fine-tuned LLM. The planner generates trajectories, which are executed by the drone.
  • Figure 4: Input bird's-eye view images for the VLM-guided path planner test. (a) Two obstacles; (b) three obstacles.
  • Figure 5: Paths generated from various VLMs. (a) Mission with two obstacles; (b) Mission with three obstacles. The VLMs rarely generate optimal paths.
  • ...and 3 more figures