Skypilot: Fine-Tuning LLM with Physical Grounding for AAV Coverage Search

Zhongkai Chen, Yihao Sun, Chao Yan, Han Zhou, Xiaojia Xiang, Jie Jiang

TL;DR

Skypilot tackles hallucination and reproducibility problems in LLM-based AAV coverage planning by grounding language models through a two-stage approach. Stage 1 uses a diversified MCTS framework with physics-informed rewards to generate high-quality trajectory data; Stage 2 performs full-parameter fine-tuning of Qwen3-4B on 23,000 of these samples so that planning no longer requires running the tree search at inference time. The method is validated through extensive simulations and indoor/outdoor flights, showing better coverage efficiency, constraint satisfaction, and scalability than the baselines, while substantially reducing inference time compared with MCTS-based planning. The approach supports autonomous coverage in complex environments and opens the way to extensions such as multi-agent coordination and GPS-denied operation. Overall, Skypilot shows that explicit physical grounding combined with fine-tuning on MCTS-generated data can yield practical, reliable LLM-enabled planning for aerial robotics.

Abstract

Autonomous aerial vehicles (AAVs) have played a pivotal role in coverage operations and search missions. Recent advances in large language models (LLMs) offer promising opportunities to augment AAV intelligence and help address complex challenges such as area coverage optimization, dynamic path planning, and adaptive decision-making. However, the absence of physical grounding in LLMs leads to hallucination and reproducibility problems in spatial reasoning and decision-making. To tackle these issues, we present Skypilot, an LLM-enhanced two-stage framework that grounds language models in physical reality by integrating Monte Carlo tree search (MCTS). In the first stage, we introduce a diversified action space that encompasses generate, regenerate, fine-tune, and evaluate operations, coupled with physics-informed reward functions to ensure trajectory feasibility. In the second stage, we fine-tune Qwen3-4B on 23,000 MCTS-generated samples, achieving substantial inference acceleration while maintaining solution quality. Extensive numerical simulations and real-world flight experiments validate the efficiency and superiority of our proposed approach. Detailed information and experimental results are accessible at https://sky-pilot.top.

Paper Structure

This paper contains 24 sections, 9 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of the AAV coverage search task. Coverage maps from radar-based environmental perception and human instructions are jointly processed by a fine-tuned LLM to guide AAV coverage search operations.
  • Figure 2: Overall framework of Skypilot. The training process consists of two stages. In Stage 1, MCTS-based trajectory generation explores the action space and builds high-quality trajectory datasets. In Stage 2, the Qwen3-4B model undergoes full-parameter fine-tuning to enhance inference efficiency. During deployment, the LLM processes the coverage map $\mathcal{M}_t$ and human instructions $\mathcal{I}_t$ to generate the path plan $\mathcal{P}_t$ for coverage missions.
  • Figure 3: Monte Carlo Tree Search process for LLM-based trajectory generation. The four-phase process includes: (a) Selection using UCT to identify promising nodes, (b) Expansion through four action types, (c) Simulation to assess path quality based on coverage ratio and revisit penalty, and (d) Back-propagation to update node values along the search path.
  • Figure 4: Ablation study results of different LLM-based planners in dense obstacle environments. Metrics include coverage rate (CR), duplicate rate (DR), coverage success index (CSI), and inference latency (IL).
  • Figure 5: Indoor coverage search experiments. (a) Test arena with motion capture system, Crazyflie AAV, and reconfigurable obstacles. (b–c) Two obstacle configurations showing progressive coverage from 0% to 100% with overlaid flight trajectories (light-colored arrows), demonstrating the LLM planner's ability to adapt paths in real time while maintaining complete coverage and obstacle avoidance. The flight video is available at https://youtu.be/_SEGjsGqFrU.
  • ...and 2 more figures
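
The four-phase search loop described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the exploration constant, the grid-cell path representation, and the exact reward form (coverage ratio minus a weighted revisit ratio, with a hypothetical `revisit_weight`) are all assumptions made for illustration. Only the UCT selection rule, the four action names, and the coverage/revisit ingredients of the reward come from the text.

```python
import math
from dataclasses import dataclass, field

# The four expansion action types named in the paper.
ACTIONS = ("generate", "regenerate", "fine-tune", "evaluate")


@dataclass
class Node:
    """Hypothetical search-tree node; fields are illustrative assumptions."""
    action: str = "root"
    visits: int = 0
    total_reward: float = 0.0
    parent: "Node | None" = None
    children: list = field(default_factory=list)


def uct_score(child, parent_visits, c=math.sqrt(2)):
    """Standard UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select(node):
    """(a) Selection: descend by highest UCT score until a leaf is reached."""
    while node.children:
        parent = node
        node = max(parent.children, key=lambda ch: uct_score(ch, parent.visits))
    return node


def expand(node):
    """(b) Expansion: add one child per action type."""
    node.children = [Node(action=a, parent=node) for a in ACTIONS]
    return node.children


def path_reward(path, free_cells, revisit_weight=0.5):
    """(c) Simulation: score a path by coverage ratio and a revisit penalty.

    Assumed form: covered fraction of free cells minus revisit_weight times
    the fraction of path steps that revisit an already-visited cell.
    """
    if not path or not free_cells:
        return 0.0
    coverage_ratio = len(set(path) & set(free_cells)) / len(free_cells)
    revisit_ratio = (len(path) - len(set(path))) / len(path)
    return coverage_ratio - revisit_weight * revisit_ratio


def backpropagate(node, reward):
    """(d) Back-propagation: update statistics from the leaf up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

In a full planner, each expanded child would hold a candidate trajectory produced or revised by the LLM, and `path_reward` would be replaced by the physics-informed reward the paper describes; the sketch only shows how the four phases fit together.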