APPLV: Adaptive Planner Parameter Learning from Vision-Language-Action Model

Yuanjie Lu, Beichen Wang, Zhengqi Wu, Yang Li, Xiaomin Lin, Chengzhi Mao, Xuesu Xiao

TL;DR

This paper proposes Adaptive Planner Parameter Learning from Vision-Language-Action Model (APPLV), which leverages pre-trained vision-language models with a regression head to predict planner parameters that configure classical planners.

Abstract

Autonomous navigation in highly constrained environments remains challenging for mobile robots. Classical navigation approaches offer safety assurances but require environment-specific parameter tuning; end-to-end learning bypasses parameter tuning but struggles with precise control in constrained spaces. Recent robot learning approaches automate parameter tuning while retaining the safety of classical systems, yet they still struggle to generalize to unseen environments. Vision-Language-Action (VLA) models have recently shown promise by leveraging the scene understanding capabilities of foundation models, but they still struggle with precise control and inference latency in navigation tasks. In this paper, we propose Adaptive Planner Parameter Learning from Vision-Language-Action Model (APPLV). Unlike traditional VLA models that directly output actions, APPLV leverages pre-trained vision-language models with a regression head to predict planner parameters that configure classical planners. We develop two training strategies: supervised fine-tuning on collected navigation trajectories, and reinforcement learning fine-tuning to further optimize navigation performance. We evaluate APPLV across multiple motion planners on the simulated Benchmark Autonomous Robot Navigation (BARN) dataset and in physical robot experiments. Results demonstrate that APPLV outperforms existing methods in both navigation performance and generalization to unseen environments.

Paper Structure (28 sections, 1 equation, 5 figures, 2 tables)

Figures (5)

  • Figure 1: APPLV instantiates the APPL paradigm with a VLA model to dynamically adjust a classical navigation planner's parameters, learned from data collected in simulation.
  • Figure 2: APPLV architecture. A VLA model processes current and historical states to predict planner parameters, which configure a classical navigation planner to generate motion control commands.
  • Figure 3: APPLV training and deployment pipeline. Left: supervised learning data collection with representation and sampling. Middle: APPLV architecture with VLA model and classical planner. Right: reinforcement learning fine-tuning with TD3. Bottom: physical deployment in natural cluttered environments.
  • Figure 4: Physical deployment of two BARN Challenge environments.
  • Figure 5: Effect of training data size (APPLV-SL).
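
The idea in Figure 2, a regression head that turns model features into planner parameters, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the parameter names and ranges in `PARAM_BOUNDS`, the single linear layer, the sigmoid squashing used to keep each value inside its range, and the four-element feature vector standing in for a pooled vision-language embedding.

```python
import math
import random

# Hypothetical planner parameters and ranges, chosen only for illustration.
PARAM_BOUNDS = {
    "max_vel_x": (0.2, 2.0),
    "max_vel_theta": (0.3, 3.0),
    "inflation_radius": (0.1, 0.6),
}


def make_head(feature_dim, num_params, seed=0):
    """Create a randomly initialized linear head (weights and zero biases)."""
    rng = random.Random(seed)
    weights = [
        [rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]
        for _ in range(num_params)
    ]
    biases = [0.0] * num_params
    return weights, biases


def predict_parameters(features, head, bounds=PARAM_BOUNDS):
    """Map a feature vector to planner parameters inside their bounds."""
    weights, biases = head
    params = {}
    for (name, (low, high)), row, bias in zip(bounds.items(), weights, biases):
        logit = sum(w * x for w, x in zip(row, features)) + bias
        squashed = 1.0 / (1.0 + math.exp(-logit))  # sigmoid, in (0, 1)
        params[name] = low + (high - low) * squashed
    return params


# Stand-in for an embedding produced by a pre-trained vision-language model.
features = [0.5, -1.2, 0.3, 0.8]
head = make_head(len(features), len(PARAM_BOUNDS))
planner_params = predict_parameters(features, head)
```

In a setup like this, `planner_params` would be handed to the classical planner at each decision step, so the planner, not the learned model, still produces the motion commands.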