Meta-Adaptive Beam Search Planning for Transformer-Based Reinforcement Learning Control of UAVs with Overhead Manipulators under Flight Disturbances

Hazim Alzorgan, Sayed Pedram Haeri Boroujeni, Abolfazl Razi

Abstract

Drones equipped with overhead manipulators offer unique capabilities for inspection, maintenance, and contact-based interaction. However, the motion of the drone and its manipulator is tightly coupled, and even small attitude changes caused by wind or control imperfections shift the end-effector away from its intended path. This coupling makes reliable tracking difficult and also limits the direct use of learning-based arm controllers that were originally designed for fixed-base robots. These effects appear consistently in our tests whenever the UAV body experiences drift or rapid attitude corrections. To address this behavior, we develop a reinforcement-learning (RL) framework built on a Transformer-based double deep Q-network (DDQN), whose core idea is an adaptive beam-search planner that applies a short-horizon beam search over candidate control sequences, using the learned critic as the forward estimator. This allows the controller to anticipate the end-effector's motion through simulated rollouts rather than executing those actions directly on the actual model, realizing a software-in-the-loop (SITL) approach. The lookahead relies on value estimates from a Transformer critic that processes short sequences of states, while a DDQN backbone provides the one-step targets needed to keep the learning process stable. Evaluated on a 3-DoF aerial manipulator under identical training conditions, the proposed meta-adaptive planner shows the strongest overall performance, with a 10.2% reward increase, a substantial reduction in mean tracking error (from about 6% to 3%), and a 29.6% improvement in the combined reward-error metric relative to the DDQN baseline. Our method also exhibits elevated stability in tracking the target tip trajectory (maintaining a tracking error within 5 cm) when the drone base drifts under external disturbances, as opposed to the fixed-beam and Transformer-only variants.
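The short-horizon lookahead described above can be sketched as a beam search over action sequences scored by a learned critic. This is a minimal illustration, not the paper's implementation: `q_fn` stands in for the Transformer critic, `step_fn` for the simulated (SITL) transition model, and both names are hypothetical.

```python
def beam_search_plan(q_fn, step_fn, state, actions, beam_width=2, horizon=3):
    """Short-horizon beam search over candidate action sequences.

    Sketch under assumptions: q_fn(state, action) is a learned critic and
    step_fn(state, action) is a simulated rollout model; neither is executed
    on the real plant, mirroring the software-in-the-loop idea.
    """
    # Each beam entry: (cumulative critic score, current state, first action)
    beams = [(0.0, state, None)]
    for _ in range(horizon):
        candidates = []
        for value, s, first in beams:
            for a in actions:
                s_next = step_fn(s, a)      # simulated transition, not executed
                v = value + q_fn(s, a)      # critic-scored expansion
                candidates.append((v, s_next, a if first is None else first))
        # Retain only the top-B branches for further expansion
        candidates.sort(key=lambda t: t[0], reverse=True)
        beams = candidates[:beam_width]
    # Execute only the first action of the best-scoring sequence
    return beams[0][2]
```

With a beam width of B = 2 (as in Figure 3), only the two highest-valued branches survive each expansion step, keeping the lookahead cost linear in the horizon rather than exponential.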

Paper Structure

This paper contains 47 sections, 52 equations, 10 figures, 2 tables, and 2 algorithms.

Figures (10)

  • Figure 1: Overhead aerial manipulator (manipulator kinematics).
  • Figure 2: High-level architecture of the proposed Transformer-DDQN framework with adaptive beam search. The UAV manipulator (agent) interacts with the wall-tracking environment through discretized torque actions. The Transformer Q-network estimates $Q_\theta(s, a)$ using self-attention layers and dueling heads $(V_\theta, A_\theta)$, while the target network $Q_{\bar{\theta}}$ provides stable bootstrapped targets for the Double DQN loss. Beam search expands multiple candidate trajectories to improve decision consistency and robustness.
  • Figure 3: Conceptual illustration of beam search in the action tree with B = 2. Orange: visited nodes; Blue: expanded nodes. Beam Search retains only the highest-valued branches for expansion.
  • Figure 4: Illustration of a sample experimental setup. The target trajectory (red) defines the desired end-effector motion, while the UAV base trajectory (blue) is generated based on the target trajectory using a motion planner to maintain a stable flight, achieve a feasible reach, and maintain consistent surface clearance.
  • Figure 5: Model Performance Summary over 1500 episodes. Left: final average reward; middle: tracking efficiency; right: relative improvement over DDQN.
  • ...and 5 more figures
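The Double DQN loss mentioned in the Figure 2 caption uses the standard decoupling of action selection and evaluation: the online network picks the greedy next action, while the slow-moving target network evaluates it. A minimal sketch, assuming hypothetical callables `q_online` and `q_target` that map a state-action pair to a scalar Q-value:

```python
def double_dqn_target(q_online, q_target, reward, next_state, actions,
                      gamma=0.99, done=False):
    """One-step Double DQN bootstrap target.

    Sketch under assumptions: q_online is the trained Q-network Q_theta and
    q_target is the target network Q_theta-bar from the Figure 2 caption;
    both names are illustrative stand-ins.
    """
    if done:
        return reward
    # Action selection by the online network ...
    best_a = max(actions, key=lambda a: q_online(next_state, a))
    # ... evaluation by the target network, which stabilizes bootstrapping
    return reward + gamma * q_target(next_state, best_a)
```

Decoupling selection from evaluation reduces the overestimation bias of vanilla DQN, which is why the abstract credits the DDQN backbone with keeping the learning process stable.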