Learn Structure, Adapt on the Fly: Multi-Scale Residual Learning and Online Adaptation for Aerial Manipulators

Samaksh Ujjawal, Naveen Sudheer Nair, Shivansh Pratap Singh, Rishabh Dev Yadav, Wei Pan, Spandan Roy

Abstract

Autonomous Aerial Manipulators (AAMs) are inherently coupled, nonlinear systems that exhibit nonstationary and multi-scale residual dynamics, particularly during manipulator reconfiguration and abrupt payload variations. Conventional analytical dynamic models rely on fixed parametric structures, while static data-driven models assume stationary dynamics and degrade under configuration changes and payload variations. Moreover, existing learning architectures do not explicitly factorize cross-variable coupling and multi-scale temporal effects, conflating instantaneous inertial dynamics with long-horizon regime evolution. We propose a predictive-adaptive framework for real-time residual modeling and compensation in AAMs. The core of this framework is the Factorized Dynamics Transformer (FDT), which treats physical variables as independent tokens. This design enables explicit cross-variable attention while structurally separating short-horizon inertial dependencies from long-horizon aerodynamic effects. To address deployment-time distribution shifts, a Latent Residual Adapter (LRA) performs rapid linear adaptation in the latent space via Recursive Least Squares, preserving the offline nonlinear representation without prohibitive computational overhead. The adapted residual forecast is directly integrated into a residual-compensated adaptive controller. Real-world experiments on an aerial manipulator subjected to unseen payloads demonstrate higher prediction fidelity, accelerated disturbance attenuation, and superior closed-loop tracking precision compared to state-of-the-art learning baselines, all while maintaining strict real-time feasibility.
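
To make the idea of linear adaptation in a latent space via Recursive Least Squares concrete, the following is a minimal sketch and not the authors' implementation. It assumes a frozen encoder has already produced a latent vector `z` at each step; here that encoder is replaced by two hand-made features. The class name `LatentRLS`, the forgetting factor, the initial covariance, and the toy target are all hypothetical choices made for illustration.

```python
import math


class LatentRLS:
    """Recursive Least Squares on fixed latent features (illustrative only)."""

    def __init__(self, latent_dim, out_dim, forgetting=0.99, init_cov=1000.0):
        self.lam = forgetting  # assumed forgetting factor, not from the paper
        # Linear head W (latent_dim x out_dim), the only thing adapted online.
        self.W = [[0.0] * out_dim for _ in range(latent_dim)]
        # Covariance-like matrix P, initialised to init_cov * I (symmetric).
        self.P = [[init_cov if i == j else 0.0 for j in range(latent_dim)]
                  for i in range(latent_dim)]

    def predict(self, z):
        out_dim = len(self.W[0])
        return [sum(self.W[i][j] * z[i] for i in range(len(z)))
                for j in range(out_dim)]

    def update(self, z, y):
        d = len(z)
        # Gain k = P z / (lam + z^T P z)
        Pz = [sum(self.P[i][j] * z[j] for j in range(d)) for i in range(d)]
        denom = self.lam + sum(z[i] * Pz[i] for i in range(d))
        k = [v / denom for v in Pz]
        # Prediction error computed before the weights change
        err = [yj - pj for yj, pj in zip(y, self.predict(z))]
        for i in range(d):
            for j in range(len(err)):
                self.W[i][j] += k[i] * err[j]
        # P <- (P - k z^T P) / lam; P is symmetric, so z^T P equals Pz^T
        self.P = [[(self.P[i][j] - k[i] * Pz[j]) / self.lam for j in range(d)]
                  for i in range(d)]
        return err


def run_demo(steps=200):
    """Fit a toy residual y = 2*z0 - z1 from a stream of latent samples."""
    adapter = LatentRLS(latent_dim=2, out_dim=1)
    for t in range(steps):
        z = [math.sin(0.3 * t), math.cos(0.7 * t)]  # stand-in latent features
        y = [2.0 * z[0] - 1.0 * z[1]]               # stand-in measured residual
        adapter.update(z, y)
    return adapter
```

Calling `run_demo()` returns an adapter whose weights approach the toy coefficients. Each update costs a number of operations quadratic in the latent dimension and needs no gradient pass through the encoder, which is one reason such a linear head is attractive when the nonlinear representation is kept frozen.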

Paper Structure

This paper contains 27 sections, 23 equations, 4 figures, 4 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Aerial manipulation platform consisting of a quadrotor and a 2-DOF robotic arm transporting different payloads.
  • Figure 2: The Factorized Dynamics Transformer (FDT) architecture. A dual-stream encoder separates short-horizon inertial coupling (via Self-Attention) from long-horizon aerodynamic memory (via Cross-Attention). A learnable Global Token aggregates both temporal scales into a unified Latent Vector, which is mapped to multi-step residual predictions by an MLP decoder.
  • Figure 3: The framework fuses a physics-aware Factorized Dynamics Transformer (FDT) with an online Latent Residual Adapter (LRA). The FDT (center) utilizes inverted variable embeddings and a global-token bottleneck to efficiently compress long-horizon aerodynamic memory. To handle unmodeled regime shifts (e.g., payload changes), the LRA (bottom-right) performs online Bayesian correction directly on the frozen global token features, injecting a real-time adjustment into the adaptive control law (top-right) to guarantee robust trajectory tracking.
  • Figure 4: Dynamic evaluation scenarios. (Top) Scenario A: An altitude-varying S-shaped pick-and-place maneuver. (Bottom) Scenario B: A continuous figure-8 trajectory. Coordinates are in meters.