Residual Control for Fast Recovery from Dynamics Shifts

Nethmi Jayasinghe, Diana Gontero, Francesco Migliarba, Spencer T. Brown, Vinod K. Sangwan, Mark C. Hersam, Amit Ranjan Trivedi

TL;DR

Across mid-episode perturbations including actuator degradation, mass variation, and contact changes, the proposed method consistently reduces recovery time relative to frozen and online-adaptation baselines while maintaining near-nominal steady-state performance.

Abstract

Robotic systems operating in real-world environments inevitably encounter unobserved dynamics shifts during continuous execution, including changes in actuation, mass distribution, or contact conditions. When such shifts occur mid-episode, even locally stabilizing learned policies can experience substantial transient performance degradation. While input-to-state stability guarantees bounded state deviation, it does not ensure rapid restoration of task-level performance. We address inference-time recovery under frozen policy parameters by casting adaptation as constrained disturbance shaping around a nominal stabilizing controller. We propose a stability-aligned residual control architecture in which a reinforcement learning policy trained under nominal dynamics remains fixed at deployment, and adaptation occurs exclusively through a bounded additive residual channel. A Stability Alignment Gate (SAG) regulates corrective authority through magnitude constraints, directional coherence with the nominal action, performance-conditioned activation, and adaptive gain modulation. These mechanisms preserve the nominal closed-loop structure while enabling rapid compensation for unobserved dynamics shifts without retraining or privileged disturbance information. Across mid-episode perturbations including actuator degradation, mass variation, and contact changes, the proposed method consistently reduces recovery time relative to frozen and online-adaptation baselines while maintaining near-nominal steady-state performance. Relative to a frozen SAC policy, recovery time is reduced on average across evaluated conditions by 87% on the Go1 quadruped, 48% on the Cassie biped, 30% on the H1 humanoid, and 20% on the Scout wheeled platform.
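The abstract names four SAG mechanisms but not their equations. The Python sketch below shows one possible way to compose them around a frozen nominal action. The function name `stability_alignment_gate`, the scalar `tracking_err` input, every threshold (`r_max`, `cos_min`, `err_on`, `err_width`, `err_ref`, `k_max`), the sigmoid activation, and the projection rule are illustrative assumptions, not the paper's formulation.

```python
import numpy as np


def stability_alignment_gate(a_nom, r, tracking_err, *, r_max=0.3, cos_min=0.0,
                             err_on=0.1, err_width=0.02, err_ref=0.5, k_max=1.0):
    """Hypothetical SAG sketch: returns a_nom plus a gated, bounded residual.

    All thresholds and gating rules here are illustrative assumptions.
    """
    a_nom = np.asarray(a_nom, dtype=float)
    r = np.array(r, dtype=float)  # copy so the caller's array is untouched
    eps = 1e-8

    # (1) Magnitude constraint: rescale so that ||r||_2 <= r_max.
    r_norm = np.linalg.norm(r)
    if r_norm > r_max:
        r *= r_max / r_norm

    # (2) Directional coherence: if r points against a_nom (cosine below
    #     cos_min), remove its component along a_nom.
    a_norm = np.linalg.norm(a_nom)
    r_norm = np.linalg.norm(r)
    if a_norm > eps and r_norm > eps:
        dot = float(r @ a_nom)
        if dot / (r_norm * a_norm) < cos_min:
            r -= (dot / a_norm**2) * a_nom

    # (3) Performance-conditioned activation: smooth switch that is ~0 while
    #     the tracking error is well below err_on and ~1 well above it.
    activation = 1.0 / (1.0 + np.exp(-(tracking_err - err_on) / err_width))

    # (4) Adaptive gain: grows linearly with tracking error, capped at k_max.
    gain = min(max(tracking_err / err_ref, 0.0), k_max)

    return a_nom + activation * gain * r
```

Because the activation never exceeds 1, the gain never exceeds `k_max`, and the projection in step (2) can only shrink the residual, the executed action stays within `r_max * k_max` of the nominal action in L2 norm. This is one way to realize a bounded additive residual channel, not necessarily the paper's.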

Paper Structure

This paper contains 13 sections, 21 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Overview of the proposed cerebellar-inspired residual control architecture. A frozen RL policy provides nominal control, while a parallel cerebellar residual controller generates per-joint residual actions via microzone-based pathways. Tracking-error driven learning updates dual-timescale residual heads online for rapid adaptation. A Stability Alignment Gate (SAG) constrains residual magnitude and directional alignment before combining the residual with the nominal policy to produce the final action $a_t$.
  • Figure 2: Performance under mid-episode perturbations on the Go1 quadruped across increasing fault severities. (a) Friction increase: Recovery AUC (↑). (b) Actuator degradation: Steady-State Ratio (↑). (c) Mass increase: Time-to-Recovery (TTR-50, ↓). Averaged over 30 trials per condition. The proposed method achieves faster recovery while preserving competitive steady-state performance across perturbation types.
  • Figure 3: Normalized reward traces following mid-episode mild actuator degradation (scaling factor 0.80) on the Go1 quadruped. The fault is injected at timestep 500.
  • Figure 4: Cassie and H1 evaluation under mid-episode perturbations. (a,b) Cassie under friction and mass increase. (c,d) H1 under friction and mass increase. (e–h) Steady-state stability ratio (SSR; higher is better) versus fault severity. (i–l) Recovery time (TTR-50; lower is better) for the same conditions. The proposed stability-aligned residual controller consistently reduces recovery time while maintaining near-nominal steady-state performance across bipedal and humanoid platforms.
  • Figure 5: Scout platform evaluation under mid-episode perturbations. (a,b) Friction decrease and mass increase scenarios. (c,d) Steady-state stability ratio (SSR; higher is better) versus fault severity. (e,f) Recovery time (TTR-50; lower is better) for the same conditions. The proposed stability-aligned residual controller improves recovery speed while preserving steady-state stability across perturbations.
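The figure captions report Recovery AUC, the steady-state ratio (SSR), and TTR-50, but this section does not define them. As a hedged sketch only, the Python function below computes one plausible version of each from a single reward trace with a known fault step (for example, timestep 500 as in Figure 3). The function name `recovery_metrics`, the window lengths, and all three formulas are assumptions, not the paper's definitions.

```python
import numpy as np


def recovery_metrics(reward, fault_step, pre_window=100, ss_window=100):
    """Hypothetical recovery metrics for one reward trace with a known fault step.

    All three definitions below are illustrative assumptions.
    """
    reward = np.asarray(reward, dtype=float)
    baseline = reward[fault_step - pre_window:fault_step].mean()  # pre-fault level
    post = reward[fault_step:]

    # TTR-50 (assumed): steps from fault injection until the reward first
    # regains half of the drop from the pre-fault baseline to the post-fault trough.
    trough = int(np.argmin(post))
    threshold = post[trough] + 0.5 * (baseline - post[trough])
    hits = np.nonzero(post[trough:] >= threshold)[0]
    ttr50 = trough + int(hits[0]) if hits.size else None  # None: never recovered

    # SSR (assumed): mean reward over the last ss_window steps relative to baseline.
    ssr = post[-ss_window:].mean() / baseline

    # Recovery AUC (assumed): time-averaged post-fault reward relative to baseline.
    recovery_auc = post.mean() / baseline

    return {"ttr50": ttr50, "ssr": float(ssr), "recovery_auc": float(recovery_auc)}
```

For a synthetic trace in which the reward drops from 1.0 to 0.2 at step 500 and then recovers linearly over 200 steps, this definition gives a TTR-50 of about 100 steps and an SSR of 1.0.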