Cerebellar-Inspired Residual Control for Fault Recovery: From Inference-Time Adaptation to Structural Consolidation

Nethmi Jayasinghe, Diana Gontero, Spencer T. Brown, Vinod K. Sangwan, Mark C. Hersam, Amit Ranjan Trivedi

TL;DR

The paper tackles post-training faults in robotics by introducing an inference-time, cerebellar-inspired residual controller that augments a frozen policy with fast, local corrections, leaving the base parameters unchanged. It uses phase-aligned references, phase-local microzones, dual-timescale plasticity, and a conservative meta-adaptation mechanism to regulate corrective authority and suppress unnecessary intervention, with a consolidation pathway for persistent corrections. Empirical results on MuJoCo locomotion benchmarks show substantial gains under moderate faults (up to +66% on HalfCheetah-v5 and +53% on Humanoid-v5) while preserving nominal performance. The paper also reports safety properties, ablations, and an extension to a non-cyclic manipulation task, PandaReach-v3. This work bridges adaptive control and deep RL by combining fast, inference-time recovery with offline absorption of fault structure into lightweight adapters for enduring robustness in high-dimensional control tasks.

Abstract

Robotic policies deployed in real-world environments often encounter post-training faults, where retraining, exploration, or system identification are impractical. We introduce an inference-time, cerebellar-inspired residual control framework that augments a frozen reinforcement learning policy with online corrective actions, enabling fault recovery without modifying base policy parameters. The framework instantiates core cerebellar principles, including high-dimensional pattern separation via fixed feature expansion, parallel microzone-style residual pathways, and local error-driven plasticity with excitatory and inhibitory eligibility traces operating at distinct time scales. These mechanisms enable fast, localized correction under post-training disturbances while avoiding destabilizing global policy updates. A conservative, performance-driven meta-adaptation regulates residual authority and plasticity, preserving nominal behavior and suppressing unnecessary intervention. Experiments on MuJoCo benchmarks under actuator, dynamic, and environmental perturbations show improvements of up to +66% on HalfCheetah-v5 and +53% on Humanoid-v5 under moderate faults, with graceful degradation under severe shifts and complementary robustness from consolidating persistent residual corrections into policy parameters.
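
The abstract names three ingredients: a fixed feature expansion, a residual added to a frozen policy's action, and local error-driven plasticity with traces at two time scales. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. The class name `ResidualController`, the tanh random-feature map, the way the fast and slow traces are combined, the clipping bound, and all hyperparameter values are assumptions made for illustration; the paper's actual update rules are not given in this summary.

```python
import numpy as np


class ResidualController:
    """Hypothetical sketch: frozen base action plus a learned, bounded residual."""

    def __init__(self, obs_dim, act_dim, n_features=64, fast_decay=0.5,
                 slow_decay=0.95, lr=0.05, max_residual=0.2, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection (never trained): the assumed "feature expansion".
        self.proj = rng.normal(size=(n_features, obs_dim)) / np.sqrt(obs_dim)
        self.bias = rng.uniform(-1.0, 1.0, size=n_features)
        # Only these readout weights adapt online; they start at zero so the
        # controller initially leaves the base policy's action untouched.
        self.weights = np.zeros((act_dim, n_features))
        self.fast_trace = np.zeros(n_features)  # assumed excitatory, short memory
        self.slow_trace = np.zeros(n_features)  # assumed inhibitory, long memory
        self.fast_decay = fast_decay
        self.slow_decay = slow_decay
        self.lr = lr
        self.max_residual = max_residual

    def features(self, obs):
        return np.tanh(self.proj @ np.asarray(obs, dtype=float) + self.bias)

    def act(self, obs, base_action):
        phi = self.features(obs)
        # Exponential moving averages of the features at two time scales.
        self.fast_trace = self.fast_decay * self.fast_trace + (1 - self.fast_decay) * phi
        self.slow_trace = self.slow_decay * self.slow_trace + (1 - self.slow_decay) * phi
        # Clip the residual so corrective authority stays bounded (assumed bound).
        residual = np.clip(self.weights @ phi, -self.max_residual, self.max_residual)
        return np.asarray(base_action, dtype=float) + residual

    def update(self, error):
        # Local rule (assumption): per-action error times the difference of traces.
        eligibility = self.fast_trace - self.slow_trace
        self.weights += self.lr * np.outer(np.asarray(error, dtype=float), eligibility)
```

In use, `act` would be called once per control step with the frozen policy's action, and `update` with whatever per-joint error signal is available (for example, a reference minus the observed joint state). The base policy's parameters are never touched, which is the property the abstract emphasizes.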

Paper Structure

This paper contains 79 sections, 43 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Overview of the cerebellar-inspired fault recovery framework. A frozen base policy is augmented with an inference-time residual controller that learns local, error-driven corrections under post-training faults, enabling rapid recovery without modifying the base policy. Under persistent faults, residual structure is consolidated into a static adapter to maintain long-term robustness. The framework is evaluated on locomotion tasks (HalfCheetah-v5, Ant-v5, Humanoid-v5) and a non-cyclic manipulation task (PandaReach-v3), demonstrating scalability across control dimensionality and task structure.
  • Figure 2: (a) Inference-time residual adaptation under unanticipated post-deployment faults. A delayed out-of-distribution fault (shaded region) defeats robust training: Robust SAC and SAC+OSI fail, while the proposed method recovers without retraining or exploration. (b) Performance on PandaReach-v3 across fault severities, comparing a frozen baseline, robust training, and the proposed method; inference-time adaptation consistently outperforms both baselines. (c) Soft-gated cerebellar residual under actuator bias. Top: EMA of episodic reward, showing residual suppression under nominal conditions, activation under sustained degradation, and decay after fault removal. Bottom: $\ell_2$ norm of the residual action across joints.
  • Figure 3: Robustness across fault severities on HalfCheetah-v5. Episodic return under actuator, dynamic, and environmental perturbations; error bars denote standard deviation over $30$ rollouts.
  • Figure 4: Relative improvement over a frozen SAC baseline across fault severities on HalfCheetah-v5, comparing inference-time adaptation and policy consolidation under actuator, damping, and friction faults.
  • Figure 5: Reference-trajectory ablations on HalfCheetah-v5 under damping and friction increase. Shown are relative performance improvements for the full method, removal of phase-indexed reference acceleration, fixed phase offsets ($\pm 10\%$, $\pm 15\%$), and time-indexed reference lookup, evaluated across fault severities.
  • ...and 5 more figures
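
The Figure 2(c) caption describes a soft gate that suppresses the residual under nominal conditions, activates it under sustained degradation, and lets it decay after the fault is removed, driven by an EMA of episodic reward. One way such a gate could be sketched is shown below. The class name `SoftGate`, the sigmoid form, the fractional-drop threshold, and the parameter values are all assumptions chosen for illustration and are not taken from the paper.

```python
import math


class SoftGate:
    """Hypothetical performance-driven gate in [0, 1] for scaling a residual."""

    def __init__(self, nominal_return, ema_alpha=0.2, threshold=0.15, sharpness=40.0):
        if nominal_return == 0:
            raise ValueError("nominal_return must be nonzero")
        self.nominal_return = float(nominal_return)
        self.ema = float(nominal_return)  # start from the nominal level
        self.ema_alpha = ema_alpha
        self.threshold = threshold        # fractional drop at which the gate is 0.5
        self.sharpness = sharpness        # steepness of the sigmoid
        self.gate = 0.0

    def update(self, episode_return):
        # Smooth the episodic return so single bad episodes do not open the gate.
        self.ema = (1 - self.ema_alpha) * self.ema + self.ema_alpha * episode_return
        # Fractional performance drop relative to the nominal return.
        drop = (self.nominal_return - self.ema) / abs(self.nominal_return)
        # Clamp the sigmoid argument to avoid overflow in math.exp.
        z = max(-60.0, min(60.0, self.sharpness * (drop - self.threshold)))
        self.gate = 1.0 / (1.0 + math.exp(-z))
        return self.gate
```

The returned gate would multiply the residual action before it is added to the base action. Because the gate depends on the smoothed return rather than on any single episode, it opens only under sustained degradation and closes again gradually once performance recovers.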