SMAT: Staged Multi-Agent Training for Co-Adaptive Exoskeleton Control

Yifei Yuan; Ghaith Androwis; Xianlian Zhou

Abstract

Effective exoskeleton assistance requires co-adaptation: as the device alters joint dynamics, the user reorganizes neuromuscular coordination, creating a non-stationary learning problem. Most learning-based approaches do not explicitly account for the sequential nature of human motor adaptation, leading to training instability and poorly timed assistance. We propose Staged Multi-Agent Training (SMAT), a four-stage curriculum designed to mirror how users naturally acclimate to a wearable device. In SMAT, a musculoskeletal human actor and a bilateral hip exoskeleton actor are trained progressively: the human first learns unassisted gait, then adapts to the added device mass; the exoskeleton subsequently learns a positive assistance pattern against a stabilized human policy; and finally both agents co-adapt with full torque capacity and bidirectional feedback. We implement SMAT in the MyoAssist simulation environment using a 26-muscle lower-limb model and an attached hip exoskeleton. Our musculoskeletal simulations demonstrate that the learned exoskeleton control policy produces an average 10.1% reduction in hip muscle activation relative to the no-assist condition. We validated the learned controller in an offline setting using open-source gait data, then deployed it to a physical hip exoskeleton for treadmill experiments with five subjects. Without subject-specific retraining or any explicitly imposed timing shift, the resulting policy delivers consistent assistance and predominantly positive mechanical power across all subjects (mean positive power: 13.6 W at 6 Nm RMS torque to 23.8 W at 9.3 Nm RMS torque, with minimal negative power).
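
The power and torque figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that instantaneous mechanical power is the product of exoskeleton torque and hip angular velocity, and that mean positive and negative power are averaged over all samples in the recording; the paper's exact averaging convention is not stated in this excerpt, so the function name `power_metrics` and its definitions are illustrative assumptions rather than the authors' analysis code.

```python
import math


def power_metrics(torque_nm, hip_velocity_rad_s):
    """Return (mean positive power [W], mean negative power [W], RMS torque [Nm]).

    Assumption: power P = torque * angular velocity, sample by sample, and
    both means are taken over ALL samples (not only the positive or
    negative ones). The paper may use a different convention.
    """
    if not torque_nm or len(torque_nm) != len(hip_velocity_rad_s):
        raise ValueError("inputs must be non-empty and of equal length")

    power = [t * w for t, w in zip(torque_nm, hip_velocity_rad_s)]
    n = len(power)

    mean_positive = sum(p for p in power if p > 0) / n
    mean_negative = sum(p for p in power if p < 0) / n
    rms_torque = math.sqrt(sum(t * t for t in torque_nm) / n)
    return mean_positive, mean_negative, rms_torque


# Toy signals, not data from the paper.
toy_torque = [2.0, -2.0, 2.0, -2.0]
toy_velocity = [1.0, -1.0, -1.0, 1.0]
mean_pos, mean_neg, rms = power_metrics(toy_torque, toy_velocity)
```

Under this convention, torque applied in the direction of hip motion contributes positive power, so a well-timed assistance profile shows a large mean positive power and a mean negative power close to zero.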

Paper Structure (13 sections, 7 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Method
Multi-Agent Reinforcement Learning Framework
Multi-stage Curriculum Training
Hip Exoskeleton Hardware
Experimental Design
Exoskeleton Torque and Power Analysis
Results
Simulation Evaluation of Stable Walking and Effects of Assistance
Verification of Assistance Efficiency and Speed Generalization with Open Source Dataset
Validation with Human Experiment
Discussion
Conclusions

Figures (7)

Figure 1: Overview of the SMAT framework. Two PPO-based actors, a musculoskeletal human actor $\pi_h$ and an exoskeleton actor $\pi_e$, interact with a shared physics simulation through a shared critic $V_\psi$. Human-Only Training (left): only $\pi_h$ updates via imitation learning from a reference gait. Human-Exo Joint Training (center): both actors train simultaneously. Hardware Evaluation (right): the trained exoskeleton policy is deployed via sim-to-real transfer, running on a Raspberry Pi 4B to command bilateral MyActuator X8-25 actuators during treadmill walking trials. In SMAT, Stage 1 uses the Human-Only Training mode; Stage 2 adapts the human actor to the added exoskeleton mass; Stage 3 freezes the human actor and trains only the exoskeleton actor for positive assistance timing; and Stage 4 trains both actors for co-adaptation. The policies trained in each stage are transferred to the next stage.
Figure 2: Training reward curves and joint kinematics for Stages 1 and 2 (left and right columns, respectively). Top: Episode reward over training steps. Bottom: Mean $\pm$1 s.d. hip, knee, and ankle angle trajectories over the normalized gait cycle; right limb (solid blue), left limb (dashed red). Vertical lines: mean toe-off.
Figure 3: Stage 4 co-adapted profiles. (a) Exoskeleton assistance power reward (left axis, green) and hip muscle activation penalty (right axis, red dashed) over training. (b) Hip angular velocity over the gait cycle (positive: flexion; negative: extension). (c) Normalized exoskeleton torque. (d) Exoskeleton mechanical power. Panels (b--d): bilateral average; shaded region: $\pm$1 s.d. Dotted vertical line: mean toe-off.
Figure 4: Right-side hip muscle activation over the gait cycle comparing Stage 2 (no assistive torque, blue) and Stage 4 (maximum assistive torque 25 Nm, orange). Shaded bands: mean $\pm$ 1 s.d. Dashed vertical lines mark mean toe-off for each condition.
Figure 5: Ablation comparing right-hip normalized exo torque over the gait cycle under four conditions: Stage 3 only without Stage 4 co-adaptation (dashed green); Stage 4 only initialized from Stage 2 without Stage 3 pre-training (dashed red, mean across three runs); Stage 4 only without $r_\mathrm{exo}^{(4)}$, initialized from Stage 2 (dash-dot orange); and the full Stage 3 + Stage 4 (SMAT) pipeline (solid blue). Dotted vertical lines mark mean toe-off.
...and 2 more figures
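
The stage schedule described in the Figure 1 caption can be sketched as a small data structure. This is a hedged illustration, not the authors' training code: the names `Stage`, `SMAT_STAGES`, `trainable_actors`, and `run_curriculum` are hypothetical, the per-stage torque limits and reward terms are omitted because they are not given in this excerpt, and `train_fn` stands in for whatever PPO update loop is actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One curriculum stage: which actors update and whether the exo is worn."""
    name: str
    train_human: bool
    train_exo: bool
    exo_mass_attached: bool


# Schedule as described in the Figure 1 caption (names are illustrative).
SMAT_STAGES = (
    Stage("stage1_unassisted_gait", True, False, False),
    Stage("stage2_adapt_to_exo_mass", True, False, True),
    Stage("stage3_exo_assistance", False, True, True),   # human actor frozen
    Stage("stage4_co_adaptation", True, True, True),
)


def trainable_actors(stage):
    """Return the names of the actors whose weights update in this stage."""
    actors = []
    if stage.train_human:
        actors.append("human")
    if stage.train_exo:
        actors.append("exo")
    return tuple(actors)


def run_curriculum(stages, train_fn, policies):
    """Run the stages in order, passing each stage's policies to the next."""
    for stage in stages:
        policies = train_fn(stage, policies)
    return policies


def count_updates(stage, policies):
    """Stand-in for a real training loop: counts stages in which an actor trains."""
    updated = dict(policies)
    for actor in trainable_actors(stage):
        updated[actor] += 1
    return updated


final = run_curriculum(SMAT_STAGES, count_updates, {"human": 0, "exo": 0})
```

Keeping the schedule as data makes the ablations in Figure 5 easy to express in this sketch, for example by dropping the Stage 3 entry before calling `run_curriculum`.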
