Learning Hip Exoskeleton Control Policy via Predictive Neuromusculoskeletal Simulation

Ilseung Park; Changseob Song; Inseung Kang

TL;DR

A physics-based neuromusculoskeletal learning framework is presented that trains a hip-exoskeleton control policy entirely in simulation, without motion-capture demonstrations, and deploys it on hardware via policy distillation, providing quantitative evidence of sim-to-real transfer without additional hardware tuning.

Abstract

Developing exoskeleton controllers that generalize across diverse locomotor conditions typically requires extensive motion-capture data and biomechanical labeling, limiting scalability beyond instrumented laboratory settings. Here, we present a physics-based neuromusculoskeletal learning framework that trains a hip-exoskeleton control policy entirely in simulation, without motion-capture demonstrations, and deploys it on hardware via policy distillation. A reinforcement learning teacher policy is trained using a muscle-synergy action prior over a wide range of walking speeds and slopes through a two-stage curriculum, enabling direct comparison between assisted and no-exoskeleton conditions. In simulation, exoskeleton assistance reduces mean muscle activation by up to 3.4% and mean positive joint power by up to 7.0% on level ground and ramp ascent, with benefits increasing systematically with walking speed. On hardware, the assistance profiles learned in simulation are preserved across matched speed-slope conditions (r: 0.82, RMSE: 0.03 Nm/kg), providing quantitative evidence of sim-to-real transfer without additional hardware tuning. These results demonstrate that physics-based neuromusculoskeletal simulation can serve as a practical and scalable foundation for exoskeleton controller development, substantially reducing experimental burden during the design phase.
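The agreement figures quoted above (r and RMSE) can be illustrated with a short sketch. It assumes that r denotes Pearson's correlation coefficient and that the simulated and measured torque profiles have already been resampled onto a common gait-cycle grid; the abstract does not state the exact procedure, so both points are assumptions. The function names `pearson_r` and `rmse` and the synthetic profiles are hypothetical and are not data from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must have equal, non-zero length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Synthetic (not measured) torque profiles in Nm/kg over one gait cycle.
sim_torque = [0.1 * math.sin(2 * math.pi * i / 100) for i in range(100)]
hw_torque = [0.9 * v + 0.01 for v in sim_torque]  # hypothetical scaled, offset copy

r_value = pearson_r(sim_torque, hw_torque)
rmse_value = rmse(sim_torque, hw_torque)
```

Because Pearson correlation is insensitive to gain and offset while RMSE is not, reporting the two together captures both the shape agreement and the magnitude agreement of the transferred assistance profile.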

Paper Structure

This paper contains 21 sections, 4 equations, 10 figures, and 1 table.

Introduction
Methods
Neuromusculoskeletal Simulation
Human-Exoskeleton Model
Policy Training and Curriculum
Muscle Synergy Action Prior
Action and Observation Space
Domain Randomization
Reward Design
Policy Distillation
Robotic Hip Exoskeleton
Human Experiment
Participants
Experimental Protocol
Results
...and 6 more sections

Figures (10)

Figure 1: Simulation-to-real workflow for learning and deploying a hip-exoskeleton control policy. Left (Neuromusculoskeletal simulation): a privileged teacher policy is trained in a predictive human simulation across randomized target speeds ($0.7$--$1.5$ m/s) and slopes ($-5^\circ$ to $+5^\circ$) using a two-stage curriculum: Curriculum 1 learns stable locomotion without exoskeleton actuation (no-exoskeleton), then Curriculum 2 introduces bilateral hip exoskeleton actuation (exoskeleton-assisted), enabling the policy to learn assistive torques. The teacher outputs reduced-dimensional actions via a muscle-synergy prior, comprising lower-limb muscle excitations ($e_{\mathrm{syn}}\!\rightarrow\! e_{\mathrm{limb}}$), direct trunk-muscle excitations ($e_{\mathrm{trunk}}$), and raw bilateral hip-exoskeleton torques ($\tau_{\mathrm{raw}}$), which are smoothed by a first-order low-pass filter (LPF) to yield applied torques ($\tau_{\mathrm{filt}}$). Simulation state and reward are fed back to optimize the teacher. Right (Real world): the teacher is distilled into a temporal convolutional network student policy that maps a short history of unilateral thigh inertial measurement unit (IMU) gyroscope signals ($\dot{\theta}_{\mathrm{thigh}}$) to unilateral hip-torque commands for onboard control; the resulting $\tau_{\mathrm{raw}}$ is passed through the onboard LPF and applied to the physical exoskeleton. The dashed red arrows denote the distillation link from simulated rollouts to the deployable IMU-only policy.
Figure 2: Human-exoskeleton musculoskeletal model used in simulation. Frontal and lateral views of the H2190 full-body musculoskeletal model, comprising 21 degrees of freedom and 90 Hill-type musculotendon actuators. Bilateral hip exoskeleton actuators apply flexion-extension torques about the left and right hip joints.
Figure 3: Comparison of thigh gyroscope signals between simulation and hardware during level-ground walking at $1.2$ m/s. Simulation signals were computed as femur angular velocity expressed in the femur local frame; hardware signals were recorded from an IMU mounted on the distal thigh bar of the exoskeleton. The rotational axes were defined with respect to an upright standing posture: $x$ corresponds to the anteroposterior axis, $y$ to the vertical axis, and $z$ to the mediolateral axis. Solid lines show stride-averaged means and shaded regions indicate $\pm 1$ SD across strides. The mediolateral component (gyro $z$) exhibited the strongest agreement ($r=0.55$) and was therefore selected as the sole input modality for student policy training.
Figure 4: Teacher-student agreement in hip torque prediction. Representative time series of right hip assistance torque produced by the privileged teacher policy (black) and the distilled student controller (teal). The student is a temporal convolutional network that predicts hip torque from a short history of femur mediolateral gyroscope measurements expressed in the local segment frame. The example trial was collected on level ground while commanding a sequence of target speeds ($0.7$, $1.1$, and $1.5$ m/s), each maintained for $5$ s. The first $0.95$ s of the trial contains no student output, as the input history window has not yet been fully populated. For this trial, the student closely tracked the teacher output ($R^2=0.93$).
Figure 5: Robotic hip exoskeleton hardware and onboard software architecture. (a) Robotic hip exoskeleton designed to apply bilateral hip flexion-extension torques during locomotion. (b) Onboard software architecture of the machine learning co-processor. Input sensor signals are logged via a dedicated I/O process; a parallel inference process generates real-time hip torque commands using the pre-trained student policy.
...and 5 more figures
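The deployment path described in the Figure 1 and Figure 4 captions (a history of mediolateral gyroscope samples, a student policy producing a raw torque, and a first-order low-pass filter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cutoff frequency, sample period, and window length are hypothetical values not given in the captions, the filter uses a backward-Euler discretization chosen here for simplicity, and `placeholder_policy` is a stand-in for the temporal convolutional network student.

```python
import math
from collections import deque


class FirstOrderLowPass:
    """Discrete first-order low-pass filter (backward-Euler form)."""

    def __init__(self, cutoff_hz, dt):
        time_constant = 1.0 / (2.0 * math.pi * cutoff_hz)
        self.alpha = dt / (time_constant + dt)  # smoothing factor in (0, 1)
        self.state = 0.0

    def step(self, x):
        # Move the filter state a fraction alpha toward the new input.
        self.state += self.alpha * (x - self.state)
        return self.state


class GyroHistoryController:
    """Maps a sliding window of gyro-z samples to a filtered hip torque."""

    def __init__(self, policy, window_len, lpf):
        self.policy = policy  # callable: list of gyro samples -> raw torque
        self.history = deque(maxlen=window_len)
        self.lpf = lpf

    def step(self, gyro_z):
        self.history.append(gyro_z)
        # No output until the input window is fully populated.
        if len(self.history) < self.history.maxlen:
            return None
        tau_raw = self.policy(list(self.history))
        return self.lpf.step(tau_raw)


def placeholder_policy(window):
    """Hypothetical stand-in for the student network: scaled window mean."""
    return 0.05 * sum(window) / len(window)


# Hypothetical settings: 100 Hz loop, 5 Hz cutoff, 20-sample window.
controller = GyroHistoryController(
    policy=placeholder_policy,
    window_len=20,
    lpf=FirstOrderLowPass(cutoff_hz=5.0, dt=0.01),
)
gyro_stream = [math.sin(2 * math.pi * k * 0.01) for k in range(200)]
torques = [controller.step(g) for g in gyro_stream]
```

Returning `None` until the window fills mirrors the start-up interval noted in the Figure 4 caption, during which the student produces no output because its input history is incomplete.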
