Phase-Amplitude Reduction-Based Imitation Learning

Satoshi Yamamori; Jun Morimoto

TL;DR

The paper addresses the challenge of imitating human motions with safe, stable trajectories by moving beyond traditional dynamical movement primitives to a phase–amplitude reduced latent dynamics framework. It encodes demonstrations into a latent space with phase $\phi$ and amplitude $r$, whose dynamics are analytically tractable as $\dot{z}=[\omega, -\lambda \odot r]$, enabling both limit-cycle tracking and convergence through transient phases. An encoder–decoder pair learned via variational inference, coupled with interactive feedback between the robot and latent space, reconstructs and follows demonstrated trajectories on simulated and real robots, including human baton waving. The results show improved handling of transient movements, robustness to disturbances, and successful real-world imitation, highlighting the method’s potential for safer and more versatile imitation learning in robotics.
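
Because the latent dynamics are linear in the amplitude and constant in the phase, they can be integrated in closed form. The following is a minimal sketch of that idea, not the authors' implementation: the function name `latent_flow`, the 0.2 Hz frequency, the decay rates, and the initial values are all illustrative assumptions.

```python
import numpy as np

def latent_flow(phi0, r0, omega, lam, t):
    """Closed-form solution of dphi/dt = omega, dr/dt = -lam * r (elementwise)."""
    phi = (phi0 + omega * t) % (2.0 * np.pi)  # phase advances at a constant rate
    r = r0 * np.exp(-lam * t)                 # each amplitude decays exponentially
    return phi, r

# Assumed values, chosen only for illustration.
omega = 2.0 * np.pi * 0.2        # angular frequency for a 0.2 Hz cycle
lam = np.array([1.0, 3.0])       # positive decay rates, one per amplitude
phi0 = 0.0                       # initial phase
r0 = np.array([0.5, -0.2])       # initial deviation from the limit cycle

phi_T, r_T = latent_flow(phi0, r0, omega, lam, t=2.0)
```

With positive decay rates, every amplitude component shrinks toward zero, which corresponds to the latent state converging to the limit cycle while the phase keeps advancing.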

Abstract

In this study, we propose the use of the phase-amplitude reduction method to construct an imitation learning framework. Imitating human movement trajectories is recognized as a promising strategy for generating a range of human-like robot movements. Unlike previous dynamical system-based imitation learning approaches, our proposed method allows the robot not only to imitate a limit cycle trajectory but also to replicate the transient movement from the initial or disturbed state to the limit cycle. Consequently, our method offers a safer imitation learning approach that avoids generating unpredictable motions immediately after disturbances or from a specified initial state. We first validated our proposed method by reconstructing a simple limit-cycle attractor. We then compared the proposed approach with a conventional method on a lemniscate trajectory tracking task with a simulated robot arm. Our findings confirm that our proposed method can more accurately generate transient movements to converge on a target periodic attractor compared to the previous standard approach. Subsequently, we applied our method to a real robot arm to imitate periodic human movements.

Paper Structure (25 sections, 18 equations, 8 figures, 2 tables, 1 algorithm)

Introduction
Related Works
Trajectory-based imitation learning
Latent representation in dynamics model
Methods
Phase-amplitude Reduction Latent Dynamics
Phase-Amplitude Reduction
Interactive Feedback
Encoder-Decoder Learning
Learning Algorithm
Probabilistic Models
Scaled Absolute Error Minimization
Structured Neural Network
Experimental Setups
Simple Limit Cycle
...and 10 more sections

Figures (8)

Figure 1: Phase-amplitude reduction-based imitation learning method. Proposed method reconstructs a human trajectory, recorded by an optical motion capture system, using an encoder $h$ and a decoder $\zeta$. Human trajectory is encoded in a latent space that follows the phase-amplitude equation representing a linear system with a frequency term $\omega$ and an exponent term $\lambda_i$, which determines the trajectory amplitude $r_i$ and phase $\phi$. Encoder $h$ projects the robot state to latent space. Feedback connection is provided to regulate the latent variables according to the robot state.
Figure 2: Interactive feedback system. Robot system $G$ and phase-amplitude system $f$ feed the errors $\Delta x$ and $\Delta z$ back to each other.
Figure 3: (a) Proposed probabilistic graphical models. Variational distributions $q_1$ and $q_2$ generate the latent variable $\boldsymbol{z}$ from observation $\boldsymbol{x}$. Probabilistic distributions $p_1$, $p_2$, $p_3$, and $p_4$ represent the target probabilistic model. We calculate four KL divergences to compare variational and probabilistic distributions. $\mathrm{KL}[q_1|p_1]$ and $\mathrm{KL}[q_1|p_2]$ represent transient process losses through the time expansion in latent space. $\mathrm{KL}[q_2|p_3]$ and $\mathrm{KL}[q_2|p_4]$ represent the stationary process losses directly derived from the observation space. (b) Network architecture of the encoder $h$ and decoder $\zeta$. Encoder input and decoder output are provided through two-layer ReLU networks and one linear layer. The state variable $\boldsymbol{x}$ was encoded to the phase variable $\phi$ using the inverse tangent $\mathrm{atan2}$. Conversely, the phase variable $\phi$ was decoded to the state variable $\boldsymbol{x}$ using the sinusoidal functions $\sin$ and $\cos$. $\mathrm{KL}[q_1|p_1]$, $\mathrm{KL}[q_1|p_2]$, and $\mathrm{KL}[q_2|p_4]$ were used to stabilize the learning process.
Figure 4: Learning limit cycle attractor. (a) Target limit cycle attractor in 2D state space $\boldsymbol{x}=[x_1, x_2]$. Color bars show the norm of the vector field $\dot{\boldsymbol{x}}=F(\boldsymbol{x})$, and a solid line shows the limit cycle. (b) RMSEs with different data sizes. (c) RMSEs with different target dynamics, i.e., different scale parameters $\alpha$. Our proposed method showed similar approximation performance on the different target dynamics. (d)-(g) Reconstructed vector fields corresponding to different data sizes from 5k to 100k time steps. The red line starts after 3.1 s (= 200 time steps) to show the approximated limit cycle. While transient dynamics reconstruction requires more than 10k steps of data, the limit cycle is well reproduced with 5k steps.
Figure 5: Lemniscate tracking tasks: force noise injection (b, c, f), slow motion (b, d, g), and trajectory reshaping (b, e, h). The demonstration data were generated on the lemniscate curve with a constant frequency of 0.2 Hz for 20 s.
...and 3 more figures
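
The phase mapping described in the Figure 3(b) caption, with $\mathrm{atan2}$ on the encoder side and $\sin$/$\cos$ on the decoder side, can be illustrated in isolation. This is a hedged sketch only: the two-component feature `u` and the function names `encode_phase` and `decode_phase` are hypothetical, and the surrounding ReLU and linear layers of the actual networks are omitted.

```python
import numpy as np

def encode_phase(u):
    """Map a 2D feature u = [u_cos, u_sin] to a phase in [0, 2*pi) via atan2."""
    return np.arctan2(u[1], u[0]) % (2.0 * np.pi)

def decode_phase(phi):
    """Map a phase back to a point on the unit circle via cos and sin."""
    return np.array([np.cos(phi), np.sin(phi)])

# Round trip: decoding a phase and re-encoding it recovers the same phase.
phi = 4.0
u = decode_phase(phi)
phi_back = encode_phase(u)
```

Representing the phase through a (cos, sin) pair avoids the discontinuity at the $2\pi$ wrap-around that a raw scalar angle would present to a network.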
