Stochastic Trajectory Optimization for Robotic Skill Acquisition From a Suboptimal Demonstration

Chenlin Ming, Zitong Wang, Boxuan Zhang, Zhanxiang Cao, Xiaoming Duan, Jianping He

TL;DR

This work addresses acquiring robotic skills from suboptimal demonstrations by marrying imitation with trajectory optimization. It introduces MSTOMP, a multi-policy extension of STOMP, and augments it with two similarity metrics: Dynamic Time Warping (DTW) for time-domain imitation and Mean Square Error in the Spectrum (MSES) for fast, frequency-domain similarity, along with a frequency-domain denoising step. A key contribution is showing how DTW and MSES are connected and leveraging this relationship to achieve stable, robust optimization in high-dimensional spaces. The approach is validated in both PyBullet simulations and real robots, demonstrating superior imitation fidelity and dynamic performance over baselines. The work highlights practical impact for learning from a single, suboptimal demonstration while enabling robust skill acquisition and execution in real-world robotic systems.

Abstract

Learning from Demonstration (LfD) has emerged as a crucial method for robots to acquire new skills. However, when given suboptimal task trajectory demonstrations with shape characteristics reflecting human preferences but subpar dynamic attributes such as slow motion, robots not only need to mimic the behaviors but also optimize the dynamic performance. In this work, we leverage optimization-based methods to search for a superior-performing trajectory whose shape is similar to that of the demonstrated trajectory. Specifically, we use Dynamic Time Warping (DTW) to quantify the difference between two trajectories and combine it with additional performance metrics, such as collision cost, to construct the cost function. Moreover, we develop a multi-policy version of the Stochastic Trajectory Optimization for Motion Planning (STOMP), called MSTOMP, which is more stable and robust to parameter changes. To deal with the jitter in the demonstrated trajectory, we further utilize the gain-controlling method in the frequency domain to denoise the demonstration and propose a computationally more efficient metric, called Mean Square Error in the Spectrum (MSES), that measures the trajectories' differences in the frequency domain. We also theoretically highlight the connections between the time domain and the frequency domain methods. Finally, we verify our method in both simulation experiments and real-world experiments, showcasing its improved optimization performance and stability compared to existing methods.
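To make the role of DTW concrete, here is a minimal sketch of the classic dynamic-programming form of DTW between two trajectories. It is an illustration under stated assumptions, not the paper's implementation: the function name `dtw_distance`, the Euclidean local cost, and the unconstrained warping path are all choices made for this example, and the paper may use a different variant.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(Ta * Tb) DTW between two trajectories (hypothetical helper).

    a: array of shape (Ta,) or (Ta, D); b: array of shape (Tb,) or (Tb, D).
    The local cost is the Euclidean distance between waypoints (an assumption).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D sequence as Ta waypoints of dimension 1
    if b.ndim == 1:
        b = b[:, None]
    ta, tb = len(a), len(b)

    # Pairwise local costs between every waypoint of a and every waypoint of b.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

    # acc[i, j] = cost of the best alignment of a[:i] with b[:j].
    acc = np.full((ta + 1, tb + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return float(acc[ta, tb])

# A slow demonstration that dwells on each waypoint, and a faster trajectory
# that traces the same shape in half as many steps.
slow_demo = [0, 0, 1, 1, 2, 2, 3, 3]
fast_traj = [0, 1, 2, 3]
same_shape_cost = dtw_distance(fast_traj, slow_demo)
different_shape_cost = dtw_distance([0, 1], [0, 2])
```

Because DTW aligns waypoints before comparing them, the faster trajectory incurs zero cost against the slow demonstration here, which is the property that lets a shape term coexist with dynamic-performance terms in one cost function.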

Paper Structure

This paper contains 12 sections, 3 theorems, 9 equations, 6 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

Given $\bm{x} \in \mathbb{R}^{M \times N}$ and its DFT $\bm{X} \in \mathbb{C}^{M \times N}$, we have:

Figures (6)

  • Figure 1: Examples of optimized trajectories using STOMP, MSTOMP, MPPI, and GPMP. All algorithms utilize the DTW as the cost function. The number in each legend label is the value of the decay factor $\gamma$ used in the algorithms.
  • Figure 2: The evolution of the DTW value over $100$ repeated experiments. MSTOMP has a lower variance and finds better trajectories. At the end of the optimization process, STOMP exhibits a standard deviation of 7.6, whereas MSTOMP achieves a significantly lower standard deviation of 1.4.
  • Figure 3: The denoising performance of the filters in the frequency domain on different trajectories. Different types of trajectories are shown: line in (a), symmetric closed graphic in (b), and unclosed complex graphic in (c) and (d).
  • Figure 4: The Panda robotic arm swings a golf club in the PyBullet environment.
  • Figure 5: In the obstacle avoidance simulation experiment with the Panda robotic arm, the optimized trajectory moves upwards as a whole to avoid collision, while keeping a safe distance from the right obstacle sphere, even if this results in a higher imitation cost.
  • ...and 1 more figure

Theorems & Definitions (3)

  • Lemma 1: Parseval's theorem [parseval1806memoire]
  • Lemma 2: Parseval's theorem [baddour2019discrete]
  • Lemma 3
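
Since Lemmas 1 and 2 are listed as Parseval's theorem, a short numerical check may help show why a spectrum-based error can stand in for a time-domain one. The sketch below is not the paper's definition of MSES, which is not reproduced in this listing. It assumes an unnormalized DFT taken along the time axis of an $M \times N$ trajectory, and the helper names `mse_time` and `mse_spectrum` are hypothetical. Under those assumptions, the mean squared error of the spectra equals $N$ times the mean squared error of the signals.

```python
import numpy as np

def mse_time(x, y):
    """Mean squared error between two M x N trajectories in the time domain."""
    return float(np.mean(np.abs(x - y) ** 2))

def mse_spectrum(x, y):
    """Mean squared error between the DFTs of x and y (assumed convention).

    The DFT is unnormalized and taken along the last (time) axis; this is an
    assumption for illustration, not the paper's stated definition of MSES.
    """
    X = np.fft.fft(x, axis=-1)
    Y = np.fft.fft(y, axis=-1)
    return float(np.mean(np.abs(X - Y) ** 2))

rng = np.random.default_rng(0)
M, N = 7, 64  # e.g. 7 joints, 64 time steps (illustrative sizes)
x = rng.standard_normal((M, N))
y = rng.standard_normal((M, N))

# Parseval's identity for the unnormalized DFT implies this ratio equals N.
ratio = mse_spectrum(x, y) / mse_time(x, y)
```

With a different DFT normalization the constant changes (for example, it becomes 1 with `norm="ortho"` in `np.fft.fft`), but the two errors remain proportional.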