A Probabilistic Model for Skill Acquisition with Switching Latent Feedback Controllers

Juyan Zhang, Dana Kulic, Michael Burke

TL;DR

This paper addresses robust skill acquisition for robotic manipulation by modeling skills as switching feedback controllers in a latent space, where the latent state $z_t$ and the skill index $\delta_t$ govern control. It reinterprets a one-layer linear network as a latent-space feedback controller and extends mixture density networks (MDNs) with a probabilistic switching mechanism, trained via a novel evidence lower bound (ELBO) that includes a switching-consistency term. The approach improves task success, robustness to observation noise, and the clarity of skill transitions on tasks including Franka Kitchen, FetchPush, and robot handwriting, with reported gains in sample efficiency and interpretability. The work is relevant to deploying robust, multi-skill policies on real robots in multimodal environments, and opens avenues for nonparametric extensions and latent-dynamics-informed control.

Abstract

Manipulation tasks often consist of subtasks, each representing a distinct skill. Mastering these skills is essential for robots, as it enhances their autonomy, efficiency, adaptability, and ability to work in their environment. Learning from demonstrations allows robots to rapidly acquire new skills without starting from scratch, with demonstrations typically sequencing skills to achieve tasks. Behaviour cloning approaches to learning from demonstration commonly rely on mixture density network output heads to predict robot actions. In this work, we first reinterpret the mixture density network as a library of feedback controllers (or skills) conditioned on latent states. This arises from the observation that a one-layer linear network is functionally equivalent to a classical feedback controller, with network weights corresponding to controller gains. We use this insight to derive a probabilistic graphical model that combines these elements, describing the skill acquisition process as segmentation in a latent space, where each skill policy functions as a feedback control law in this latent space. Our approach significantly improves not only task success rate, but also robustness to observation noise when trained with human demonstrations. Our physical robot experiments further show that the induced robustness improves model deployment on robots.
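The equivalence between a one-layer linear network and a feedback controller can be checked numerically. The sketch below assumes a proportional control law of the form $u = K(g - z)$, which is an illustrative choice and may differ from the paper's exact formulation. Under that assumption, the controller is a linear layer $u = Wz + b$ with $W = -K$ and $b = Kg$. All function names and numeric values are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def feedback_controller(z, gain, goal):
    """Assumed proportional feedback law: u = K (g - z)."""
    error = [g_i - z_i for g_i, z_i in zip(goal, z)]
    return matvec(gain, error)


def linear_layer(z, weight, bias):
    """One-layer linear network: u = W z + b."""
    return [wz + b_i for wz, b_i in zip(matvec(weight, z), bias)]


def controller_to_layer(gain, goal):
    """Rewrite the controller as a linear layer: W = -K, b = K g."""
    weight = [[-k for k in row] for row in gain]
    bias = matvec(gain, goal)
    return weight, bias


# Illustrative values: a 3-D latent state and a 2-D control signal.
K = [[1.0, 0.5, 0.0], [0.0, 2.0, -1.0]]  # controller gain
g = [0.5, -1.0, 2.0]                     # latent goal (reference point)
z = [1.0, 0.0, -0.5]                     # current latent state

W, b = controller_to_layer(K, g)
u_ctrl = feedback_controller(z, K, g)
u_net = linear_layer(z, W, b)
print(u_ctrl, u_net)
```

Both calls produce the same control signal, so under this assumed control law the network weights play the role of the (negated) gains and the bias encodes the goal scaled by the gain.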

Paper Structure

This paper contains 47 sections, 19 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: (a) Model structure overview: We model skills as a set of feedback controllers in latent space. The control signals are predicted by a latent controller given a latent gain, a latent goal, and the current latent state. Yellow modules are neural networks, while the others denote components of the feedback loop. $\boldsymbol{o}(t)$ is the observation at time $t$. The encoder neural network maps $\boldsymbol{o}(t)$ to the latent state $\boldsymbol{z}(t)$. The skill switcher network predicts the skill index $\boldsymbol{\delta}$, which switches on the $\boldsymbol{\delta}$th latent feedback controller with reference point $\boldsymbol{g}[\boldsymbol{\delta}]$ and gain $\boldsymbol{k}[\boldsymbol{\delta}]$. The robot receives the control signal $\boldsymbol{u}$ for execution. (b) Probabilistic graphical model: Solid arrows indicate the generative process. Red dashed arrows indicate the approximating process. $\boldsymbol{g}_t, \boldsymbol{k}_t$ are not random variables. The blue circles are the observed random variables and the yellow circles are the inferred latent random variables.
  • Figure 2: Simulated environments used for evaluation. (a) Franka Kitchen [liu_libero_2023]: A multitask environment with demonstrations of manipulating different objects in the kitchen scene, used for realistic and scalable evaluation. (b) FetchPush task [towers_gymnasium_2023]: The Fetch robot pushes a block to a target position, used for thorough performance analysis of the models.
  • Figure 3: Skill duration for each task: The average fraction of time steps spent consecutively executing a skill during a task. Our model executes skills for shorter durations and relies less on any single skill than the MDN.
  • Figure 4: FetchPush robustness curve: The red and blue lines are the success-rate curves of the baselines (BC and MDN, respectively) with the optimal number of skills. The purple line is our model with the optimal number of skills. We report the average success rate at different noise levels for each model.
  • Figure 5: Robot writing task: (a) Trajectories predicted by different models in simulation. The first row shows trajectories generated by the MDN; the second row shows trajectories generated by our model. Positive height corresponds to writing pressure when in contact with the whiteboard, while negative height indicates a pen lift. (b) Physical deployment of different models. The first row shows control with the MDN model; the second row shows latent feedback control with our model. Our model stops appropriately after the letter o is finished, while the MDN keeps drawing the letter.
  • ...and 3 more figures
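The control loop described in the Figure 1 caption can be sketched as a single forward pass: encode the observation, pick a skill, then apply that skill's feedback law in latent space. This is a minimal illustration, not the authors' implementation. The encoder and skill switcher are stand-in linear maps rather than trained networks, the control law $u = k[\delta](g[\delta] - z)$ is an assumed form, and the skill is chosen by argmax instead of being sampled or inferred probabilistically. All names and values are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def encode(observation, encoder_weight):
    """Stand-in encoder: one linear map followed by tanh (assumption)."""
    return [math.tanh(v) for v in matvec(encoder_weight, observation)]


def switch_skill(z, switcher_weight):
    """Stand-in skill switcher: softmax over linear logits, then argmax."""
    probs = softmax(matvec(switcher_weight, z))
    skill = max(range(len(probs)), key=probs.__getitem__)
    return skill, probs


def control(observation, encoder_weight, switcher_weight, gains, goals):
    """Encode, select a skill, and apply its assumed law u = k (g - z)."""
    z = encode(observation, encoder_weight)
    skill, probs = switch_skill(z, switcher_weight)
    error = [g_i - z_i for g_i, z_i in zip(goals[skill], z)]
    u = matvec(gains[skill], error)
    return u, skill, probs


# Illustrative library of two skills in a 2-D latent space.
encoder_weight = [[1.0, 0.0], [0.0, 1.0]]
switcher_weight = [[1.0, 0.0], [0.0, 1.0]]
goals = [[1.0, 0.0], [0.0, 1.0]]            # g[0], g[1]
gains = [[[1.0, 0.0], [0.0, 1.0]],          # k[0]
         [[2.0, 0.0], [0.0, 2.0]]]          # k[1]

u, skill, probs = control([0.5, -0.5], encoder_weight, switcher_weight,
                          gains, goals)
print(skill, u)
```

Storing one gain and one goal per skill index keeps each skill readable as a controller with its own reference point, which is the interpretation the caption gives to $\boldsymbol{g}[\boldsymbol{\delta}]$ and $\boldsymbol{k}[\boldsymbol{\delta}]$.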