To the Noise and Back: Diffusion for Shared Autonomy

Takuma Yoneda, Luzhe Sun, Ge Yang, Bradly Stadie, Matthew Walter

TL;DR

The paper tackles robust shared autonomy in unstructured domains without relying on known dynamics, discrete goal spaces, or reward signals. It proposes a diffusion-based copilot that learns a distribution over desired behaviors from demonstrations and uses partial forward and reverse diffusion, governed by the forward diffusion ratio $\boldsymbol{\gamma}$, to translate user actions into samples that balance user fidelity with conformity to safe, effective behavior. Key contributions include a state-conditioned diffusion model trained with a DDPM-like loss, a distribution-transformation mechanism for action editing, and extensive evaluations across four continuous-control tasks plus real-human and real-robot experiments that illustrate improved performance and preserved user autonomy. The results demonstrate the practical impact of reward-free, policy-free assistance that adapts to diverse pilots and tasks, enabling safer and more capable human-robot collaboration.
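
As a point of reference for the "DDPM-like loss" mentioned above, the standard simplified DDPM objective, written here with an added state input, would look roughly as follows. This is a sketch under assumptions, not a transcription of the paper's equation: the symbols $a_0$ (a demonstrated action), $s$ (the state it was taken in), $\epsilon_\theta$ (the noise-prediction network), $\bar{\alpha}_k$ (the cumulative noise schedule), and $K$ (the number of diffusion steps) are notation chosen for this illustration.

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{(s,\,a_0) \sim \mathcal{D}_\textrm{demo},\;
                k \sim \mathcal{U}\{1,\dots,K\},\;
                \epsilon \sim \mathcal{N}(0, I)}
    \left[
      \left\lVert
        \epsilon
        - \epsilon_\theta\!\left(
            \sqrt{\bar{\alpha}_k}\, a_0 + \sqrt{1 - \bar{\alpha}_k}\, \epsilon,\;
            s,\; k
          \right)
      \right\rVert^2
    \right],
\qquad
\bar{\alpha}_k = \prod_{i=1}^{k} (1 - \beta_i)
```

Under this reading, training needs only state-action pairs from demonstrations, which is consistent with the claim that no reward signal or user policy is required.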

Abstract

Shared autonomy is an operational concept in which a user and an autonomous agent collaboratively control a robotic system. It provides a number of advantages over the extremes of full-teleoperation and full-autonomy in many settings. Traditional approaches to shared autonomy rely on knowledge of the environment dynamics, a discrete space of user goals that is known a priori, or knowledge of the user's policy -- assumptions that are unrealistic in many domains. Recent works relax some of these assumptions by formulating shared autonomy with model-free deep reinforcement learning (RL). In particular, they no longer need knowledge of the goal space (e.g., that the goals are discrete or constrained) or environment dynamics. However, they need knowledge of a task-specific reward function to train the policy. Unfortunately, such reward specification can be a difficult and brittle process. On top of that, the formulations inherently rely on human-in-the-loop training, and that necessitates them to prepare a policy that mimics users' behavior. In this paper, we present a new approach to shared autonomy that employs a modulation of the forward and reverse diffusion process of diffusion models. Our approach does not assume known environment dynamics or the space of user goals, and in contrast to previous work, it does not require any reward feedback, nor does it require access to the user's policy during training. Instead, our framework learns a distribution over a space of desired behaviors. It then employs a diffusion model to translate the user's actions to a sample from this distribution. Crucially, we show that it is possible to carry out this process in a manner that preserves the user's control authority. We evaluate our framework on a series of challenging continuous control tasks, and analyze its ability to effectively correct user actions while maintaining their autonomy.

Paper Structure (24 sections, 11 equations, 16 figures, 6 tables, 1 algorithm)

Figures (16)

  • Figure 1: Our framework utilizes a diffusion model to adapt a user's action (red) to those from a demonstration distribution (green) in a manner (blue) that balances a user's desire to maintain control authority with the benefits (e.g., safety) of conforming to the desired (demonstration) distribution. Without knowledge of the user's specific goal (e.g., the landing location), the demonstration distribution reflects different goals that the demonstration trajectories previously reached.
  • Figure 2: (Top) A visualization of diffusion processes of action distributions at state $s_t$. The black arrow at the top shows the forward diffusion of a source user distribution $P_\textrm{user}$, and the blue arrow below shows the reverse diffusion to the target demonstration distribution $P_\textrm{demo}$. We switch between these two processes at an intermediate step $k=k_\textrm{sw}$ to achieve partial forward and reverse diffusion, shown by the black and blue arrows on the left. (Bottom) The result of forward and reverse diffusion for different switching times $k_\textrm{sw}$, where the standard reverse diffusion process corresponds to $k_\textrm{sw} = K$.
  • Figure 3: We evaluate our algorithm in the context of four shared autonomy environments: a 2D Control task in which an agent navigates to one of two different goals; Lunar Lander, which tasks a drone with landing at a designated location; a Lunar Reacher variant in which the objective is to reach a designated region in the environment; and Block Pushing, in which the objective is to use a robot arm to push an object into one of two different goal regions.
  • Figure 4: A visualization of the resulting trajectories in the 2D Control environment for different settings of the forward diffusion ratio $\gamma$. The user's objective is to reach the left-hand goal. Without assistance, the user successfully reaches the goal two times, while the eight others time out. As we increase $\gamma$, the user reaches the desired goal the vast majority of the time. As $\gamma$ gets closer to $1.0$, the assisted policy conforms to the expert policy, which avoids timeouts but, without knowledge of the user's goal, distributes the trajectories evenly between the left and right goals.
  • Figure 5: Success (higher is better), floating, and crash (lower is better) rates for Lunar Lander (top) and Lunar Reacher (bottom) with noisy and laggy pilots. The dashed blue line denotes the success rate of an expert policy, while the dotted blue line denotes the success rate of our model with full diffusion ($\gamma=1.0$).
  • ...and 11 more figures
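
The partial forward and reverse diffusion described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the switching step is taken to be `k_sw = round(gamma * K)`, the noise schedule is a linear one chosen for the toy, and `toy_eps_model` is a hypothetical stand-in for the learned, state-conditioned network. It is the exact noise predictor for a 1D Gaussian "demonstration" distribution and ignores state entirely.

```python
import numpy as np


def make_schedule(K, beta_min=1e-4, beta_max=0.2):
    """Linear noise schedule (an assumption made for this toy)."""
    betas = np.linspace(beta_min, beta_max, K)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bars[k] is the cumulative product up to step k+1
    return betas, alphas, alpha_bars


def toy_eps_model(a_k, k, alpha_bars, mu=1.0, sigma=0.1):
    """Hypothetical stand-in for a trained noise predictor.

    For demonstrations a_0 ~ N(mu, sigma^2), the optimal prediction
    E[eps | a_k] has this closed form, so no training is needed here.
    """
    ab = alpha_bars[k]
    var_k = ab * sigma**2 + (1.0 - ab)  # variance of the noised action a_k
    return np.sqrt(1.0 - ab) * (a_k - np.sqrt(ab) * mu) / var_k


def partial_diffusion(a_user, gamma, eps_model, K=50, rng=None):
    """Forward-diffuse user actions for k_sw steps, then denoise them back."""
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = make_schedule(K)
    a_user = np.asarray(a_user, dtype=float)

    k_sw = int(round(gamma * K))  # assumed mapping from ratio to step index
    if k_sw == 0:
        return a_user.copy()  # gamma = 0: the user's action passes through

    # Forward diffusion in closed form: jump straight to noise level k_sw.
    ab = alpha_bars[k_sw - 1]
    a = np.sqrt(ab) * a_user + np.sqrt(1.0 - ab) * rng.standard_normal(a_user.shape)

    # Reverse diffusion: standard DDPM ancestral sampling from k_sw down to 1.
    for k in range(k_sw - 1, -1, -1):
        eps_hat = eps_model(a, k, alpha_bars)
        mean = (a - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps_hat) / np.sqrt(alphas[k])
        noise = rng.standard_normal(a.shape) if k > 0 else 0.0  # no noise on the last step
        a = mean + np.sqrt(betas[k]) * noise
    return a
```

With `gamma = 0` the user's action is returned unchanged, and with `gamma = 1` the input is noised almost completely before denoising, which approximates standard reverse diffusion ($k_\textrm{sw} = K$). Intermediate values land between the two, which is the trade-off between user fidelity and conformity that Figure 4 visualizes. In this toy the demonstration distribution is narrow, so even small values of `gamma` pull actions strongly toward it; a real model would be conditioned on the state $s_t$ as well.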