Problem Space Transformations for Out-of-Distribution Generalisation in Behavioural Cloning

Kiran Doshi, Marco Bagatella, Stelian Coros

TL;DR

This work investigates how the choice of problem space affects the out-of-distribution (OOD) performance of behavioural cloning (BC) policies, and how transformations arising from characteristic properties of manipulation can improve it. Empirically, these transformations allow BC policies, using either standard MLP-based one-step action prediction or diffusion-based action-sequence prediction, to generalise better to certain OOD problem instances.

Abstract

The combination of behavioural cloning (BC) and neural networks has driven significant progress in robotic manipulation. As these algorithms may require a large number of demonstrations for each task of interest, they remain fundamentally inefficient in complex scenarios, in which finite datasets can hardly cover the state space. One of the remaining challenges is thus out-of-distribution (OOD) generalisation, i.e. the ability to predict correct actions for states with a low likelihood with respect to the state occupancy induced by the dataset. This issue is aggravated when the system to control is treated as a black box, ignoring its physical properties. This work highlights widespread properties of robotic manipulation, specifically pose equivariance and locality. We investigate how the choice of problem space affects the OOD performance of BC policies and how transformations arising from characteristic properties of manipulation can be employed to improve it. Through controlled, simulated and real-world experiments, we empirically demonstrate that these transformations allow BC policies, using either standard MLP-based one-step action prediction or diffusion-based action-sequence prediction, to generalise better to certain OOD problem instances. Code is available at https://github.com/kirandoshi/pst_ood_gen.

Paper Structure

This paper contains 29 sections, 2 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Overview of the proposed transformations of the behaviour cloning problem space. The base problem space uses a fixed Cartesian coordinate system $\mathcal{W}$ (placed arbitrarily, e.g. at the base of the robot) in which the poses of the end-effector and of the objects are measured. $\mathcal{T}_1$ transforms the problem space such that all poses are measured with respect to the moving Cartesian frame of the end-effector $\mathcal{E}$. $\mathcal{T}_2$ projects all values to a $\lambda$-ball centred at the origin of $\mathcal{E}$.
  • Figure 2: Visualisation of the effect of the proposed transformation on the in-distribution manifold $\hat{\mathcal{X}}$ and on the desired manifold $\mathcal{X}^\star$. The aim is that in the transformed space both manifolds are aligned such that OOD states in $\mathcal{P}$ receive supervision signal in $\mathcal{Q}$.
  • Figure 3: Comparison of BC policies trained in the original problem space $\mathcal{P}$, in $\mathcal{T}_1(\mathcal{P})$ and in $\mathcal{T}_2(\mathcal{P})$ on in-distribution and OOD initial states. The top row shows MLP policies, the bottom row diffusion policies. The x-axis shows the normalised distance to the in-distribution manifold, where the value at 0 represents the in-distribution performance. The y-axis shows the mean and standard deviation of final rewards across seeds (higher reward is better).
  • Figure 4: Comparison of Baseline $\mathcal{P}$ (left), $\mathcal{T}_1(\mathcal{P})$ (middle) and $\mathcal{T}_2(\mathcal{P})$ (right) for PushT. Plot colour indicates (interpolated) reward per initial object position averaged over seeds. The top row shows the results for an MLP, the bottom row for diffusion policies. The in-distribution manifold lies within the red torus (between the inner and outer circle), the OOD manifold outside of the outer red circle. The plot in all cases visualises the baseline problem space $\mathcal{P}$.
  • Figure 5: A rendering of the three simulation environments considered in our evaluation.
  • ...and 3 more figures
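
The two transformations described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `to_end_effector_frame` and `project_to_ball`, the use of rotation matrices for orientation, and the choice to apply the $\lambda$-ball projection to positions only are all assumptions made for illustration. The first function expresses an object pose measured in the world frame $\mathcal{W}$ in the end-effector frame $\mathcal{E}$ (in the spirit of $\mathcal{T}_1$); the second rescales any position outside a ball of radius $\lambda$ onto its surface (in the spirit of $\mathcal{T}_2$).

```python
import numpy as np


def to_end_effector_frame(p_ee, R_ee, p_obj, R_obj):
    """Hypothetical T1 sketch: express an object pose given in the world
    frame W relative to the end-effector frame E.

    p_ee, p_obj: (3,) positions in W.
    R_ee, R_obj: (3, 3) rotation matrices (orientation in W).
    """
    p_rel = R_ee.T @ (p_obj - p_ee)  # object position as seen from E
    R_rel = R_ee.T @ R_obj           # object orientation as seen from E
    return p_rel, R_rel


def project_to_ball(p, lam):
    """Hypothetical T2 sketch: leave points inside the lambda-ball
    unchanged and rescale points outside it onto the ball's surface."""
    norm = np.linalg.norm(p)
    if norm <= lam:
        return p
    return p * (lam / norm)


# Example: end-effector at (1, 0, 0), rotated 90 degrees about z.
R_ee = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
p_ee = np.array([1.0, 0.0, 0.0])
p_obj = np.array([1.0, 2.0, 0.0])
R_obj = R_ee.copy()  # object aligned with the end-effector

p_rel, R_rel = to_end_effector_frame(p_ee, R_ee, p_obj, R_obj)
p_local = project_to_ball(p_rel, lam=0.5)

# Shifting both poses by the same world-frame offset leaves the
# relative position unchanged.
offset = np.array([10.0, -3.0, 2.0])
p_rel_shifted, _ = to_end_effector_frame(p_ee + offset, R_ee,
                                         p_obj + offset, R_obj)
```

In this sketch, a scene translated as a whole in $\mathcal{W}$ maps to the same relative position in $\mathcal{E}$, and all objects farther than $\lambda$ from the end-effector keep their direction but share the same distance $\lambda$.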

Theorems & Definitions (1)

  • Definition 1: OOD generalisation, informal