DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning

Sabariswaran Mani; Sreyas Venkataraman; Abhranil Chandra; Adyan Rizvi; Yash Sirvi; Soumojit Bhattacharya; Aritra Hazra

TL;DR

DiffClone tackles data-efficient offline robot learning on the TOTO benchmark by combining selective sampling of high-reward trajectories, a MoCo-finetuned ResNet50 visual backbone, and a diffusion-based policy (DDPM) to perform enhanced behaviour cloning for visuomotor tasks. A CNN-based diffusion policy generates action sequences conditioned on observations, enabling multimodal, robust behaviour without on-policy exploration. Empirically, diffusion-based BC with MoCo representations outperforms traditional BC and offline RL baselines in simulation, while real-robot deployment reveals sensitivity to hyperparameters and latency, indicating a need for latency-aware inference (DDIM) and regularization for sim-to-real transfer. Overall, DiffClone highlights the potential of diffusion priors to capture complex, multimodal manipulation distributions from offline data, which is relevant to data-efficient robot learning and benchmark evaluation.

Abstract

Robot learning tasks are extremely compute-intensive and hardware-specific. Tackling these challenges with a diverse dataset of offline demonstrations that can be used to train robot manipulation agents is therefore very appealing. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated open-source dataset for offline training, composed mostly of expert data, along with benchmark scores for common offline-RL and behaviour cloning agents. In this paper, we introduce DiffClone, an offline algorithm that enhances a behaviour cloning agent with diffusion-based policy learning, and we measure the efficacy of our method on real physical robots online at test time. This is also our official submission to the TOTO Benchmark Challenge organized at NeurIPS 2023. We experimented with both pre-trained visual representations and agent policies. In our experiments, we find that a MoCo-finetuned ResNet50 performs best among the finetuned representations compared. Goal-state conditioning and mapping to transitions resulted in a marginal increase in the success rate and mean reward. As for the agent policy, we developed DiffClone, a behaviour cloning agent improved using conditional diffusion.

Paper Structure (20 sections, 8 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Background and Preliminaries
Visual Encoders
Bootstrap Your Own Latent (BYOL)
Momentum Contrast (MoCo)
Agents for Policy Learning
Imitation Learning via Behaviour Cloning
Visual Imitation via Nearest Neighbors (VINN)
Offline RL Methods
DiffClone: The Proposed Framework
Data Preprocessing
Diffusion Policy for Robot Behaviour
Experiments and Results
Experiments with Visual Representation
Experiments with Agent Policy
...and 5 more sections

Figures (3)

Figure 1: Diffusion Policy: a generative model that, at each time step $t$, takes as input the latest $T_o$ observations $O_t$ and predicts the $T_a$ subsequent actions $A_t$. The CNN variant uses Feature-wise Linear Modulation (FiLM) for conditioning at each convolution layer. The Transformer-based variant passes observation embeddings through a causally masked decoder with multi-head cross-attention.
Figure 2: Schematic Model of our proposed DiffClone Framework
Figure 3: Agent learning policy gradually using DiffClone
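
The FiLM conditioning mentioned in the Figure 1 caption can be illustrated with a minimal sketch. It assumes the standard FiLM form, in which an observation embedding is mapped to a per-channel scale and shift that modulate a convolution layer's activations. The function names, shapes, and toy weights below are hypothetical and are not taken from the DiffClone implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map y = W x + b, written out with plain lists."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features: Matrix, cond: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Apply FiLM: scale and shift each channel using the conditioning vector.

    features: [channels][time] activations of one 1D convolution layer
              (assumed layout, with time indexing the action sequence).
    cond:     observation embedding (hypothetical size).
    """
    gamma = linear(w_gamma, b_gamma, cond)  # one scale per channel
    beta = linear(w_beta, b_beta, cond)     # one shift per channel
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Toy example: 2 channels, 3 action steps, 2-dimensional observation embedding.
features = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
cond = [1.0, -1.0]
w_gamma = [[1.0, 0.0], [0.0, 1.0]]  # illustrative weights, not learned
b_gamma = [0.0, 0.0]
w_beta = [[0.5, 0.5], [1.0, 0.0]]
b_beta = [0.0, 1.0]

modulated = film(features, cond, w_gamma, b_gamma, w_beta, b_beta)
print(modulated)
```

In a diffusion policy of this kind, the same modulation would be applied at every convolution layer of the noise-prediction network, so the observation embedding influences each denoising step without being concatenated to the action sequence itself.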
