Data Augmentation for Instruction Following Policies via Trajectory Segmentation

Niklas Höpner, Ilaria Tiddi, Herke van Hoof

TL;DR

This work tackles data scarcity for instruction-following policies by using abundant unannotated play trajectories through semi-supervised trajectory segmentation. It introduces Play Segmentation (PS), a probabilistic model that is trained on short, labelled instruction segments and can generalize to longer trajectories, and compares it with the video-segmentation baselines UnLoc and TriDet. On BabyAI and CALVIN, augmenting the training data with segments extracted by PS improves downstream imitation-learning performance, with results that rival or surpass those obtained with twice as much fully labelled data. The findings show that targeted segmentation models can extract value from unlabelled play data, which matters for scaling the training of instruction-conditioned agents in robotics and games.

Abstract

The scalability of instructable agents in robotics or gaming is often hindered by limited data that pairs instructions with agent trajectories. However, large datasets of unannotated trajectories containing sequences of varied agent behaviour (play trajectories) are often available. In a semi-supervised setup, we explore methods to extract labelled segments from play trajectories. The goal is to augment a small annotated dataset of instruction-trajectory pairs to improve the performance of an instruction-following policy trained downstream via imitation learning. Assuming little variation in segment length, recent video segmentation methods can effectively extract labelled segments. To address the constraint on segment length, we propose Play Segmentation (PS), a probabilistic model that finds the most likely segmentations of extended subsegments while being trained only on individual instruction segments. Our results in a game environment and a simulated robotic gripper setting underscore the importance of segmentation: randomly sampled segments diminish performance, while incorporating labelled segments from PS improves policy performance to the level of a policy trained on twice the amount of labelled data.
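
To make the idea of searching for a most likely segmentation concrete, the following is a minimal sketch, not the authors' implementation. It assumes that some model assigns a log-probability to each candidate segment and label, and it uses a standard dynamic program over segment boundaries. The names `best_segmentation`, `toy_logprob`, the length bounds, and the toy trajectory are all hypothetical and chosen for illustration only; in the paper the segment scores would come from a model trained on the annotated instruction segments.

```python
import math
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def best_segmentation(
    T: int,
    labels: Sequence[str],
    seg_logprob: Callable[[int, int, str], float],
    min_len: int,
    max_len: int,
) -> Tuple[List[Segment], float]:
    """Return the highest-scoring split of steps 0..T into labelled segments.

    best[end] holds the best total log-probability of any segmentation
    that covers exactly the first `end` steps.
    """
    best = [-math.inf] * (T + 1)
    back: List[Tuple[int, str] | None] = [None] * (T + 1)
    best[0] = 0.0
    for end in range(1, T + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            if best[start] == -math.inf:
                continue  # no valid segmentation ends at `start`
            for label in labels:
                score = best[start] + seg_logprob(start, end, label)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, label)
    if T > 0 and back[T] is None:
        raise ValueError("no segmentation satisfies the length bounds")
    # Follow back-pointers from the end of the trajectory to the start.
    segments: List[Segment] = []
    end = T
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()
    return segments, best[T]


# Toy stand-in for a learned segment model (an assumption for this example):
# each step "votes" for a label, and every segment pays a fixed log(0.5) cost
# so that needlessly short segments are not preferred.
traj = "aaabbbbcc"


def toy_logprob(start: int, end: int, label: str) -> float:
    hits = sum(1 for s in traj[start:end] if s == label)
    misses = (end - start) - hits
    return math.log(0.5) + hits * math.log(0.9) + misses * math.log(0.1)


segments, score = best_segmentation(len(traj), "abc", toy_logprob, 2, 5)
print(segments)  # [(0, 3, 'a'), (3, 7, 'b'), (7, 9, 'c')]
```

The recovered segments, each paired with its label, are the kind of instruction-trajectory pairs that could be added to a small annotated dataset. Bounding the segment length keeps the search cost proportional to the trajectory length times the number of allowed lengths and labels.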

Paper Structure

This paper contains 20 sections, 5 equations, 9 figures, 9 tables, 1 algorithm.

Figures (9)

  • Figure 1: Example of play data, where the play trajectory contains sequences of instructable agent behaviour. The trajectory is represented by the observation sequence of the agent. Parts of the trajectory are labelled with the corresponding instructions and form the annotated dataset. Sampling random segments risks capturing an incomplete instruction or several instructions at once.
  • Figure 2: Overview of how training samples are generated from a single annotated segment for the different segmentation approaches. Here, $c$ stands for the instruction class of the segment and $bg$ for the background class.
  • Figure 3: Example observations from the CALVIN environment (left) and the BabyAI environment (right).
  • Figure 4: Effect of different amounts of annotated data on policy performance, as well as the effect of data augmentation via labelled ground-truth segments and labelled random segments, for BabyAI (top) and CALVIN (bottom).
  • Figure 5: Distribution of segment lengths in the ground-truth data (top left) and in the segments extracted via Play Segmentation (top right), TriDet (bottom left) and UnLoc (bottom right).
  • ...and 4 more figures