Learning a Thousand Tasks in a Day

Kamil Dreczkowski, Pietro Vitiello, Vitalis Vosylius, Edward Johns

TL;DR

Learning a Thousand Tasks in a Day tackles data-inefficiency in robotic imitation by proposing two priors: trajectory decomposition into alignment and interaction, and retrieval-based generalisation. The authors introduce MT3, a fully retrieval-based decomposition method, and validate it across 3,450 real-world rollouts and a large-scale 1,000-task evaluation with single demonstrations, revealing strong data efficiency and meaningful generalisation, along with limitations of open-loop interaction. In controlled tests, MT3 outperforms monolithic behavioural cloning in the few-shot regime, while decomposition provides rapid early gains that may plateau with abundant data. The work offers practical guidance for scalable robot learning, highlighting when retrieval-based decomposition is advantageous and outlining avenues to address open-loop and perception-based limitations in real-world manipulation.

Abstract

Humans are remarkably efficient at learning tasks from demonstrations, but today's imitation learning methods for robot manipulation often require hundreds or thousands of demonstrations per task. We investigate two fundamental priors for improving learning efficiency: decomposing manipulation trajectories into sequential alignment and interaction phases, and retrieval-based generalisation. Through 3,450 real-world rollouts, we systematically study this decomposition. We compare different design choices for the alignment and interaction phases, and examine generalisation and scaling trends relative to today's dominant paradigm of behavioural cloning with a single-phase monolithic policy. In the few-demonstrations-per-task regime (<10 demonstrations), decomposition achieves an order of magnitude improvement in data efficiency over single-phase learning, with retrieval consistently outperforming behavioural cloning for both alignment and interaction. Building on these insights, we develop Multi-Task Trajectory Transfer (MT3), an imitation learning method based on decomposition and retrieval. MT3 learns everyday manipulation tasks from as little as a single demonstration each, whilst also generalising to novel object instances. This efficiency enables us to teach a robot 1,000 distinct everyday tasks in under 24 hours of human demonstrator time. Through 2,200 additional real-world rollouts, we reveal MT3's capabilities and limitations across different task families. Videos of our experiments can be found at https://www.robot-learning.uk/learning-1000-tasks.

Paper Structure

This paper contains 47 sections, 4 equations, 9 figures.

Figures (9)

  • Figure 1: Learning a thousand tasks in a day. (A) Illustration of 1,000 tasks taught in less than a day. The arrow represents the passing of time, whereas each image is a frame from a real-world rollout of one of the tasks. (B) Illustration of some information regarding the 1,000 tasks dataset. We provide examples of some objects used and some of the skills evaluated.
  • Figure 2: Trajectory decomposition and overview of policy designs. (A) Trajectories are decomposed into alignment and interaction phases. Monolithic approaches use a single policy for entire trajectories. Decomposition-based approaches use two specialised policies: one for end-effector alignment with target objects, and another for precise manipulations. We explore both BC and retrieval-based methods for each phase of this decomposition. (B) A multi-task policy (purple) processes a segmented point cloud and task description as input and outputs robot actions. This can either be a monolithic policy or the combination of an alignment policy and an interaction policy. Retrieval-based policies (blue) use a retrieved demonstration as context to guide execution. Behavioural Cloning policies (pink) directly predict actions through a neural network.
  • Figure 3: Micro skills and objects considered in the scaling experiments. (A) The micro skills used to evaluate the methods’ response to scaling the demonstrations per task. We also show the various seen and unseen objects used. (B) The micro skills used to evaluate the methods’ response to scaling the number of tasks. These are in addition to those found in (A). (C) The objects used in the latter experiment.
  • Figure 4: Analysis of dataset size and diversity effects on task performance. (A) Performance comparison across all considered methods, with error bars showing 95% Wilson confidence intervals. For seen and unseen task sets, sample sizes were n=36 and n=24, respectively. (B) Comparison between decomposition-based approaches (aggregated results from Ret-Ret (MT3), Ret-BC, BC-Ret, and BC-BC) and monolithic learning (MT-ACT+), averaged across seen and unseen tasks, with error bars showing 95% Wilson confidence intervals. Sample sizes for each comparison are detailed in Methods subsection “Statistical analysis”. Statistical significance was assessed using the two-proportion Z-test. (C) Analysis of alignment and interaction strategies: alignment plots compare BC (BC-BC, BC-Ret) versus retrieval (Ret-BC, Ret-Ret (MT3)) for alignment, whereas interaction plots compare BC (BC-BC, Ret-BC) versus retrieval (BC-Ret, Ret-Ret (MT3)) for interaction. Success rates are shown as a function of dataset size (number of demonstrations per task) and diversity (number of tasks).
  • Figure 5: Example rollouts and scene diversity from the one thousand tasks evaluation. (A) Examples of recorded rollouts from the 1,000 task experiment. (B) Examples of the scene diversity to which MT3 was subjected during evaluation.
  • ...and 4 more figures
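
The Figure 4 caption reports 95% Wilson confidence intervals and a two-proportion Z-test. For readers unfamiliar with these, the following is a minimal sketch of the standard textbook formulas, not the authors' analysis code. The function names, the pooled-variance form of the Z-test, the two-sided p-value, and the success counts in the usage lines are all assumptions made for illustration; only the sample size n=36 is taken from the caption.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval for a binomial success rate (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def two_proportion_z_test(s1, n1, s2, n2):
    """Two-sided two-proportion Z-test with a pooled standard error.

    Assumption: the pooled form is used here; the source does not say
    which variant the authors applied.
    """
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0.0:
        return 0.0, 1.0  # identical all-success or all-failure samples
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Illustrative counts only (not results from the paper): 18 of 36 successes.
low, high = wilson_interval(18, 36)
z_stat, p_val = two_proportion_z_test(30, 36, 10, 36)
```

The Wilson interval is generally preferred over the simple normal approximation at small sample sizes such as n=24 or n=36, because it stays within [0, 1] and behaves sensibly when the observed success rate is near 0% or 100%.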