Few-Shot Task Learning through Inverse Generative Modeling

Aviv Netanyahu, Yilun Du, Antonia Bronars, Jyothish Pari, Joshua Tenenbaum, Tianmin Shu, Pulkit Agrawal

TL;DR

This work addresses the challenge of learning new task concepts from a few demonstrations. It introduces Few-Shot Task Learning through Inverse Generative Modeling (FTL-IGM), which trains a large conditional diffusion model on paired trajectories and concepts, then infers new concepts by optimizing latent inputs while keeping the model weights fixed. The approach demonstrates strong compositional generalization, enabling learned concepts to be combined with training concepts and to generalize to new initial states and environments across five domains. The results underscore the method's efficiency and generalization capabilities, offering a principled way to leverage pretrained task priors for rapid concept learning without policy or reward shaping. This has practical implications for rapid adaptation of agents in robotics and other sequential decision-making tasks, where data is scarce but prior demonstrations are abundant.

Abstract

Learning the intents of an agent, defined by its goals or motion style, is often extremely challenging from just a few examples. We refer to this problem as task concept learning and present our approach, Few-Shot Task Learning through Inverse Generative Modeling (FTL-IGM), which learns new task concepts by leveraging invertible neural generative models. The core idea is to pretrain a generative model on a set of basic concepts and their demonstrations. Then, given a few demonstrations of a new concept (such as a new goal or a new action), our method learns the underlying concept through backpropagation without updating the model weights, thanks to the invertibility of the generative model. We evaluate our method in five domains: object rearrangement, goal-oriented navigation, motion capture of human actions, autonomous driving, and real-world table-top manipulation. Our experimental results demonstrate that, via the pretrained generative model, we successfully learn novel concepts and generate agent plans or motion corresponding to these concepts (1) in unseen environments and (2) in composition with training concepts.
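The core idea, optimizing a concept input through a frozen generator, can be illustrated with a minimal sketch. This is not the paper's implementation: the generator here is a hypothetical fixed linear map standing in for the pretrained model (the paper uses a conditional diffusion model), and the names `FROZEN_WEIGHTS`, `generate`, and `infer_concept`, the squared-error loss, and all numeric values are assumptions made for illustration only.

```python
# Toy stand-in for a pretrained generator G_theta (NOT a diffusion model).
# Each row maps a 2-D concept vector to one trajectory step; tuples keep it frozen.
FROZEN_WEIGHTS = ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0))


def generate(concept, start):
    """Generate a 1-D trajectory from a concept vector and an initial state."""
    return [start + sum(w * c for w, c in zip(row, concept)) for row in FROZEN_WEIGHTS]


def infer_concept(demos, dim=2, steps=300, lr=0.1):
    """Fit a concept vector to (start, trajectory) demos by gradient descent.

    Only the concept input is updated; FROZEN_WEIGHTS is never modified.
    """
    concept = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for start, traj in demos:
            pred = generate(concept, start)
            for row, p, y in zip(FROZEN_WEIGHTS, pred, traj):
                err = p - y
                for k in range(dim):
                    # Gradient of the mean squared error w.r.t. concept[k].
                    grad[k] += 2.0 * err * row[k] / (len(demos) * len(traj))
        concept = [c - lr * g for c, g in zip(concept, grad)]
    return concept


# A "new" concept that has no label; we only observe three demonstrations of it.
hidden_concept = [0.7, -0.3]
demos = [(s, generate(hidden_concept, s)) for s in (0.0, 1.0, -2.0)]

learned = infer_concept(demos)

# Reuse the learned concept from an initial state not seen in the demos.
new_start = 5.0
plan = generate(learned, new_start)
print([round(c, 3) for c in learned])
print([round(x, 3) for x in plan])
```

In this toy setting the concept is recovered from three demonstrations and then reused from a new initial state, loosely mirroring the few-shot learning and generalization described in the abstract. A real generator would be nonlinear and the gradient would come from automatic differentiation rather than a hand-written formula.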

Paper Structure

This paper contains 54 sections, 2 equations, 16 figures, 4 tables.

Figures (16)

  • Figure 1: Few-shot concept learning. Given paired task demonstration $\tau$ (e.g., 'walk') and concept $c$ (a latent representation of the task), we train a generative model $\mathcal{G}_{\theta}$ to generate behavior from a concept. Then, given demonstrations of a new behavior $\tilde{\tau}$ (e.g., 'jumping jacks') without its concept label, we aim to learn its concept representation by optimizing concept $\tilde{c}$ as input to frozen $\mathcal{G}_{\theta}$.
  • Figure 2: Experiment domains. We evaluate our approach extensively across various domains.
  • Figure 3: Diverse learned concept generation. We generate versions of the new behavior conditioned on the learned concept and (1) new initial states and (2) composed with other concepts.
  • Figure 4: Object rearrangement. Training concepts are single pairwise relations ('A right of/above B'), and new concepts are either compositions of training concepts ('A right of/above B' $\land$ 'B right of/above C') or new relations ('A diagonal to B', 'A, B, C on circle circumference of radius r').
  • Figure 5: Object rearrangement new concept qualitative evaluation. Learning the new concept 'square diagonal to triangle' and composing it with the training concept 'circle right of square'.
  • ...and 11 more figures