
PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions

Arnau Boix-Granell, Alberto San-Miguel-Tello, Magí Dalmau-Moreno, Néstor García

TL;DR

Results for a pick-and-place task in a simulated scenario show that the proposed method outperforms policies without human feedback, improving robustness at deployment and reducing the computational burden.

Abstract

This paper presents PRISM, an instruction-conditioned refinement method for imitation policies in robotic manipulation. The approach bridges the Imitation Learning (IL) and Reinforcement Learning (RL) frameworks into a seamless pipeline, such that an imitation policy for a broad generic task, generated from a set of user-guided demonstrations, can be refined through reinforcement to produce new, unseen fine-grained behaviours. The refinement process follows the Eureka paradigm, in which reward functions for RL are iteratively generated from an initial natural-language task description. The presented approach builds on this mechanism to adapt a refined IL policy of a generic task to new goal configurations and newly introduced constraints by also incorporating human corrective feedback on intermediate rollouts, which enables policy reuse and therefore improves data efficiency. Results for a pick-and-place task in a simulated scenario show that the proposed method outperforms policies without human feedback, improving robustness at deployment and reducing the computational burden.
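The refinement loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Candidate`, `refine_policy`, `propose_rewards`, `train_and_score` and `ask_user` are all hypothetical, and the LLM call, the RL fine-tuning of the imitation policy, and the user interaction are passed in as plain callables so the control flow can be read on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    reward_source: str  # reward function text proposed by the LLM (hypothetical)
    score: float        # fitness of the policy fine-tuned with that reward


def refine_policy(
    task_description: str,
    propose_rewards: Callable[[str, List[str]], List[str]],
    train_and_score: Callable[[str], float],
    ask_user: Callable[[Candidate], Optional[str]],
    iterations: int = 3,
) -> Candidate:
    """Eureka-style loop with user corrections folded into each new round."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    feedback: List[str] = []
    best: Optional[Candidate] = None
    for _ in range(iterations):
        # Assumed: the LLM sees the task text plus all feedback given so far.
        sources = propose_rewards(task_description, feedback)
        if not sources:
            raise ValueError("propose_rewards returned no candidates")
        # Assumed: each candidate reward fine-tunes the imitation policy.
        candidates = [Candidate(s, train_and_score(s)) for s in sources]
        round_best = max(candidates, key=lambda c: c.score)
        if best is None or round_best.score > best.score:
            best = round_best
        # The user inspects a rollout; None means no further correction.
        note = ask_user(round_best)
        if note is None:
            break
        feedback.append(note)
    assert best is not None  # guaranteed because iterations >= 1
    return best
```

Passing the three stages as callables is only a design choice for the sketch: it keeps the loop independent of any particular LLM client, simulator, or RL library, none of which are specified in the text above.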

Paper Structure (17 sections, 2 equations, 3 figures)

Figures (3)

  • Figure 1: PRISM pipeline overview. First, a (non-expert) user provides demonstrations of a generic task, from which a generic policy is learned by imitation (left). For a new task defined by the user in natural language, this policy is iteratively refined using an LLM that generates reward candidates and improves them according to user feedback (right).
  • Figure 2: Execution of the PRISM pipeline, exemplified through the evaluation experiment.
  • Figure 3: Evolution of success metrics for place goal and vertical constraints across four methods. Policies are deemed learned when the total episode reward remains stable for a sustained interval of training steps, marked on both plots with colored dots. For illustration, representative post-training trajectories for PRISM are shown; the EUREKA (RL-only) baseline is excluded from the plot because it remained stuck in an unsuccessful idle policy.
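
The "deemed learned" criterion in the Figure 3 caption can be read as a stability check over a sliding window of episode rewards. The sketch below is one possible interpretation, not the paper's definition: the function name `first_stable_step`, the window length, and the tolerance are all assumptions introduced for illustration.

```python
from typing import Optional, Sequence


def first_stable_step(
    rewards: Sequence[float],
    window: int = 5,    # assumed length of the "sustained interval"
    tol: float = 0.05,  # assumed maximum spread counted as "stable"
) -> Optional[int]:
    """Return the start index of the first stable window, or None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    for start in range(len(rewards) - window + 1):
        chunk = rewards[start:start + window]
        # Stable here means the reward range inside the window is within tol.
        if max(chunk) - min(chunk) <= tol:
            return start
    return None
```

A range-based test is used only because it is simple to state; a variance threshold or a comparison of moving averages would be equally reasonable readings of the caption.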