Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction

Jakob Thumm, Christopher Agia, Marco Pavone, Matthias Althoff

TL;DR

The proposed Text2Interaction framework invokes large language models to generate a task plan, motion preferences as Python code, and parameters of a safety controller. In the ablation study, Text2Interaction aligns better with unseen preferences than the baselines while maintaining a high success rate.

Abstract

Adjusting robot behavior to human preferences can require intensive human feedback, preventing quick adaptation to new users and changing circumstances. Moreover, current approaches typically treat user preferences as a reward, which requires a manual balance between task success and user satisfaction. To integrate new user preferences in a zero-shot manner, our proposed Text2Interaction framework invokes large language models to generate a task plan, motion preferences as Python code, and parameters of a safety controller. By maximizing the combined probability of task completion and user satisfaction instead of a weighted sum of rewards, we can reliably find plans that fulfill both requirements. We find that 83 % of users working with Text2Interaction agree that it integrates their preferences into the plan of the robot, and 94 % prefer Text2Interaction over the baseline. Our ablation study shows that Text2Interaction aligns better with unseen preferences than other baselines while maintaining a high success rate. Real-world demonstrations and code are made available at sites.google.com/view/text2interaction.
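
To make the difference between a weighted sum of rewards and a combined probability concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that task success and user satisfaction are independent, so their combined probability is a simple product. The plan names, the probabilities, and the helper functions `weighted_sum_score`, `joint_probability_score`, and `best_plan` are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical candidate plans: name -> (P(task success), P(user satisfied)).
# The numbers are invented for illustration and do not come from the paper.
CANDIDATES: Dict[str, Tuple[float, float]] = {
    "fast_but_uncomfortable": (1.00, 0.30),
    "balanced": (0.60, 0.65),
}


def weighted_sum_score(p_success: float, p_pref: float, weight: float = 0.5) -> float:
    """Weighted sum of the two terms; `weight` has to be tuned by hand."""
    return weight * p_success + (1.0 - weight) * p_pref


def joint_probability_score(p_success: float, p_pref: float) -> float:
    """Combined probability under the (assumed) independence of both events."""
    return p_success * p_pref


def best_plan(score_fn: Callable[[float, float], float]) -> str:
    """Return the name of the candidate plan with the highest score."""
    return max(CANDIDATES, key=lambda name: score_fn(*CANDIDATES[name]))


if __name__ == "__main__":
    print("weighted sum picks:", best_plan(weighted_sum_score))
    print("joint probability picks:", best_plan(joint_probability_score))
```

With these made-up numbers, the equally weighted sum favors the plan that almost ignores the user preference, because a high success term can compensate for a low preference term. The product cannot be large unless both factors are reasonably large, which is one intuition for why a combined probability needs no manually tuned balance.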

Paper Structure (23 sections, 11 equations, 8 figures, 1 table)

This paper contains 23 sections, 11 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Preference-aligned planning with Text2Interaction. The user asks the robot to hand them the screwdriver so that they can comfortably grab it. Text2Interaction queries an LLM to return (a) a sequence of primitives that satisfy the task preferences, (b) a set of motion preference functions as executable Python code, and (c) a set of parameters that adjust the safety controller to the current situation and control preferences of the user. Our planner then aims to find a plan that satisfies the user preferences and is feasible to execute. If the planning step fails, we query the LLM to return the optimal skill for the next timestep only, as discussed in the methodology section.
  • Figure 2: Detailed overview of the Text2Interaction framework.
  • Figure 3: Main takeaways from our user study. The answers of the 18 participants are centered around zero for better comparison.
  • Figure 4: Mean results of our object arrangement experiments. The whiskers display the 95% confidence interval in the reported mean metric.
  • Figure 5: Structure of the prompt used to generate the output of Text2Interaction.
  • ...and 3 more figures
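
The Figure 1 caption mentions motion preference functions returned as executable Python code. The sketch below shows what one such function could look like for the screwdriver handover example. It is an assumption made for illustration only: the function name `handover_distance_preference`, its signature, the `comfortable_distance` parameter, and the exponential scoring rule are hypothetical and are not taken from the paper or its code.

```python
import math
from typing import Sequence


def handover_distance_preference(
    object_position: Sequence[float],
    hand_position: Sequence[float],
    comfortable_distance: float = 0.2,
) -> float:
    """Hypothetical preference score in (0, 1] for a handover pose.

    Returns 1.0 when the object ends up within `comfortable_distance`
    (in meters, an assumed unit) of the human hand, and decays
    exponentially as the object is placed farther away.
    """
    distance = math.dist(object_position, hand_position)
    # Only the distance beyond the comfortable radius is penalized.
    excess = max(0.0, distance - comfortable_distance)
    return math.exp(-excess / comfortable_distance)


if __name__ == "__main__":
    hand = (0.0, 0.0, 0.0)
    print(handover_distance_preference((0.1, 0.0, 0.0), hand))  # within reach
    print(handover_distance_preference((1.0, 0.0, 0.0), hand))  # far away
```

A score bounded between 0 and 1 can be read as a rough probability that the user is satisfied with the motion, which is the kind of quantity a planner could combine with the probability of task success.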