Words into Action: Learning Diverse Humanoid Robot Behaviors using Language Guided Iterative Motion Refinement

K. Niranjan Kumar; Irfan Essa; Sehoon Ha

TL;DR

This paper tackles the challenge of programming humanoid robot controllers by enabling learning from natural language commands. It combines language-driven human motion generation, IK-based retargeting to a Digit humanoid, and adversarial motion priors to train dynamic, joint-level policies; a language-guided iterative refinement loop further accelerates learning by reinitializing policies from the closest prior checkpoints. The approach yields diverse behaviors and demonstrates a threefold improvement in sample efficiency over learning from scratch in simulation. The work highlights a promising path toward reducing reward engineering and enabling interactive, language-driven policy development for complex robots. Future work includes direct language-to-action translation and real-world validation on Digit.

Abstract

Humanoid robots are well suited for human habitats due to their morphological similarity, but developing controllers for them is a challenging task that involves multiple sub-problems, such as control, planning, and perception. In this paper, we introduce a method to simplify controller design by enabling users to train and fine-tune robot control policies using natural language commands. We first learn a neural network policy that generates behaviors given a natural language command, such as "walk forward", by combining Large Language Models (LLMs), motion retargeting, and motion imitation. Based on the synthesized motion, we iteratively fine-tune by updating the text prompt and querying LLMs to find the best checkpoint associated with the closest motion in history. We validate our approach using a simulated Digit humanoid robot and demonstrate learning of diverse motions, such as walking, hopping, and kicking, without the burden of complex reward engineering. In addition, we show that our iterative refinement enables us to learn three times faster than a naive formulation that learns from scratch.
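
As a rough illustration of the checkpoint-selection idea in the abstract, the Python sketch below picks the earlier training run whose prompt is closest to a new command, so that training can be warm-started from that run's checkpoint. It is not the authors' implementation: the paper queries an LLM to find the closest motion in history, whereas this sketch substitutes a simple token-overlap (Jaccard) score. The names `HistoryEntry` and `select_warm_start`, the `min_score` threshold, and the checkpoint paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HistoryEntry:
    prompt: str           # language command used for an earlier training run
    checkpoint_path: str  # hypothetical path to the policy saved for that run


def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap score; an assumed stand-in for the paper's LLM query."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_warm_start(
    new_prompt: str,
    history: List[HistoryEntry],
    similarity: Callable[[str, str], float] = jaccard_similarity,
    min_score: float = 0.2,  # assumed threshold, not taken from the paper
) -> Optional[HistoryEntry]:
    """Return the closest earlier run, or None to train from scratch."""
    if not history:
        return None
    best = max(history, key=lambda entry: similarity(new_prompt, entry.prompt))
    if similarity(new_prompt, best.prompt) < min_score:
        return None
    return best


history = [
    HistoryEntry("walk forward", "checkpoints/walk_forward.pt"),
    HistoryEntry("hop in place", "checkpoints/hop_in_place.pt"),
]
choice = select_warm_start("walk forward slowly", history)
print(choice.checkpoint_path if choice else "train from scratch")
```

Returning `None` when nothing in the history is similar enough keeps the from-scratch path available, which is the baseline the abstract compares against. Any other scoring function with the same signature, including one backed by an LLM, can be passed in through the `similarity` argument.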

Paper Structure (17 sections, 6 equations, 4 figures)

INTRODUCTION
RELATED WORK
Robot learning for legged robots
Human motion generation
LLMs for Robot Control
LEARNING POLICIES FROM LANGUAGE PROMPTS
Human motion generation from text input
Motion retargeting from human to robot
Training control policy to imitate retargeted motion
LANGUAGE-GUIDED ITERATIVE POLICY REFINEMENT
EXPERIMENTS
Problem Formulation
Simulation
Results
Learned Skills
...and 2 more sections

Figures (4)

Figure 1: Overview of our proposed approach. Given a language instruction our framework outputs a learned control policy for the corresponding behavior.
Figure 2: Motion frames demonstrating the skills learned with our approach
Figure 3: Reward curves for the different behaviors trained using our approach
Figure 4: Examples of human guided policy refinement using our framework
