SayTap: Language to Quadrupedal Locomotion
Yujin Tang, Wenhao Yu, Jie Tan, Heiga Zen, Aleksandra Faust, Tatsuya Harada
TL;DR
This paper tackles the challenge of translating natural-language commands into low-level locomotion for quadrupeds by introducing foot contact patterns as a compact interface. It couples an LLM-driven translator, which maps natural-language inputs to 4×$L_w$ contact-pattern templates, with a DRL-based locomotion controller trained to realize those patterns; a Random Pattern Generator exposes the policy to diverse patterns during training. The approach predicts contact patterns more accurately than two baselines, solves more of the 30 evaluation tasks, and relaxes the need for hand-crafted high-level APIs, and the controller transfers from simulation to a real Unitree A1 robot. By enabling direct, flexible user instructions, including vague or emotional cues, the method advances interactive, adaptable legged locomotion with practical potential for assistive robotics and embodied AI.
Abstract
Large language models (LLMs) have demonstrated the potential to perform high-level planning. Yet, it remains a challenge for LLMs to comprehend low-level commands, such as joint angle targets or motor torques. This paper proposes an approach that uses foot contact patterns as an interface bridging human commands in natural language and a locomotion controller that outputs these low-level commands. The result is an interactive system for quadrupedal robots that lets users craft diverse locomotion behaviors flexibly. We contribute an LLM prompt design, a reward function, and a method to expose the controller to the feasible distribution of contact patterns. The outcome is a controller capable of achieving diverse locomotion patterns that can be transferred to real robot hardware. Compared with other design choices, the proposed approach enjoys more than 50% success rate in predicting the correct contact patterns and can solve 10 more tasks out of a total of 30 tasks. Our project site is: https://saytap.github.io.
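To make the interface concrete, the following is a minimal sketch of what a foot contact pattern could look like as a 4×$L_w$ binary array, one row per leg and one column per time step. It is an illustration only and not the authors' implementation: the leg ordering, the 1 = "foot on ground" encoding, and the names `contact_template`, `shifts`, and `stance_steps` are all assumptions introduced here.

```python
from typing import List

# Assumed leg ordering (front-left, front-right, rear-left, rear-right);
# chosen here for illustration only.
LEGS = ["FL", "FR", "RL", "RR"]


def contact_template(length: int, shifts: List[int], stance_steps: int) -> List[List[int]]:
    """Build a 4 x length binary contact pattern (hypothetical helper).

    length       -- number of time steps in the window (plays the role of L_w)
    shifts       -- per-leg phase shift, in time steps
    stance_steps -- number of steps per cycle during which a foot is on the ground
    """
    if len(shifts) != len(LEGS):
        raise ValueError("expected one shift per leg")
    pattern = []
    for shift in shifts:
        # A foot is in contact (1) while its shifted phase lies in the stance part.
        row = [1 if (t + shift) % length < stance_steps else 0 for t in range(length)]
        pattern.append(row)
    return pattern


def render(pattern: List[List[int]]) -> str:
    """Format the pattern as one labelled row of 0/1 characters per leg."""
    return "\n".join(
        f"{leg}: " + "".join(str(v) for v in row) for leg, row in zip(LEGS, pattern)
    )


# A trot-like pattern: diagonal legs (FL+RR, FR+RL) touch down together,
# and the two pairs alternate every half window.
trot = contact_template(length=8, shifts=[0, 4, 4, 0], stance_steps=4)
print(render(trot))
```

In this sketch the FL and RR rows are identical and the FR and RL rows are their complement, which is the kind of compact, human-readable structure a language model could plausibly emit and a low-level controller could be trained to track.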
