SayTap: Language to Quadrupedal Locomotion

Yujin Tang, Wenhao Yu, Jie Tan, Heiga Zen, Aleksandra Faust, Tatsuya Harada

TL;DR

This paper tackles the challenge of translating natural-language commands into low-level locomotion for quadrupeds by introducing foot contact patterns as a compact interface. It couples an LLM-driven translator that maps natural-language inputs to 4×$L_w$ contact templates with a locomotion controller, trained with deep reinforcement learning (DRL), that realizes those patterns; a Random Pattern Generator exposes the policy to diverse patterns during training. The approach achieves higher contact-pattern prediction accuracy than two baselines and solves more of the 30 test tasks, relaxes the need for hand-crafted high-level APIs, and transfers from simulation to a real Unitree A1 robot. By accepting direct, flexible user instructions, including vague or emotional cues, the method advances interactive, adaptable legged locomotion with practical potential for assistive robotics and embodied AI.

Abstract

Large language models (LLMs) have demonstrated the potential to perform high-level planning. Yet, it remains a challenge for LLMs to comprehend low-level commands, such as joint angle targets or motor torques. This paper proposes an approach to use foot contact patterns as an interface that bridges human commands in natural language and a locomotion controller that outputs these low-level commands. This results in an interactive system for quadrupedal robots that allows the users to craft diverse locomotion behaviors flexibly. We contribute an LLM prompt design, a reward function, and a method to expose the controller to the feasible distribution of contact patterns. The results are a controller capable of achieving diverse locomotion patterns that can be transferred to real robot hardware. Compared with other design choices, the proposed approach enjoys more than 50% success rate in predicting the correct contact patterns and can solve 10 more tasks out of a total of 30 tasks. Our project site is: https://saytap.github.io.

Paper Structure (23 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Illustration of the results on a physical quadrupedal robot. We show two test commands at the top, and the snapshots of the robot in the top row of the figure. The row in the middle shows the desired contact patterns translated from the commands by an LLM (the pattern in between the commands requests the robot to put all feet on the ground and stand still), and the bottom row gives the realized patterns. The proposed approach allows the robot to take both simple and direct instructions (e.g., "Trot forward slowly") as well as vague human commands (e.g., "Good news, we are going to a picnic this weekend!") in natural language and react accordingly.
  • Figure 2: Overview of the proposed approach. In addition to the robot's proprioceptive sensory data and task commands (e.g., following a desired linear velocity $\hat{v}_x$), the locomotion controller accepts desired foot contact patterns as input, and outputs desired joint positions. The foot contact patterns are extracted by a cyclic sliding window of size $L_w$ from a pattern template, which is generated by a random pattern generator during training, and is translated from human commands in natural language by an LLM in tests. We show some examples of contact pattern templates at the bottom.
  • Figure 3: Our exact prompt for our method in all experiments. The final "Input:" is followed by user specified command. Texts in black are for explanation and are not used as input to the LLM.
  • Figure 4: Baselines prompts. Differences from our prompt are highlighted in blue. The "Gait definition block" is not changed and omitted in the figure. Texts in black are for explanation thus they are not used as input to the LLM.
  • Figure 5: Accuracy comparison of generated patterns. For each command in Table \ref{tab:basic_tests}, we generate 5 patterns from the LLM and compare them against the expected results. We show the aggregated accuracy over all commands on the left of the first row, followed by the individual results.
  • ...and 2 more figures
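
The caption of Figure 2 says the controller's contact-pattern input is extracted from a pattern template by a cyclic sliding window of size $L_w$. A minimal sketch of that extraction is given below. It is an illustration under stated assumptions, not the authors' implementation: the function name `cyclic_window`, the row order (FL, FR, RL, RR), the convention that 1 means foot on the ground and 0 means foot in the air, and the example template `TROT` are all hypothetical.

```python
from typing import List

# A template has 4 rows (one per foot); each row is a list of 0/1 contact flags.
Template = List[List[int]]


def cyclic_window(template: Template, t: int, window: int) -> Template:
    """Return the 4 x window slice that starts at step t, wrapping around the template."""
    length = len(template[0])
    if any(len(row) != length for row in template):
        raise ValueError("all rows of the template must have the same length")
    # The modulo makes the window cyclic: indices past the end wrap to the start.
    return [[row[(t + k) % length] for k in range(window)] for row in template]


# Hypothetical trot-like template: diagonal foot pairs alternate contact.
# Assumed row order: FL, FR, RL, RR. Assumed encoding: 1 = contact, 0 = in air.
TROT: Template = [
    [1, 1, 0, 0],  # FL
    [0, 0, 1, 1],  # FR
    [0, 0, 1, 1],  # RL
    [1, 1, 0, 0],  # RR
]

if __name__ == "__main__":
    # Slide the window one step at a time, as a controller would at each control step.
    for step in range(4):
        print(step, cyclic_window(TROT, step, window=3))
```

Because the window wraps, a short template can drive the controller indefinitely: the step index `t` may grow without bound and the window may even be longer than the template itself.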