Understanding Generative AI in Robot Logic Parametrization

Yuna Hwang, Arissa J. Sato, Pragathi Praveena, Nathan Thomas White, Bilge Mutlu

TL;DR

This paper outlines the background on capturing end-user intent expressed in natural language, and explores how an LLM can interpret that intent and map natural language to predefined, cross-modal data during iterative program development.

Abstract

Leveraging generative AI (for example, large language models, or LLMs) for language understanding within robotics opens up possibilities for LLM-driven robot end-user development (EUD). Despite the numerous design opportunities it provides, little is understood about how this technology can be used to construct robot program logic. In this paper, we outline the background on capturing natural language end-user intent and summarize previous use cases of LLMs within EUD. Taking filmmaking as an example context, we explore how a cinematography practitioner's intent to film a certain scene can be articulated in natural language, captured by an LLM, and further parametrized as low-level robot arm movement. We explore the capabilities of an LLM in interpreting end-user intent and mapping natural language to predefined, cross-modal data in the process of iterative program development. We conclude by suggesting future opportunities for domain exploration beyond cinematography to support language-driven robotic camera navigation.

Paper Structure

This paper contains 7 sections, 1 figure.

Figures (1)

  • Figure 1: (Left) Formative study: We plan to conduct a formative study in which a cinematography expert (denoted as the director) uses natural language and movement to express filming intent. The experimenter will act as the camera operator, closely following the expert's instructions and moving the camera. (Middle) Input for Module: Different input modalities will be used to construct separate robot program modules, namely the action module and the intent module. (Right) LLM Interaction: We envision an interaction scenario with the robotic arm mediated by an LLM, which interprets keywords within the end-user prompt and maps those keywords to pre-defined modules and their respective parameter values. A robotic camera can then initiate a movement given those parameter values.
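
The keyword-to-module mapping described in the right panel of Figure 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the paper does not specify module contents, so the keywords, parameter names, and values, the `CameraCommand` type, and the `parametrize` function are all hypothetical. A plain substring lookup also stands in for the LLM's interpretation step.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical module tables. The keywords and parameter values are
# illustrative placeholders, not taken from the paper.
ACTION_MODULE = {
    "dolly in": {"axis": "z", "direction": 1},
    "pan left": {"axis": "yaw", "direction": -1},
}
INTENT_MODULE = {
    "slowly": {"speed": 0.2},
    "quickly": {"speed": 0.8},
}


@dataclass
class CameraCommand:
    """Parameter values that a robotic camera could act on (hypothetical type)."""
    action: Optional[str] = None
    params: dict = field(default_factory=dict)


def parametrize(prompt: str) -> CameraCommand:
    """Stand-in for the LLM step: find known keywords in the prompt
    and merge the parameter values of the matching modules."""
    text = prompt.lower()
    cmd = CameraCommand()
    # Take the first action keyword found in the prompt.
    for keyword, values in ACTION_MODULE.items():
        if keyword in text:
            cmd.action = keyword
            cmd.params.update(values)
            break
    # Layer intent-related parameters (here, speed) on top of the action.
    for keyword, values in INTENT_MODULE.items():
        if keyword in text:
            cmd.params.update(values)
    return cmd


if __name__ == "__main__":
    print(parametrize("Slowly dolly in on the actor"))
```

In the envisioned system an LLM would replace the substring match, so that paraphrases such as "ease toward the actor" could still resolve to the same pre-defined module entries. The lookup tables here only show what the resulting parameter values might look like.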