Training microrobots to swim by a large language model

Zhuoqun Xu; Lailai Zhu

TL;DR

This study demonstrates that GPT-4 can learn efficient, non-reciprocal swimming gaits for microrobots in viscous, low-$Re$ environments from a minimal five-sentence in-context prompt. The approach enables two archetypal microswimmers, Purcell's three-link swimmer and Najafi and Golestanian's (NG's) three-sphere swimmer, to acquire their signature strokes in Stokes flow, which is governed by $\nabla \cdot \mathbf{u} = 0$ and $\mu \nabla^2 \mathbf{u} = \nabla p$. Compared with traditional $Q$-learning-based reinforcement learning (RL), the LLM-driven method learns from far fewer samples and with lower technical debt, e.g., RL requiring ~12 steps for the Purcell swimmer and ~40 for NG's swimmer. To control costs and improve reliability, the authors introduce a history-clearing scheme, discrete action encoding via input transformation, and alias-based prompt compression, and they run GPT-4 at zero temperature for determinism. They also discuss robustness to noise and outline future directions toward continuous actions, complex environments, and cooperative microrobotic swimming.

Abstract

Machine learning and artificial intelligence have recently become a popular paradigm for designing and optimizing robotic systems across various scales. Recent studies have showcased the innovative application of large language models (LLMs) in industrial control [1] and in directing legged walking robots [2]. In this study, we utilize an LLM, GPT-4, to train two prototypical microrobots for swimming in viscous fluids. Adopting a few-shot learning approach, we develop a minimal, unified prompt composed of only five sentences. The same concise prompt successfully guides two distinct articulated microrobots -- the three-link swimmer and the three-sphere swimmer -- in mastering their signature strokes. These strokes, initially conceptualized by physicists, are now effectively interpreted and applied by the LLM, enabling the microrobots to circumvent the physical constraints inherent to micro-locomotion. Remarkably, our LLM-based decision-making strategy substantially surpasses a traditional reinforcement learning method in terms of training speed. We discuss the nuanced aspects of prompt design, with particular emphasis on reducing the monetary cost of using GPT-4.

Paper Structure (4 sections, 6 equations, 5 figures)


History clearing scheme
Avoiding floating-point or negative numbers
Saving money by using aliases extensively
Temperature (randomness) within GPT-4
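
The first three section titles above name prompt-side techniques that the summary describes only briefly. The following Python sketch shows one way such techniques could look; it is not the authors' implementation. It assumes that "history clearing" means emptying the stored records once they reach a length `n_ht`, that floating-point and negative values are avoided by scaling and offsetting them into non-negative integers, and that aliases are plain string substitutions. All function names, the `scale` and `offset` constants, and the record format are hypothetical.

```python
# Hypothetical sketch; none of these names or constants come from the paper.

def record_step(history, action, displacement, n_ht):
    """Append one (action, displacement) record.

    Assumed clearing rule: once the history already holds n_ht records,
    it is emptied before the new record is stored.
    """
    if len(history) >= n_ht:
        history.clear()
    history.append((action, displacement))
    return history


def encode_displacement(x, scale=100, offset=1000):
    """Map a signed float to a non-negative integer: round(x * scale) + offset."""
    code = round(x * scale) + offset
    if code < 0:
        raise ValueError("value is below the encodable range")
    return code


def format_history(history):
    """Render the records as text using integer-encoded displacements."""
    return "; ".join(
        f"action {a} -> displacement {encode_displacement(x)}" for a, x in history
    )


def apply_aliases(text, aliases):
    """Shorten the prompt by replacing long phrases with short aliases."""
    for phrase, alias in aliases.items():
        text = text.replace(phrase, alias)
    return text
```

With these assumptions, `apply_aliases(format_history([(1, -0.25)]), {"action": "A", "displacement": "D"})` yields a shorter record string that contains only non-negative integers. The zero-temperature setting mentioned in the summary would be a parameter of the model request and is not modeled here.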

Figures (5)

Figure 1: Diagrammatic representation of utilizing an LLM (GPT-4 here) for microrobotic locomotion. We have developed a minimal, unified prompt capable of instructing two model microswimmers, Purcell's swimmer (Purcell, 1977) and NG's swimmer (Najafi and Golestanian, 2004), to propel through highly viscous fluids. These microswimmers' movements are subject to viscous hydrodynamics governed by the Stokes equations: $\nabla \cdot \mathbf{u} = 0$ and $\mu \nabla^2 \mathbf{u} = \nabla p$, where $\mathbf{u}$ and $p$ denote the velocity and pressure fields, respectively, with $\mu$ indicating the fluid's dynamic viscosity. Comprising only five sentences, the prompt effectively orchestrates the interaction between the swimmer and the LLM, directing the former to swim along the horizontal axis ($\mathbf{e}_x$ or $-\mathbf{e}_x$) with maximal speed. The swimmer's displacement is $X = \mathbf{r}_{\text{c}} \cdot \mathbf{e}_x$, where $\mathbf{r}_{\text{c}}$ denotes its geometric centroid.
Figure 2: A, displacement $X/a$ of Purcell's swimmer versus execution step $n$ when trained by the LLM (solid line) and by $Q$-learning-based RL (dashed line). The lower panel shows the cycle of signature gaits learned by the swimmer. B, same as A, but for NG's swimmer.
Figure 3: Influence of the length $n_{\text{ht}}$ of historical records on the swimmers' learning performance: the swimmer's displacement in the target direction, $X/a$ (positive $\mathbf{e}_x$, left column) or $-X/a$ (negative $\mathbf{e}_x$, right column), versus the execution step $n$. The upper and lower rows correspond to Purcell's swimmer and NG's swimmer, respectively.
Figure 4: Average displacement $\langle X \rangle/a$ and overall success rate $p$ for Purcell's swimmer (A) and NG's swimmer (B) guided along the $\mathbf{e}_x$ direction, under varying noise levels $\zeta$. For each level, $10$ individual runs are conducted to obtain the statistics.
Figure 5: Criticality of the five sentences, labeled S1 to S5, in the prompt. Left column: how the success rate $p$ of Purcell's swimmer (A) and NG's swimmer (C) guided in the $\mathbf{e}_x$ direction is affected when a single sentence is removed from the prompt, in comparison to the results obtained from the complete prompt (12 o'clock in the pie chart). The right column mirrors the left but shows the impact on the average displacement $\langle X \rangle/a$ instead of the success rate $p$.
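
The quantities used in the captions above, the displacement $X = \mathbf{r}_{\text{c}} \cdot \mathbf{e}_x$ of Figure 1 and the run statistics $\langle X \rangle/a$ and $p$ of Figures 4 and 5, can be illustrated with a short Python sketch. Two points are assumptions rather than definitions taken from the captions: the centroid is approximated as the mean of a set of representative body points (for example, sphere centers), and a run is counted as successful when its normalized displacement exceeds a `threshold`. The function names and the `threshold` parameter are hypothetical; `a` stands for the length scale used for normalization in the captions.

```python
from statistics import fmean


def displacement(points, e_x=(1.0, 0.0)):
    """Return X = r_c . e_x, with r_c taken as the mean of the given points.

    Assumption: the geometric centroid is approximated by averaging
    representative body points such as sphere centers.
    """
    n = len(points)
    r_c = [sum(p[k] for p in points) / n for k in range(len(e_x))]
    return sum(rc * e for rc, e in zip(r_c, e_x))


def run_statistics(final_X, a=1.0, threshold=0.0):
    """Return (<X>/a, p) over a list of final displacements, one per run.

    Assumption: a run succeeds when X/a is greater than `threshold`.
    """
    scaled = [x / a for x in final_X]
    mean_displacement = fmean(scaled)
    success_rate = sum(1 for x in scaled if x > threshold) / len(scaled)
    return mean_displacement, success_rate
```

For a swimmer steered toward $-\mathbf{e}_x$, passing `e_x=(-1.0, 0.0)` gives the displacement in the target direction, matching the $-X/a$ axis of Figure 3.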
