Training microrobots to swim by a large language model
Zhuoqun Xu, Lailai Zhu
TL;DR
This study demonstrates that GPT-4 can learn efficient, non-reciprocal swimming gaits for microrobots in viscous, low-$Re$ environments from a minimal five-sentence in-context prompt. With this approach, two archetypal microswimmers, Purcell's three-link swimmer and the Najafi-Golestanian (NG) three-sphere swimmer, acquire their signature strokes under Stokes flow, with explicit handling of the governing equations $\nabla \cdot \mathbf{u} = 0$ and $\mu \nabla^2 \mathbf{u} = \nabla p$. Compared with traditional $Q$-learning, the LLM-driven method learns with far fewer samples and lower technical debt, e.g., RL requiring ~12 steps for the Purcell swimmer and ~40 for NG. To control costs and improve reliability, the authors introduce a history-clearing scheme, discrete action encoding via input transformation, and alias-based prompt compression, and they operate at zero temperature for determinism. They also discuss robustness to noise and outline future directions toward continuous actions, complex environments, and cooperative microrobotic swimming.
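To make "non-reciprocal gait" and "discrete action encoding" concrete, the following is a minimal sketch for the three-sphere swimmer. It is not the authors' implementation: the binary arm encoding, the function names (`step`, `run_stroke`, `is_reciprocal`), and the toggle-style action set are assumptions chosen for illustration. Each of the two arms is either contracted (0) or extended (1), and an action toggles one arm.

```python
# Illustrative sketch only; encoding and names are assumptions, not the paper's code.

def step(state, action):
    """Toggle the arm selected by `action` (0 = left arm, 1 = right arm)."""
    arms = list(state)
    arms[action] ^= 1  # flip between contracted (0) and extended (1)
    return tuple(arms)

def run_stroke(start, actions):
    """Return the sequence of shapes visited when applying `actions` from `start`."""
    states = [start]
    for action in actions:
        states.append(step(states[-1], action))
    return states

def is_reciprocal(states):
    """A closed stroke is reciprocal if it visits the same shapes when played backwards."""
    return states == states[::-1]

# Four-step cycle: contract left, contract right, extend left, extend right.
signature = run_stroke((1, 1), [0, 1, 0, 1])

# Opening and closing a single arm repeatedly retraces its own path.
flapping = run_stroke((1, 1), [0, 0, 0, 0])

print(signature, is_reciprocal(signature))
print(flapping, is_reciprocal(flapping))
```

The four-step cycle traces a loop through all four shapes and differs from its time reverse, which is the property that permits net displacement in Stokes flow. The single-arm stroke retraces itself and is therefore reciprocal, so it yields no net motion.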
Abstract
Machine learning and artificial intelligence have recently become a popular paradigm for designing and optimizing robotic systems across various scales. Recent studies have showcased the innovative application of large language models (LLMs) in industrial control [1] and in directing legged walking robots [2]. In this study, we utilize an LLM, GPT-4, to train two prototypical microrobots for swimming in viscous fluids. Adopting a few-shot learning approach, we develop a minimal, unified prompt composed of only five sentences. The same concise prompt successfully guides two distinct articulated microrobots -- the three-link swimmer and the three-sphere swimmer -- in mastering their signature strokes. These strokes, initially conceptualized by physicists, are now effectively interpreted and applied by the LLM, enabling the microrobots to circumvent the physical constraints inherent to micro-locomotion. Remarkably, our LLM-based decision-making strategy substantially surpasses a traditional reinforcement learning method in terms of training speed. We discuss the nuanced aspects of prompt design, with particular emphasis on reducing the monetary cost of using GPT-4.
