TMR-VLA: Vision-Language-Action Model for Magnetic Motion Control of Tri-leg Silicone-based Soft Robot

Ruijie Tang, Chi Kit Ng, Kaixuan Wu, Long Bai, Guankun Wang, Yiming Huang, Yupeng Wang, Hongliang Ren

Abstract

In in-vivo environments, magnetically actuated soft robots offer advantages such as wireless operation and precise control, showing promising potential for painless detection and therapeutic procedures. We developed a trileg magnetically driven soft robot (TMR) whose multi-legged design enables more flexible gaits and diverse motion patterns. For reconfigurable soft robots made of silicone, navigation can be decomposed into sequential motions such as squatting, rotating, lifting a leg, and walking, and the robot's motion and behavior depend on its bending shape. To bridge motion-type descriptions and specific low-level voltage control, we introduce TMR-VLA, an end-to-end multi-modal system for a trileg magnetic soft robot capable of performing hybrid motion types. This is a promising step toward navigation in which the robot adapts its shape to language-constrained motion types. TMR-VLA adopts the embodied endoluminal localization ability of EndoVLA and fuses sequential frames with natural language commands as input. The low-level voltage output is generated from the current observation state and the specified motion-type description. The results show that TMR-VLA can predict how the voltage applied to the TMR changes the dynamics of the silicone-made soft robot. TMR-VLA reached a 74% average success rate.

Paper Structure

This paper contains 23 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The overall workflow of the VLA model for magnetic motion control of the trileg soft robot
  • Figure 2: Motion characterization of the tri-leg magnetic soft robot: (a) squatting height (z-axis) vs. voltage; (b) stepping distance (alternating gait) vs. voltage.
  • Figure 3: Inference framework of the tri-leg magnetic robot VLA (TMR-VLA). The model consumes a short frame window and an instruction, then autoregressively emits quantized voltage increments that are dequantized and safety-projected before actuation.
  • Figure 4: (a) Illustration of the trileg robot. The magnetic field provides torques in three dimensions. Experimental parameters: (b) maximum squat distance, (c) leg lift height, (d) anchoring one foot and rotating the body, (e) forward distance, (f) reaching the target and recovering.
  • Figure 5: Experimental results demonstrate that TMR-VLA achieves higher success rates in executing multi-step actions. The arrow indicates the magnitude and direction of the action output.
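
The Figure 3 caption describes an output stage in which quantized voltage increments are dequantized and safety-projected before actuation. The sketch below illustrates one plausible reading of that stage and is not the authors' implementation. The bin count, maximum per-step increment, voltage limits, the use of three channels, and all function names are assumptions made for illustration, and the safety projection is modeled here as simple clipping to a fixed range.

```python
from typing import List, Sequence

# All constants below are hypothetical; the source does not state them.
NUM_BINS = 256             # assumed number of quantization bins per token
MAX_STEP_V = 0.5           # assumed largest voltage increment per step (volts)
V_MIN, V_MAX = -5.0, 5.0   # assumed safe voltage range per channel (volts)


def dequantize(token: int, num_bins: int = NUM_BINS,
               max_step: float = MAX_STEP_V) -> float:
    """Map a bin index in [0, num_bins - 1] linearly to [-max_step, +max_step]."""
    if not 0 <= token < num_bins:
        raise ValueError(f"token {token} is outside [0, {num_bins - 1}]")
    return -max_step + 2.0 * max_step * token / (num_bins - 1)


def safety_project(voltages: Sequence[float], v_min: float = V_MIN,
                   v_max: float = V_MAX) -> List[float]:
    """Project each voltage onto the allowed interval by clipping (assumed form)."""
    return [min(max(v, v_min), v_max) for v in voltages]


def apply_tokens(current: Sequence[float], tokens: Sequence[int]) -> List[float]:
    """Add dequantized increments to the current voltages, then safety-project."""
    if len(current) != len(tokens):
        raise ValueError("expected one token per voltage channel")
    updated = [v + dequantize(t) for v, t in zip(current, tokens)]
    return safety_project(updated)


if __name__ == "__main__":
    # Three hypothetical channels: the first and third would leave the safe
    # range after the increment and are clipped back to the limits.
    print(apply_tokens([4.8, 0.0, -4.9], [255, 0, 0]))
```

Predicting increments rather than absolute voltages keeps each step small, and a final projection bounds the command regardless of what the model emits. A real system might replace the clipping with rate limits or coil-specific constraints.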