Learning Agile Swimming: An End-to-End Approach without CPGs

Xiaozhu Lin; Xiaopei Liu; Yang Wang

TL;DR

The paper addresses the challenge of achieving agile, energy-efficient swimming in bio-mimetic robotic fish. It proposes a model-free, end-to-end deep reinforcement learning (DRL) framework that directly outputs low-level actuator commands without relying on Central Pattern Generators (CPGs). Training takes place in FishGym, a high-performance CFD-based simulator, combined with sim-to-real calibration techniques that enable zero-shot transfer of policies to real hardware. Key contributions include eliminating the need for dynamic models or predefined gaits, demonstrating higher speed and better maneuverability with lower energy consumption, and validating zero-shot transfer on challenging tasks such as a 180-degree turn and pentagram waypoint tracking. By substantially narrowing the sim-to-real gap and simplifying controller design for fluid-structure interaction systems, the approach promises practical impact for deploying robotic fish in real aquatic environments.

Abstract

The pursuit of agile and efficient underwater robots, especially bio-mimetic robotic fish, has been impeded by the difficulty of designing motion controllers that fully exploit their hydrodynamic capabilities. This paper addresses this challenge by introducing a novel, model-free, end-to-end control framework that leverages Deep Reinforcement Learning (DRL) to enable agile and energy-efficient swimming of robotic fish. Unlike existing methods that rely on predefined trigonometric swimming patterns such as Central Pattern Generators (CPGs), our approach directly outputs low-level actuator commands without imposing strong constraints on the motion pattern, allowing the robotic fish to learn agile swimming behaviors. In addition, by integrating a high-performance Computational Fluid Dynamics (CFD) simulator with novel sim-to-real strategies, such as normalized density calibration and servo response calibration, the proposed framework significantly mitigates the sim-to-real gap and allows control policies to be transferred directly to real-world environments without fine-tuning. Comparative experiments demonstrate that our method achieves faster swimming speeds, smaller turn-around radii, and lower energy consumption than state-of-the-art swimming controllers. Furthermore, the proposed framework shows promise in addressing complex tasks, paving the way for more effective deployment of robotic fish in real aquatic environments.
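The contrast between CPG-based control and end-to-end control comes down to how the action space is parameterized. The Python sketch below illustrates that difference under stated assumptions; it is not the authors' implementation. The CPG side is a simplified stand-in (a single travelling sine wave; real CPGs are usually networks of coupled oscillators), and the function names, the per-joint phase-lag parameterization, the assumed policy-output range of [-1, 1], and the joint limit `JOINT_LIMIT_RAD` are all hypothetical. Only the three-joint tail is taken from the paper's figures.

```python
import math
from typing import List, Sequence

NUM_JOINTS = 3          # three actuated joints, as in the paper's Figure 3
JOINT_LIMIT_RAD = 0.52  # hypothetical joint limit (~30 degrees) for this sketch


def cpg_style_command(t: float, amplitude: float, frequency: float,
                      phase_lag: float, bias: float) -> List[float]:
    """Simplified CPG-style baseline (hypothetical parameterization): each joint
    tracks a sinusoid delayed by `phase_lag` per joint. A higher-level controller
    can only tune the waveform parameters, never the shape of the motion."""
    return [bias + amplitude * math.sin(2.0 * math.pi * frequency * t - i * phase_lag)
            for i in range(NUM_JOINTS)]


def end_to_end_command(policy_output: Sequence[float]) -> List[float]:
    """End-to-end style: each raw policy output (assumed to lie in [-1, 1]) is
    clipped and scaled directly into a joint-angle target; no waveform is imposed."""
    if len(policy_output) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} outputs, got {len(policy_output)}")
    return [JOINT_LIMIT_RAD * max(-1.0, min(1.0, a)) for a in policy_output]


# One control step for each style.
print(cpg_style_command(0.1, amplitude=0.3, frequency=1.0, phase_lag=0.5, bias=0.0))
print(end_to_end_command([0.8, -1.4, 0.1]))  # -1.4 is clipped to -1.0 before scaling
```

In the CPG-style case, learning reduces to choosing a handful of waveform parameters, so every motion remains a sinusoid. In the end-to-end case, the policy may issue any per-step joint target within the limits, which is the freedom the abstract credits for the learned agile behaviors.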

Paper Structure (19 sections, 2 equations, 8 figures, 2 tables)

Introduction
Robotic Fish and Experimental Platform
Robotic Fish
Experimental Platform
Methodology
Neural Network Controller
State Space
Action Space
Reward Function
CFD Simulator
Normalized Density Calibration
Actuator Response Calibration
Policy Training Details
Experiments
Training Process and Evaluations
(4 further sections not listed)

Figures (8)

Figure 1: Overview of the experimental platform. (a) Schematic of the interior of the robotic fish prototype and of the entire experimental platform. (b) Diagram of the experimental data-processing pipeline. The modules in yellow run on a personal computer at 50 Hz and communicate wirelessly with the robotic fish via UDP.
Figure 2: The proposed learning framework. (a) High-performance CFD simulator with the proposed sim-to-real techniques. Calibration requires only a small amount of real-world data (e.g., 30 seconds) to provide accurate three-dimensional fluid interaction dynamics. (b) Learning an end-to-end policy from scratch. Using deep reinforcement learning in the calibrated CFD simulator, the agent learns an agile swimming policy that can be transferred directly to the real world without fine-tuning. (c) The developed robotic fish experimental platform, used to evaluate the swimming performance of various robotic fish controllers.
Figure 3: Validation of the CFD simulator calibrated with the proposed techniques. The robotic fish states (surge, sway, and yaw) simulated in the CFD simulator (blue) and measured on the experimental platform (red) are shown. Two desired signals (black) of different types and frequencies were used for testing: (a) a 1 Hz sinusoidal signal and (b) a 0.5 Hz square-wave signal, both with an amplitude of 0.52 rad (i.e., 30 degrees). All three joints follow the same signal; for clarity, only the curve of the first joint (near the head) is shown.
Figure 4: Evaluation of policies at different training stages. (a) Learning curves of the average reward during training. Training was repeated three times; the solid line shows the mean and the shaded area the standard-deviation range. (b) Trajectories of the robotic fish performing position control at four typical learning stages: A, B, C, and D. Each evaluation is repeated 10 times, with the initial and target points randomly sampled within the start zone (red) and the target zone (green), respectively.
Figure 5: Screenshot sequence of the robotic fish performing a 180-degree sharp turn in both simulation and experiment. Two points are worth noting. First, the simulated and physical motions overlap closely, indicating successful sim-to-real policy transfer. Second, the turn is very fast and has a small turning radius, demonstrating successful end-to-end learning of an agile policy.
(3 further figures not listed)
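The desired test inputs described in the Figure 3 caption can be reproduced roughly as follows. This Python/NumPy sketch assumes the commands are sampled at the 50 Hz loop rate mentioned in the Figure 1 caption and that the square wave starts at its positive level; the helper name `desired_signal` and these sampling choices are illustrative assumptions, not details from the paper.

```python
import numpy as np

# 30 degrees in radians (~0.5236); the Figure 3 caption rounds this to 0.52 rad.
AMPLITUDE_RAD = np.deg2rad(30.0)
# Assumption: commands are sampled at the 50 Hz loop rate given in Figure 1.
CONTROL_RATE_HZ = 50.0
NUM_JOINTS = 3


def desired_signal(kind, frequency_hz, duration_s,
                   amplitude=AMPLITUDE_RAD, rate_hz=CONTROL_RATE_HZ):
    """Hypothetical helper. Returns (t, q): sample times in seconds and a
    (len(t), NUM_JOINTS) array of joint-angle commands in radians."""
    t = np.arange(0.0, duration_s, 1.0 / rate_hz)
    if kind == "sine":
        wave = np.sin(2.0 * np.pi * frequency_hz * t)
    elif kind == "square":
        # +1 for the first half of each period, -1 for the second half.
        wave = np.where((frequency_hz * t) % 1.0 < 0.5, 1.0, -1.0)
    else:
        raise ValueError(f"unknown signal kind: {kind!r}")
    # All three joints follow the same desired signal.
    q = np.tile((amplitude * wave)[:, None], (1, NUM_JOINTS))
    return t, q


t_sine, q_sine = desired_signal("sine", 1.0, 2.0)        # panel (a)
t_square, q_square = desired_signal("square", 0.5, 4.0)  # panel (b)
```

For example, `desired_signal("square", 0.5, 4.0)` yields two full periods of the 0.5 Hz square wave, with the same ±30-degree command applied to all three joints.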
