Locomotion Generation for a Rat Robot based on Environmental Changes via Reinforcement Learning

Xinhui Shan; Yuhong Huang; Zhenshan Bing; Zitao Zhang; Xiangtong Yao; Kai Huang; Alois Knoll

TL;DR

This paper tackles locomotion generation for a small rat-like quadruped (NeRmo) with limited sensor feedback by combining reinforcement learning with frequency-domain perception. It extracts environmental cues from IMU data using the FFT, filters them to the gait-relevant band, and represents environmental changes with a simplified two-sine model, which reduces noise and computation. A multifunctional reward framework combines fall penalties with axis-specific reward terms to guide adaptive gaits across ramps, stairs, and spiral stairs, learned via PPO without pre-training. Experiments show rapid convergence (≈0.25M steps) and high success rates across several environments, highlighting the approach's potential for robust, sensor-efficient locomotion in small robots.

Abstract

This research focuses on developing reinforcement learning approaches for the locomotion generation of small-size quadruped robots. The rat robot NeRmo is employed as the experimental platform. Due to the constrained volume, small-size quadruped robots typically possess fewer and weaker sensors, resulting in difficulty in accurately perceiving and responding to environmental changes. In this context, insufficient and imprecise feedback data from sensors makes it difficult to generate adaptive locomotion based on reinforcement learning. To overcome these challenges, this paper proposes a novel reinforcement learning approach that focuses on extracting effective perceptual information to enhance the environmental adaptability of small-size quadruped robots. According to the frequency of a robot's gait stride, key information of sensor data is analyzed utilizing sinusoidal functions derived from Fourier transform results. Additionally, a multifunctional reward mechanism is proposed to generate adaptive locomotion in different tasks. Extensive simulations are conducted to assess the effectiveness of the proposed reinforcement learning approach in generating rat robot locomotion in various environments. The experiment results illustrate the capability of the proposed approach to maintain stable locomotion of a rat robot across different terrains, including ramps, stairs, and spiral stairs.

Paper Structure (15 sections, 11 equations, 10 figures)


Introduction
Overview
Environment Perception
Sensor Key Information Extraction
Discussion for Describing Environmental Changes
Reinforcement Learning Application in NeRmo
Action Space
State Space
Multifunctional Reward Function
Experiment
Experimental Setup
Training Process
Environmental Adaptability
Scenarios Extension
Conclusion

Figures (10)

Figure 1: A rat-like robot attempts to ascend the stairs.
Figure 2: Architecture for applying RL to the control of a rat robot. The blue boxes represent data operations. The sensor data in the green boxes is observed directly while the robot operates. The gray box is the action generated by the RL policy network. The yellow boxes are the required inputs of the policy network.
Figure 3: Data processing for analyzing sensor data. This figure outlines the analysis of gyroscope data along the x-axis over one period T, sampled at 100 Hz. (a) displays the raw data. In (b), we apply the Fast Fourier Transform (FFT) to identify frequency components. (c) filters out frequencies outside the 0.1 Hz to 10 Hz range to attenuate noise. (d) shows the filtered time signals obtained by the Inverse Fast Fourier Transform (IFFT). In (e), we fit all filtered time signals according to Equation (\ref{sum_fuc}). (g) shows the data fitted using only the significant frequencies from (f) according to Equation (\ref{two_sin}), which can also effectively represent environmental changes.
Figure 4: Gyroscope data variation on the x-axis of NeRmo. During NeRmo's ramp descent, six time slots are observed: (a) Data on flat terrain, (b) Data variation at the start of the descent, (c) Data during the descent, (d) Data variation transitioning from descent to flat terrain, (e) Data on the lower flat terrain.
Figure 5: Schematic of reward functions in different environments. The black arrows are NeRmo's local coordinates. The green arrows are the combined reward $R'$ of each scenario. The red arrows represent penalty terms and loss values. $P_f$ refers to the fall penalty. The loss values for unstable action or action with deviated direction are $L_0$ and $L_1$, respectively.
...and 5 more figures
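
The processing chain described in the Figure 3 caption (FFT, band-pass to 0.1 Hz to 10 Hz, IFFT, then a two-sine representation) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, and choosing the two largest-magnitude bins as the "significant frequencies" is an assumption, since the selection rule is not stated here. The 100 Hz sampling rate and the pass band are taken from the caption.

```python
import numpy as np

def bandpass_fft(signal, fs=100.0, f_lo=0.1, f_hi=10.0):
    """Zero all FFT bins outside [f_lo, f_hi] Hz and return the filtered signal."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    filtered_spectrum = np.where(keep, spectrum, 0.0)
    filtered_signal = np.fft.irfft(filtered_spectrum, n=n)
    return freqs, filtered_spectrum, filtered_signal

def two_sine_params(freqs, spectrum, n):
    """Return (amplitude, frequency, phase) for the two strongest bins.

    Assumption: 'significant' means largest magnitude in the filtered spectrum.
    """
    mags = np.abs(spectrum)
    strongest = np.argsort(mags)[-2:][::-1]
    params = []
    for k in strongest:
        amplitude = 2.0 * mags[k] / n                 # one-sided spectrum scaling
        phase = np.angle(spectrum[k]) + np.pi / 2.0   # cosine phase -> sine phase
        params.append((amplitude, freqs[k], phase))
    return params

def evaluate_sines(params, t):
    """Evaluate sum_i A_i * sin(2*pi*f_i*t + phi_i) at times t."""
    return sum(a * np.sin(2.0 * np.pi * f * t + p) for a, f, p in params)

# Synthetic stand-in for gyroscope data: two gait-band tones plus a 30 Hz disturbance.
fs = 100.0
t = np.arange(200) / fs
raw = (1.0 * np.sin(2 * np.pi * 2.0 * t + 0.3)
       + 0.5 * np.sin(2 * np.pi * 4.0 * t + 1.0)
       + 0.2 * np.sin(2 * np.pi * 30.0 * t))
freqs, spectrum, filtered = bandpass_fft(raw, fs=fs)
params = two_sine_params(freqs, spectrum, len(raw))
approx = evaluate_sines(params, t)
```

Under these assumptions the policy would receive only six numbers per sensor axis (two amplitudes, two frequencies, two phases) instead of the full sample window, which is consistent with the reduction in noise and computation that the TL;DR attributes to the two-sine model.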
