Learning to Crawl: Latent Model-Based Reinforcement Learning for Soft Robotic Adaptive Locomotion

Vaughn Gzenda, Robin Chhabra

TL;DR

The paper addresses how soft robotic crawlers can learn locomotion policies from noisy sensor data without an explicit continuum-body model. It introduces a latent dynamics model learned from inertial measurement unit (IMU) and time-of-flight (TOF) measurements and embeds it in a Dreamer-style actor-critic framework to optimize periodic gait parameters. Perception is trained with a variational free energy objective, while latent predictions drive short-horizon planning for policy optimization. In simulation, the approach yields effective gaits that move the crawler forward to a target within roughly 14 seconds, indicating robustness to sensor noise and potential for autonomous soft-robot locomotion.

Abstract

Soft robotic crawlers are mobile robots that utilize soft body deformability and compliance to achieve locomotion through surface contact. Designing control strategies for such systems is challenging due to model inaccuracies, sensor noise, and the need to discover locomotor gaits. In this work, we present a model-based reinforcement learning (MB-RL) framework in which latent dynamics inferred from onboard sensors serve as a predictive model that guides an actor-critic algorithm to optimize locomotor policies. We evaluate the framework on a minimal crawler model in simulation using inertial measurement units and time-of-flight sensors as observations. The learned latent dynamics enable short-horizon motion prediction while the actor-critic discovers effective locomotor policies. This approach highlights the potential of latent-dynamics MB-RL for enabling embodied soft robotic adaptive locomotion based solely on noisy sensor feedback.
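
For orientation, a variational free energy objective for a latent state-space model is commonly written as a reconstruction term plus a KL term between the inferred and predicted latent distributions. The block below is a generic sketch of that form, not the paper's exact objective, which this summary does not reproduce. All symbols are assumptions introduced for illustration: $o_t$ stands for the stacked IMU and TOF observation, $a_t$ for the action (for example, gait parameters), $z_t$ for the latent state, $q_\phi$ for the inference model, and $p_\theta$ for the learned observation and transition models.

```latex
% Generic per-step variational free energy (negative ELBO) for a latent
% state-space model. Symbols are illustrative assumptions, not the paper's notation.
\mathcal{F}_t(\theta,\phi)
  = \mathbb{E}_{q_\phi(z_t \mid z_{t-1},\, a_{t-1},\, o_t)}
      \bigl[-\log p_\theta(o_t \mid z_t)\bigr]
  + D_{\mathrm{KL}}\!\bigl(
      q_\phi(z_t \mid z_{t-1},\, a_{t-1},\, o_t)
      \,\big\|\,
      p_\theta(z_t \mid z_{t-1},\, a_{t-1})
    \bigr)
```

Minimizing the first term makes the latent state explain the sensor readings, and minimizing the second keeps the learned transition prior $p_\theta(z_t \mid z_{t-1}, a_{t-1})$ close to the observation-informed posterior, which is what allows the transition model to be rolled forward without observations during short-horizon prediction.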

Paper Structure

This paper contains 15 sections, 39 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Model of Soft Robotic Crawler
  • Figure 2: Summary of the training loop of the latent space state transition, actor and critic models.
  • Figure 3: Average total rewards of 5 training runs with a moving average over 20 episodes over 100,000 gradient steps (solid blue). We include the variance over the training runs plotted in light blue.
  • Figure 4: Upper: Location of the crawler under the locomotion policy. The head of the crawler is plotted in orange, the tail is plotted in blue, and the location of the centre of mass in green. Lower: Inertial measurement unit and time-of-flight observations

Theorems & Definitions (2)

  • Remark 1
  • Remark 2