Domain Randomization for Robust, Affordable and Effective Closed-loop Control of Soft Robots

Gabriele Tiboni, Andrea Protopapa, Tatiana Tommasi, Giuseppe Averta

TL;DR

The paper tackles the sim-to-real gap in closed-loop soft-robot control, which arises from the potentially infinite number of degrees of freedom and from unmodeled deformable dynamics. It introduces Reset-Free DROPO, an offline adaptive domain randomization (DR) method tailored for partially observable soft robots, which learns a distribution over deformable-dynamics parameters from real trajectories and trains robust policies with simpler simulators. Key contributions include accurate inference of parameters such as Poisson's ratio and friction, reduced training time through DR on simplified models, and evidence that environment randomization can improve exploration and lead to the exploitation of environmental constraints. The findings suggest that DR-based approaches can enable robust, affordable, and effective soft-robot control in sim-to-real pipelines, with clear paths toward real-world deployment.

Abstract

Soft robots are gaining popularity thanks to their intrinsic safety in contact and their adaptability. However, the potentially infinite number of Degrees of Freedom makes their modeling a daunting task, and in many cases only an approximate description is available. This challenge makes reinforcement learning (RL) based approaches inefficient when deployed in a realistic scenario, due to the large domain gap between models and the real platform. In this work, we demonstrate, for the first time, how Domain Randomization (DR) can solve this problem by enhancing RL policies for soft robots with: i) robustness w.r.t. unknown dynamics parameters; ii) reduced training times by exploiting drastically simpler dynamic models for learning; iii) better environment exploration, which can lead to exploitation of environmental constraints for optimal performance. Moreover, we introduce a novel algorithmic extension to previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects. We provide an extensive evaluation in simulation on four different tasks and two soft robot designs, opening interesting perspectives for future research on Reinforcement Learning for closed-loop soft robot control.
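
To make the idea of domain randomization concrete, the following is a minimal toy sketch and not the paper's implementation. It assumes a hypothetical 1-D point mass instead of a soft robot, hypothetical uniform ranges for `mass` and `friction`, and a grid search over a proportional-controller gain standing in for RL training. The only point it illustrates is that the dynamics parameters are resampled at the start of every episode, so the selected controller is scored on average over the whole range rather than on one nominal model.

```python
import random
from dataclasses import dataclass


@dataclass
class Dynamics:
    mass: float      # hypothetical parameter
    friction: float  # hypothetical viscous friction coefficient


# Hypothetical randomization ranges, chosen for illustration only.
RANGES = {"mass": (0.5, 2.0), "friction": (0.1, 1.0)}


def sample_dynamics(rng: random.Random) -> Dynamics:
    """Draw one set of dynamics parameters uniformly from RANGES."""
    return Dynamics(
        mass=rng.uniform(*RANGES["mass"]),
        friction=rng.uniform(*RANGES["friction"]),
    )


def rollout(gain: float, dyn: Dynamics, goal: float = 1.0,
            steps: int = 200, dt: float = 0.05) -> float:
    """Simulate a 1-D point mass driven by a proportional controller.

    Returns the negative final distance to the goal (higher is better).
    """
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        force = gain * (goal - pos)
        acc = (force - dyn.friction * vel) / dyn.mass
        vel += acc * dt
        pos += vel * dt
    return -abs(goal - pos)


def randomized_score(gain: float, rng: random.Random,
                     episodes: int = 50) -> float:
    """Average return over episodes, resampling dynamics every episode."""
    total = 0.0
    for _ in range(episodes):
        total += rollout(gain, sample_dynamics(rng))
    return total / episodes


if __name__ == "__main__":
    candidate_gains = [0.5, 1.0, 2.0, 4.0]
    # The same seed is reused per gain so all candidates see the same dynamics.
    scores = {g: randomized_score(g, random.Random(0)) for g in candidate_gains}
    best_gain = max(scores, key=scores.get)
    print("scores:", scores)
    print("best gain under randomized dynamics:", best_gain)
```

In an adaptive variant, the fixed `RANGES` would be replaced by a distribution fitted to data collected on the target system, which is the role the abstract assigns to the proposed extension of adaptive domain randomization.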

Paper Structure (20 sections, 3 equations, 8 figures, 4 tables, 1 algorithm)

Figures (8)

  • Figure 1: Paradigms in RL-based robot learning: a) training directly in the real world; b) naïve Sim-to-Real transfer suffers from the reality gap; c) training with domain randomization increases robustness to modelling approximations and errors; d) distributions over simulator dynamics parameters may be automatically inferred from real-world data for use with DR.
  • Figure 2: Overview of the Reset-Free DROPO algorithm (bottom) vs. its original counterpart with intermediate state-resetting (top).
  • Figure 3: TrunkReach and TrunkPush setups. a) Purple dots define the box of possible goal locations sampled at training time, green dots are 27 fixed target locations for evaluation, with the red dot being the current goal. b) Green dot is the desired target location for the box center of mass.
  • Figure 4: Multigait locomotion environment: (a) simplified vs. (b) accurate simulation model.
  • Figure 5: Vanilla parameter estimation: policy evaluation in terms of distance from the goal position (lower is better).
  • ...and 3 more figures