Learning control strategy in soft robotics through a set of configuration spaces

Etienne Ménager; Christian Duriez

TL;DR

The paper proposes controlling soft robots through a graph of configuration spaces built from the robot's contact configurations. The method is demonstrated in two scenarios, a deformable beam in contact with its environment and a soft manipulator, where it outperforms the baseline in stability, learning speed, and interpretability.

Abstract

The ability of a soft robot to perform specific tasks is determined by its contact configuration, and transitioning between configurations is often necessary to reach a desired position or manipulate an object. Based on this observation, we propose a method for controlling soft robots that involves defining a graph of configuration spaces. Different agents, whether learned or not (convex optimization, expert trajectory, and collision detection), use the structure of the graph to solve the desired task. The graph and the agents are part of the prior knowledge that is intuitively integrated into the learning process. They are used to combine different optimization methods, improve sample efficiency, and provide interpretability. We construct the graph based on the contact configurations and demonstrate its effectiveness through two scenarios, a deformable beam in contact with its environment and a soft manipulator, where it outperforms the baseline in terms of stability, learning speed, and interpretability.

Paper Structure (20 sections, 5 equations, 6 figures)

Introduction
Background and Notations
Methods
General overview
Structure and Training of the different agents
Train the Selector
Train the Evaluator
Train the internal agents
Train the external agents
Materials: robots and creation of configuration spaces
The CartStemContact example
The RodManipulator example
Results
Solving the CartStemContact and convex optimization
Manipulation of the rod and expert trajectories
...and 5 more sections

Figures (6)

Figure 1: Schematic of the different elements constituting the knowledge graph. Each node in the graph corresponds to a configuration space, and each observation belongs to one node. Each node has an identifier, a neighbourhood (other configuration spaces), an internal agent (in red), and an external agent (in blue). The Evaluator (in pink) can either belong to a node or be shared between the nodes. The Selector (in orange) is global in the knowledge graph. Three steps are then performed. (1) The Selector determines in which configuration space the robot is. (2) The Evaluator decides in which configuration space to go. (3) The internal agent or external agent solves the task in the current space or moves to a different space.
Figure 2: Splitting the state space into different configuration spaces (right) for two soft systems (left) based on the contact configuration. (A) CartStemContact. (B) RodManipulator. In this example, the contact configurations that are not useful for the manipulation task are gathered in one configuration space.
Figure 3: Illustration of the limitations of optimization-based control approaches in the case of the CartStemContact robot. A deformable beam is fixed on a mobile base that can move horizontally. Two obstacles limit the movement of the beam. The objective is to minimize the distance between the end of the beam and a horizontal position behind one of the obstacles. (A) When the robot is not in contact with an obstacle, the use of optimization-based control leads to a local minimum. (B) To solve the task, the robot must first be in contact with the opposite obstacle. The presence of the contact between the obstacle and the robot changes the optimization space.
Figure 4: Learning results: reward as a function of the number of iterations for the CartStemContact. Results are shown for the SAC algorithm (orange), for our method (blue), and for our method with internal agents implemented with convex optimization (purple). The learning conditions are the same in all three cases. The initial difference arises because the first results are obtained only after 500 iterations and because the variant with optimization-based internal agents learns to solve the task faster than the other methods. A sliding average is used to make the results easier to read; its window is approximately 2.5% of the number of iterations.
Figure 5: Example of RodManipulator task resolution for a target angle of 280°. Starting without contact, the algorithm first uses an expert trajectory to position the robot in contact with the rod. The rod is then manipulated until it reaches a configuration where it is not possible to move further without changing the contact configuration. The Evaluator then uses another expert trajectory to reconfigure the contacts and continues to rotate the rod until it reaches 280°.
...and 1 more figures
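
The three steps described in the Figure 1 caption (Selector, Evaluator, then internal or external agent) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names `Node` and `KnowledgeGraph`, the function `run_episode`, and the one-dimensional toy system with two configuration spaces ("free" for x < 0, "contact" for x >= 0) are all assumptions made for illustration. The agents here are hand-written rules rather than learned policies, convex optimization, or expert trajectories.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """One configuration space (hypothetical layout, not the paper's code)."""
    identifier: str
    neighbours: List[str]
    # Internal agent: solves the task inside this space (observation, goal -> action).
    internal_agent: Callable[[float, float], float]
    # External agents: one per neighbour, moving the system towards that space.
    external_agents: Dict[str, Callable[[float], float]]


@dataclass
class KnowledgeGraph:
    nodes: Dict[str, Node]
    selector: Callable[[float], str]        # global: observation -> node identifier
    evaluator: Callable[[str, float], str]  # (current node, goal) -> node to be in

    def step(self, obs: float, goal: float) -> float:
        current = self.selector(obs)            # (1) which space is the robot in?
        target = self.evaluator(current, goal)  # (2) which space should it go to?
        node = self.nodes[current]
        if target == current:                   # (3a) solve the task in place
            return node.internal_agent(obs, goal)
        if target not in node.neighbours:
            raise ValueError(f"{target!r} is not a neighbour of {current!r}")
        return node.external_agents[target](obs)  # (3b) move to the other space


def build_toy_graph() -> KnowledgeGraph:
    """Toy 1D system: 'free' for x < 0, 'contact' for x >= 0 (assumed split)."""
    def locate(x: float) -> str:
        return "contact" if x >= 0.0 else "free"

    def towards_goal(x: float, goal: float) -> float:
        return 0.5 * (goal - x)  # simple proportional step inside a space

    free = Node("free", ["contact"], towards_goal, {"contact": lambda x: 0.5})
    contact = Node("contact", ["free"], towards_goal, {"free": lambda x: -0.5})
    # With only two nodes, the Evaluator can simply aim for the goal's space.
    return KnowledgeGraph(
        nodes={"free": free, "contact": contact},
        selector=locate,
        evaluator=lambda current, goal: locate(goal),
    )


def run_episode(start: float, goal: float, steps: int) -> Tuple[float, List[str]]:
    """Apply the three-step loop and record the space visited at each step."""
    graph = build_toy_graph()
    x, visited = start, []
    for _ in range(steps):
        visited.append(graph.selector(x))
        x += graph.step(x, goal)
    return x, visited


if __name__ == "__main__":
    final_x, visited = run_episode(start=-1.25, goal=2.0, steps=30)
    print(visited[:5], final_x)
```

In this toy setting the external agent first pushes the state from "free" into "contact", after which the internal agent of "contact" converges on the goal. With more than two configuration spaces, the Evaluator would have to return a neighbouring space on a path to the goal rather than the goal's space directly.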
