Curriculum Is More Influential Than Haptic Information During Reinforcement Learning of Object Manipulation Against Gravity

Pegah Ojaghi; Romina Mir; Ali Marjaninejad; Andrew Erwin; Michael Wehner; Francisco J Valero-Cuevas

TL;DR

This work investigates the role of curriculum learning and haptic feedback in enabling the learning of dexterous manipulation and challenges long-held notions about the need for tactile information to autonomously learn in-hand dexterous manipulation.

Abstract

Learning to lift and rotate objects with the fingertips is necessary for autonomous in-hand dexterous manipulation. In this study, we explore which factors lead to successful learning strategies for this task. Specifically, we investigate the role of curriculum learning and haptic feedback in enabling the learning of dexterous manipulation. Using model-free Reinforcement Learning, we compare different curricula and two haptic information modalities (No-tactile vs. 3D-force sensing) for lifting and rotating a ball against gravity with a three-fingered simulated robotic hand with no visual input. Our best results were obtained with a novel curriculum-based learning rate scheduler, which adjusts the linearly decaying learning rate whenever the reward function changes and thereby accelerates convergence to higher rewards. Our findings demonstrate that the choice of curriculum greatly biases the acquisition of different features of dexterous manipulation. Surprisingly, successful learning can be achieved even in the absence of tactile feedback, challenging conventional assumptions about the necessity of haptic information for dexterous manipulation tasks. We demonstrate that our results generalize to balls of different weights and sizes, underscoring the robustness of our learning approach. This work, therefore, emphasizes the importance of the choice of curriculum and challenges long-held notions about the need for tactile information to autonomously learn in-hand dexterous manipulation.
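The abstract states only that the scheduler adjusts a linearly decaying learning rate when the reward changes; it does not say how. The following is a minimal sketch of one possible reading, in which the linear decay restarts from a reduced peak at each reward change. The function name `curriculum_lr`, the initial rate `3e-4`, and the `restart_scale` factor are hypothetical illustration values, not the authors' settings. The two phases of 1,000 episodes each follow the description in Figure 1.

```python
def curriculum_lr(episode, phase_starts=(0, 1000), total_episodes=2000,
                  lr_init=3e-4, lr_final=0.0, restart_scale=0.5):
    """Linearly decaying learning rate that restarts at each reward change.

    ASSUMPTION: restarting the decay from a scaled-down peak is one guess at
    what "adjusting" the rate could mean; all numeric defaults are hypothetical.
    """
    if not 0 <= episode < total_episodes:
        raise ValueError("episode must lie in [0, total_episodes)")

    # Index of the learning phase (i.e., reward function) this episode is in.
    phase = max(i for i, start in enumerate(phase_starts) if start <= episode)
    start = phase_starts[phase]
    if phase + 1 < len(phase_starts):
        end = phase_starts[phase + 1]
    else:
        end = total_episodes

    # Peak rate for this phase, then a linear decay toward lr_final.
    peak = lr_init * (restart_scale ** phase)
    fraction_done = (episode - start) / (end - start)
    return peak + (lr_final - peak) * fraction_done


if __name__ == "__main__":
    for ep in (0, 500, 999, 1000, 1500, 1999):
        print(ep, curriculum_lr(ep))
```

Under this reading, the rate jumps back up at episode 1,000, when the reward function changes, so that the policy can adapt to the new objective before the rate decays again.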

Paper Structure (42 sections, 3 equations, 13 figures, 5 tables, 1 algorithm)

Introduction
Results
Discussion
Methods
Acknowledgements
Author contributions
Competing Interests
Supplementary Information

Figures (13)

Figure 1: Overview of Simulation Environment and Learning. High-level overview of the simulation environment and learning approach to autonomous manipulation. See the Methods section for further details. A: Simulation Environment. A simulated three-finger robotic hand attempted to lift and rotate (i.e., dexterously manipulate) a ball. The 3D movement of the ball was lightly constrained to the X-Z plane. Changes in the ball state affect the reward, which is a function of rotation, lift, and/or a combination of the two. We tested this approach with two different tactile information conditions (No-tactile and 3D-force) available at the fingertips and four balls of different weights and sizes. B: Learning Algorithm. Independent Trial, Left: For each of the five curricula, autonomous learning was evaluated over 60 independent trials (one trial shown). Each trial in a curriculum consisted of two learning phases lasting 1,000 episodes for a total of 2,000 episodes. The reward function changed at the end of the first learning phase (with the exception of Curriculum 3, see Table \ref{tab:curricula}). Episode, Right: Each episode lasted 10 s and began de novo with the ball on the ground with the hand and fingertips suspended above it. In each episode, the PPO learning algorithm dynamically updates the agent's action (i.e., moving the fingers and hand) to increase the curriculum's reward.
Figure 2: The evolution of learning highlights the dynamic functional interaction between curriculum and tactile information. Manipulation performance during the last 10 s of each episode noted, shown as the percent of the time the ball is within the desired height range vs. the number of complete rotations. Each point is the average of 60 independent trials. Arrows point in the direction of increasing episodes. Negative rotations were set to zero. Note that the choice of curriculum had a profound effect on learning for both tactile conditions ((A) No-tactile and (B) 3D-force). Surprisingly, learning happened even in the absence of tactile information, and manipulation performance was not always better with 3D-force information. (C) An analogy of learning as a developmental trajectory from a pluripotent state based on experience (curriculum). This effect of curriculum (and tactile information, cf. A vs. B) affects both learning (path) and final performance (endpoint), and can be visualized as traversing a 'Waddington Landscape' (adapted from \cite{waddington1959evolutionary}).
Figure 3: Performance across all curricula and both tactile information conditions. The joint distribution illustrates the performance during the final 10 s episode of each of the 60 trial runs (showing the mean ball height (mm) versus the number of completed rotations). The color-coded cumulative reward for the last episode of each run (refer to Equation (\ref{eq:reward})) corresponds to different curricula. Note that the final manipulation performance is represented by those points inside the green box defining the desired ball height (25 $\pm$ 4 mm).
Figure 4: Cumulative reward across all curricula and tactile information conditions. Boxplots, with median, across tactile conditions for 60 runs, every 250 episodes. Note that learning tends to saturate early.
Figure 5: Final Performance for Lift (Left) and Rotation (Right) for both tactile conditions for four objects. The top row corresponds to the baseline object described in the main results. Violin plots show the distribution of Lift and Rotation at the end of learning (i.e., the last 10 seconds of the 2,000th episode) for all 60 trials. Lift is described as a distance from the desired height (the green box shows the distance from the desired height range $\pm$ 4 mm) and Rotation as the number of completed rotations for both tactile conditions, No-tactile and 3D-force.
...and 8 more figures
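
The captions for Figures 2, 3, and 5 report two performance measures: the percent of time the ball is within the desired height range (25 $\pm$ 4 mm) and the number of complete rotations, with negative rotations set to zero. A minimal sketch of how such measures might be computed from one episode's trajectory follows. The function name `manipulation_metrics`, the use of evenly sampled heights, and the unwrapped cumulative angle with a positive desired direction are assumptions made for illustration; the paper's actual computation may differ.

```python
import math


def manipulation_metrics(heights_mm, angles_rad, target_mm=25.0, tol_mm=4.0):
    """Return (percent of samples in the height range, complete rotations).

    ASSUMPTIONS: samples are evenly spaced in time, and angles_rad is the
    unwrapped (cumulative) ball rotation, positive in the desired direction.
    """
    if not heights_mm or len(heights_mm) != len(angles_rad):
        raise ValueError("need two non-empty sequences of equal length")

    # Lift: fraction of samples with height inside target_mm +/- tol_mm.
    in_range = sum(1 for h in heights_mm if abs(h - target_mm) <= tol_mm)
    percent_in_range = 100.0 * in_range / len(heights_mm)

    # Rotation: whole turns completed over the episode; negative net
    # rotation is clipped to zero, as stated in the Figure 2 caption.
    net_turns = (angles_rad[-1] - angles_rad[0]) / (2.0 * math.pi)
    complete_rotations = max(0, math.floor(net_turns))

    return percent_in_range, complete_rotations


if __name__ == "__main__":
    heights = [0.0, 22.0, 25.0, 30.0]
    angles = [0.0, math.pi, 2.0 * math.pi, 4.5 * math.pi]
    print(manipulation_metrics(heights, angles))
```

Counting only whole turns matches the "complete rotations" wording in the captions; a fractional count would be an equally plausible alternative if partial rotations were of interest.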
