
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

Yi Zhao, Le Chen, Jan Schneider, Quankai Gao, Juho Kannala, Bernhard Schölkopf, Joni Pajarinen, Dieter Büchler

TL;DR

The Robot Piano 1 Million (RP1M) dataset contains more than one million trajectories of bi-manual robot piano playing motion data. Finger placements are formulated as an optimal transport problem, which enables automatic annotation of vast amounts of unlabeled songs.

Abstract

It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast yet precise motions, with slower but contact-rich manipulation problems. Although reinforcement-learning-based approaches have shown promising results in single-task performance, these methods struggle in a multi-song setting. Our work aims to close this gap and thereby enable imitation learning approaches for robot piano playing at scale. To this end, we introduce the Robot Piano 1 Million (RP1M) dataset, containing bi-manual robot piano playing motion data of more than one million trajectories. We formulate finger placements as an optimal transport problem, thereby enabling automatic annotation of vast amounts of unlabeled songs. Benchmarking existing imitation learning approaches shows that such approaches reach state-of-the-art robot piano playing performance by leveraging RP1M.
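
To illustrate the idea of treating finger placement as a transport problem, the sketch below assigns each key that must be pressed to a distinct finger so that the total fingertip-to-key distance is minimal. This is only a simplified stand-in: the cost function and constraints used in the paper are not given in this section, so the Euclidean distance cost, the function name `assign_fingers`, and the toy coordinates are all assumptions made for illustration. The search is brute force over one-to-one assignments, which is adequate for the handful of keys active at a single time step.

```python
from itertools import permutations
from math import dist


def assign_fingers(fingertips, keys):
    """Assign each key to a distinct finger with minimal total distance.

    fingertips, keys: lists of coordinate tuples (hypothetical layout).
    Returns (total_cost, assignment), where assignment[i] is the index of
    the finger that presses keys[i].
    """
    if len(keys) > len(fingertips):
        raise ValueError("more keys to press than available fingers")

    best_cost, best = float("inf"), ()
    # Enumerate every one-to-one mapping from keys to fingers.
    for perm in permutations(range(len(fingertips)), len(keys)):
        # Assumed cost: Euclidean distance each chosen finger must travel.
        cost = sum(dist(fingertips[f], keys[k]) for k, f in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best_cost, list(best)


# Toy example: four fingertips on a line, two keys to press.
fingertips = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
keys = [(2.2, 0.0), (0.1, 0.0)]
cost, fingers = assign_fingers(fingertips, keys)
print(fingers, round(cost, 3))
```

In this toy layout the key at 2.2 goes to the finger at 2.0 and the key at 0.1 goes to the finger at 0.0. For larger instances, a dedicated assignment or optimal transport solver would replace the brute-force loop.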

Paper Structure

This paper contains 23 sections, 3 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: Overview of RP1M. (Left) RP1M is a large-scale motion dataset for piano playing with bi-manual dexterous robot hands. The dataset includes $\sim$1M expert trajectories collected by $\sim$2k RL specialist agents. (Right) To collect a diverse motion dataset of playing sheet music available on the Internet, we lift the requirement of human-annotated fingering by formulating the finger placement as an optimal transport problem such that the robot hands play the piano in an energy-efficient way.
  • Figure 2: Comparison of the RL performance with our OT fingering, human-annotated fingering, and no fingering. Our method matches the performance of RoboPianist-RL, which is trained with human fingering. It also outperforms the baseline without any fingering information by a large margin. The plots show the mean over 3 random seeds and the shaded areas represent the 95% confidence interval.
  • Figure 3: Comparison of fingering discovered by the agent itself and human annotations.
  • Figure 4: Statistics of our RP1M dataset. (Top) Histogram of pressed keys in our RP1M dataset. (Bottom Left) Distribution of the number of active keys over all time steps. (Bottom Right) Distribution of F1 scores in our dataset.
  • Figure 5: Comparison of the RL performance between DroQ and PPO with the MJX implementation of the RoboPianist environment. PPO+MJX runs faster but performs worse than DroQ. We use DroQ with the CPU version of the RoboPianist environment when training our RL agents.
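
Figure 4 reports a distribution of F1 scores. As a reminder of how such a score is computed, the sketch below evaluates F1 between the keys a policy pressed and the keys the score requires, given as binary sequences. The exact evaluation protocol used for RP1M is not described in this section, so the function name `key_press_f1`, the flat binary encoding, and the convention of returning 1.0 when nothing is pressed and nothing is required are assumptions.

```python
def key_press_f1(pressed, target):
    """F1 score between two equal-length binary key-activation sequences."""
    if len(pressed) != len(target):
        raise ValueError("sequences must have the same length")

    tp = sum(1 for p, t in zip(pressed, target) if p and t)      # correct presses
    fp = sum(1 for p, t in zip(pressed, target) if p and not t)  # wrong presses
    fn = sum(1 for p, t in zip(pressed, target) if t and not p)  # missed keys

    if tp + fp + fn == 0:
        return 1.0  # assumed convention: silence matched with silence
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / (2 * tp + fp + fn)


# Toy example: five keys, one missed and one pressed by mistake.
target = [1, 1, 0, 0, 1]
pressed = [1, 0, 0, 1, 1]
score = key_press_f1(pressed, target)
print(round(score, 3))
```

Here two keys are pressed correctly, one is missed, and one is pressed in error, giving precision and recall of 2/3 each and hence an F1 of 2/3.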