Learning to Play Piano in the Real World

Yves-Simon Zeulner, Sandeep Selvaraj, Roberto Calandra

TL;DR

This paper demonstrates a proof-of-concept for learning to play piano in the real world using a Sim2Real approach. A reinforcement learning policy is trained in simulation with domain randomization and deployed onto a real multi-finger robotic hand to play on a 49-key MIDI keyboard. Key contributions include a reward redesign that removes fingering annotations, the introduction of hybrid execution to bridge sim-to-real gaps, and an open-source codebase. The results show that the hybrid execution mode can achieve strong real-world performance across multiple pieces, validating piano playing as a meaningful benchmark for dexterous manipulation and guiding future work toward more generalizable, real-world robotic manipulation systems.

Abstract

Towards the grand challenge of achieving human-level manipulation in robots, playing piano is a compelling testbed that requires strategic, precise, and flowing movements. Over the years, several works demonstrated hand-designed controllers on real-world piano playing, while other works evaluated robot learning approaches on simulated piano scenarios. In this paper, we develop the first piano-playing robotic system that makes use of learning approaches while also being deployed on a real-world dexterous robot. Specifically, we make use of Sim2Real to train a policy in simulation using reinforcement learning before deploying the learned policy on a real-world dexterous robot. In our experiments, we thoroughly evaluate the interplay between domain randomization and the accuracy of the dynamics model used in simulation. Moreover, we evaluate the robot's performance across multiple songs with varying complexity to study the generalization of our learned policy. By providing a proof-of-concept of learning to play piano in the real world, we want to encourage the community to adopt piano playing as a compelling benchmark towards human-level manipulation. We open-source our code and show additional videos at https://lasr.org/research/learning-to-play-piano.

Paper Structure

This paper contains 10 sections, 7 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: In this work, we demonstrate a proof-of-concept for learning to play piano with a real world robot. To achieve this, we employed a multi-finger robot hand and a Sim2Real approach. Experimental results show that the robot can learn to play several simple pieces successfully, after training exclusively in simulation.
  • Figure 2: Comparison of the simulated training environment to the real world. The real-world environment consists of a multi-finger Wonik Allegro hand mounted on a Ufactory xArm7 robot arm and an M-Audio Keystation 49e MIDI keyboard. The fingertips of the Allegro hand are replaced with thinner 3D-printed fingertips that fit the dimensions of the piano keys, so that a single key can be pressed at a time.
  • Figure 3: The diagram compares the three execution modes: A) In joint mirroring, the whole observation space is obtained from the simulated environment. B) In hybrid execution, only the pressed keys are based on the real world, while everything else is simulated. C) In real world execution, all observations are based on the real world.
  • Figure 4: Both diagrams include the F1 score reached in simulation as a reference. The results show that, with our setup, hybrid execution is the best-performing execution mode. This indicates that the simulation does not significantly diverge from the real world, which enables hybrid execution to improve upon the performance of real world execution. The results also show the potential of hybrid execution in settings where it is hard to achieve a model robust enough for a complete real world execution.
  • Figure 5: The more domain randomization (DR) is applied, the more demanding the simulation environment becomes. This leads to a drop in performance, as observed in the simulation results. However, DR leads to a more robust model, which can be observed in the real-world results. With too much DR, the model is not able to play piano successfully, neither in simulation nor in the real world.
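
The three execution modes described in the Figure 3 caption differ only in where each part of the policy's observation comes from. The following is a minimal sketch of that idea, not the authors' implementation: the `Observation` fields, the `ExecutionMode` enum, and the `compose_observation` helper are hypothetical names introduced for illustration, and the real observation space may contain further components.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ExecutionMode(Enum):
    """Hypothetical labels for the three modes in the Figure 3 caption."""
    JOINT_MIRRORING = "joint_mirroring"
    HYBRID = "hybrid"
    REAL_WORLD = "real_world"


@dataclass
class Observation:
    """Assumed, simplified observation: robot joint state plus key states."""
    joint_positions: List[float]  # robot joint readings
    pressed_keys: List[bool]      # one flag per piano key


def compose_observation(
    mode: ExecutionMode, sim_obs: Observation, real_obs: Observation
) -> Observation:
    """Pick each observation component from simulation or the real world."""
    if mode is ExecutionMode.JOINT_MIRRORING:
        # A) Everything the policy sees comes from the simulated environment.
        return Observation(list(sim_obs.joint_positions), list(sim_obs.pressed_keys))
    if mode is ExecutionMode.HYBRID:
        # B) Only the pressed keys are real; everything else stays simulated.
        return Observation(list(sim_obs.joint_positions), list(real_obs.pressed_keys))
    if mode is ExecutionMode.REAL_WORLD:
        # C) All observations are based on the real world.
        return Observation(list(real_obs.joint_positions), list(real_obs.pressed_keys))
    raise ValueError(f"Unknown execution mode: {mode}")


# Toy usage with made-up values for a two-joint, two-key setup.
sim = Observation(joint_positions=[0.1, 0.2], pressed_keys=[True, False])
real = Observation(joint_positions=[0.3, 0.4], pressed_keys=[False, True])
hybrid = compose_observation(ExecutionMode.HYBRID, sim, real)
```

In this sketch, hybrid execution feeds the policy simulated joint readings together with the key states reported by the real keyboard, which mirrors the caption's statement that only the pressed keys are based on the real world.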