PANDORA: Diffusion Policy Learning for Dexterous Robotic Piano Playing

Yanjia Huang; Renjie Li; Zhengzhong Tu

TL;DR

Dexterous robotic piano playing demands both precise key execution and musical expressiveness. PANDORA introduces a diffusion-based policy learning framework in which a conditional U-Net with FiLM conditioning generates smooth, high-dimensional action trajectories through DDIM denoising, starting from $x_T \sim \mathcal{N}(0,I)$ and running over $T=100$ steps; a residual inverse-kinematics refinement further improves fine-grained control. The framework couples this with a composite reward that combines task accuracy, audio fidelity, style mimicry, and semantic feedback from an LLM-based oracle, with hand-specific modulation that respects the distinct roles of the left and right hands. On the ROBOPIANIST benchmark, PANDORA achieves state-of-the-art precision and expressiveness. Ablations confirm the critical contributions of diffusion denoising and semantic feedback, and residual learning combined with semantic guidance yields more natural finger trajectories across diverse musical styles. The work advances dexterous manipulation by integrating generative modeling with semantic evaluation, with potential impact on real-world expressive robotic manipulation and multi-instrument scenarios.
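To make the denoising process concrete, here is a minimal NumPy sketch of deterministic DDIM sampling (eta = 0) over T = 100 steps, starting from x_T ~ N(0, I). It is not the authors' implementation: the squared-cosine noise schedule is an assumption, and the conditional U-Net with FiLM conditioning is abstracted behind a hypothetical callable `eps_model(x, t, cond)` that predicts the noise in `x`. The 16-step by 8-dimension action chunk in the usage check is a placeholder, not the benchmark's actual action space.

```python
import numpy as np


def cosine_alphas_cumprod(T: int, s: float = 0.008, max_beta: float = 0.999) -> np.ndarray:
    """Cumulative schedule abar_t for t = 0..T-1 (squared-cosine schedule, assumed)."""
    def f(u):
        return np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2

    abar = f(np.arange(T + 1)) / f(0)
    # Per-step betas, capped so the final step does not reach beta = 1.
    betas = np.clip(1.0 - abar[1:] / abar[:-1], 0.0, max_beta)
    return np.cumprod(1.0 - betas)


def ddim_sample(eps_model, cond, shape, T: int = 100, rng=None) -> np.ndarray:
    """Deterministic DDIM (eta = 0): denoise x_T ~ N(0, I) into an action sequence."""
    rng = np.random.default_rng() if rng is None else rng
    abar = cosine_alphas_cumprod(T)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        abar_t = abar[t]
        abar_prev = abar[t - 1] if t > 0 else 1.0
        eps = eps_model(x, t, cond)  # hypothetical noise predictor (U-Net + FiLM in the paper)
        # Estimate the clean sample, then re-noise it to the previous noise level.
        x0_pred = (x - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
        x = np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps
    return x


# Usage check with an "oracle" predictor that already knows the clean target.
T = 100
ABAR = cosine_alphas_cumprod(T)
target = np.linspace(-1.0, 1.0, 16 * 8).reshape(16, 8)  # placeholder 16-step x 8-dim chunk


def oracle_eps(x, t, cond):
    # The exact noise that maps `target` to `x` at noise level t.
    return (x - np.sqrt(ABAR[t]) * target) / np.sqrt(1.0 - ABAR[t])


actions = ddim_sample(oracle_eps, cond=None, shape=target.shape, T=T,
                      rng=np.random.default_rng(0))
```

Because `oracle_eps` knows the answer, the sampler recovers `target` exactly. This only checks that the update rule is self-consistent. In a trained policy, the network's noise prediction, conditioned on observations, plays that role instead.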

Abstract

We present PANDORA, a novel diffusion-based policy learning framework designed specifically for dexterous robotic piano performance. Our approach employs a conditional U-Net architecture enhanced with FiLM-based global conditioning, which iteratively denoises noisy action sequences into smooth, high-dimensional trajectories. To achieve precise key execution coupled with expressive musical performance, we design a composite reward function that integrates task-specific accuracy, audio fidelity, and high-level semantic feedback from a large language model (LLM) oracle. The LLM oracle assesses musical expressiveness and stylistic nuances, enabling dynamic, hand-specific reward adjustments. Further augmented by a residual inverse-kinematics refinement policy, PANDORA achieves state-of-the-art performance in the ROBOPIANIST environment, significantly outperforming baselines in both precision and expressiveness. Ablation studies validate the critical contributions of diffusion-based denoising and LLM-driven semantic feedback in enhancing robotic musicianship. Videos are available at https://taco-group.github.io/PANDORA
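The structure of a hand-specific composite reward can be illustrated with a small Python sketch. Every concrete choice here is an assumption for illustration, not the paper's formulation. The four terms are taken to be per-hand scores already normalized to [0, 1]. The LLM oracle's output is stubbed as a plain number rather than a model call. The weights are hypothetical, with the left hand leaning toward accuracy and the right hand toward expressive terms. The paper describes the LLM adjusting rewards dynamically per hand; this sketch uses fixed weights to keep the structure visible.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RewardTerms:
    """Per-hand reward components, each assumed to be normalized to [0, 1]."""
    task_accuracy: float   # e.g. agreement of pressed keys with the score
    audio_fidelity: float  # similarity of produced audio to a reference
    style_mimicry: float   # similarity to a reference performance style
    llm_semantic: float    # expressiveness score from an LLM oracle (stubbed here)


# Hypothetical per-hand weights: the left hand (assumed accompaniment) favors accuracy,
# the right hand (assumed melody) puts more weight on expressive terms.
HAND_WEIGHTS: Dict[str, Dict[str, float]] = {
    "left": {"task": 0.6, "audio": 0.2, "style": 0.1, "llm": 0.1},
    "right": {"task": 0.4, "audio": 0.2, "style": 0.2, "llm": 0.2},
}


def hand_reward(terms: RewardTerms, w: Dict[str, float]) -> float:
    """Weighted sum of one hand's reward components."""
    return (w["task"] * terms.task_accuracy
            + w["audio"] * terms.audio_fidelity
            + w["style"] * terms.style_mimicry
            + w["llm"] * terms.llm_semantic)


def composite_reward(terms_by_hand: Dict[str, RewardTerms],
                     weights: Dict[str, Dict[str, float]] = HAND_WEIGHTS) -> float:
    """Total reward: sum of per-hand weighted rewards."""
    return sum(hand_reward(terms, weights[hand]) for hand, terms in terms_by_hand.items())


# Usage: an accurate left hand and an expressive right hand.
r = composite_reward({
    "left": RewardTerms(task_accuracy=1.0, audio_fidelity=0.5, style_mimicry=0.5, llm_semantic=0.5),
    "right": RewardTerms(task_accuracy=0.5, audio_fidelity=1.0, style_mimicry=1.0, llm_semantic=1.0),
})
```

Keeping the sum per hand makes each hand's contribution separable, so feedback about one hand can be reflected by changing only that hand's weights.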
