RealAppliance: Let High-fidelity Appliance Assets Controllable and Workable as Aligned Real Manuals
Yuzheng Gao, Yuxing Long, Lei Kang, Yuchong Guo, Ziyan Yu, Shangqing Mao, Jiyao Zhang, Ruihai Wu, Dongjiang Li, Hui Shen, Hao Dong
TL;DR
This work introduces RealAppliance, a dataset of 100 high-fidelity appliance assets whose physical mechanisms, electronic mechanisms, and program logic are aligned with real manuals, addressing gaps in realism and manual-grounded operation left by prior assets. It also presents RealAppliance-Bench, a multimodal and embodied planning benchmark spanning manual understanding, part grounding, open-loop planning, and closed-loop adjustment. Evaluations of state-of-the-art multimodal large language models and embodied planners reveal significant challenges in manual grounding and plan robustness, and highlight the need for stronger document understanding and fine-grained visual reasoning. The RealAppliance platform offers a realistic testbed for advancing appliance manipulation planning, with potential for broader use in collecting data for low-level manipulation policies and in standardized benchmarks.
Abstract
Existing appliance assets suffer from poor rendering, incomplete mechanisms, and misalignment with manuals, leading to simulation-reality gaps that hinder appliance manipulation development. In this work, we introduce the RealAppliance dataset, comprising 100 high-fidelity appliances with complete physical and electronic mechanisms and program logic aligned with their manuals. Based on these assets, we propose the RealAppliance-Bench benchmark, which evaluates multimodal large language models and embodied manipulation planning models across key tasks in appliance manipulation planning: manual page retrieval, appliance part grounding, open-loop manipulation planning, and closed-loop planning adjustment. Our analysis of model performance on RealAppliance-Bench provides insights for advancing appliance manipulation research.
