Harmonic Mobile Manipulation

Ruihan Yang, Yejin Kim, Rose Hendrix, Aniruddha Kembhavi, Xiaolong Wang, Kiana Ehsani

TL;DR

Harmonic Mobile Manipulation (HarmonicMM) addresses the challenge of coordinating navigation and manipulation for complex daily tasks in household environments. It proposes an end-to-end reinforcement learning approach that jointly optimizes base movement and arm actions using RGB vision from two views plus proprioception, trained in photorealistic ProcTHOR simulations and transferred to a real apartment without fine-tuning. The paper introduces the Daily Mobile Manipulation Task Suite (including Opening Door, Cleaning Table, and Opening Fridge) and demonstrates that HarmonicMM outperforms two-stage baselines in simulation and achieves meaningful real-world success rates, with ablations showing the importance of multi-view perception and pretrained visual encoders. The work highlights practical implications for indoor robot deployment by enabling robust, vision-based, end-to-end control, while acknowledging limitations related to physical capabilities and proposing future extensions to more dynamic tasks and broader environments.
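As a rough illustration of the joint control idea summarized above, the following minimal sketch maps two RGB views plus proprioception to a single action that is split into base and arm commands. It is not the authors' network: the class name `JointPolicySketch`, every dimension, the random linear "encoder" shared across views, and the `tanh` output squashing are assumptions chosen only to keep the example small and runnable.

```python
import numpy as np

# All sizes below are illustrative assumptions, not values from the paper.
IMG_SHAPE = (32, 32, 3)    # downsampled RGB frame per camera
PROPRIO_DIM = 4            # length of the proprioception vector
FEAT_DIM = 16              # feature size per camera view
BASE_DIM, ARM_DIM = 2, 4   # sizes of the base and arm commands


class JointPolicySketch:
    """Hypothetical stand-in for a policy that controls base and arm together."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        n_pix = int(np.prod(IMG_SHAPE))
        # Random linear projection standing in for a pretrained visual encoder;
        # sharing it across both views is an assumption of this sketch.
        self.w_enc = rng.normal(0.0, 0.01, (n_pix, FEAT_DIM))
        fused_dim = 2 * FEAT_DIM + PROPRIO_DIM
        # One head emits base and arm commands jointly at every step.
        self.w_head = rng.normal(0.0, 0.1, (fused_dim, BASE_DIM + ARM_DIM))

    def encode(self, rgb):
        # Flatten the image, scale pixels to [0, 1], and project to features.
        x = rgb.astype(np.float32).reshape(-1) / 255.0
        return np.tanh(x @ self.w_enc)

    def act(self, nav_rgb, manip_rgb, proprio):
        # Fuse both camera views with proprioception into one vector.
        fused = np.concatenate(
            [self.encode(nav_rgb), self.encode(manip_rgb), proprio]
        )
        action = np.tanh(fused @ self.w_head)
        # Split the joint action into base and arm parts.
        return action[:BASE_DIM], action[BASE_DIM:]
```

Calling `JointPolicySketch().act(nav_rgb, manip_rgb, proprio)` with two `uint8` images of shape `IMG_SHAPE` and a vector of length `PROPRIO_DIM` returns a `(base, arm)` pair. The point of the sketch is only that both commands come from one forward pass over the fused observation, in contrast to a two-stage pipeline that navigates first and manipulates afterwards.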

Abstract

Recent advancements in robotics have enabled robots to navigate complex scenes or manipulate diverse objects independently. However, robots still fall short in many household tasks that require coordinated behaviors, such as opening doors. The factorization of navigation and manipulation, while effective for some tasks, fails in scenarios requiring coordinated actions. To address this challenge, we introduce HarmonicMM, an end-to-end learning method that optimizes both navigation and manipulation, showing notable improvement over existing techniques in everyday tasks. This approach is validated in simulated and real-world environments and adapts to novel unseen settings without additional tuning. Our contributions include a new benchmark for mobile manipulation and the successful deployment with only RGB visual observation in a real unseen apartment, demonstrating the potential for practical indoor robot deployment in daily life. More results are on our project site: https://rchalyang.github.io/HarmonicMM/

Paper Structure (18 sections, 5 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Harmonic Mobile Manipulation: In this work, we address diverse mobile manipulation tasks integral to daily human life. Trained in a photo-realistic simulation, our controller effectively accomplishes tasks through harmonious mobile manipulation in a real-world apartment featuring a novel layout, without any fine-tuning or adaptation.
  • Figure 2: HarmonicMM Network Architecture (Left): Our HarmonicMM controller takes robot proprioception and multi-view visual observations as input and outputs navigation and manipulation commands at the same time. Real Visual Observations (Right): Our robot is shown on the Left and the observations from Nav Cam and Manip Cam are shown on the Top Right and Bottom Right respectively.
  • Figure 3: Our controller controls all DOF of the robot at every step.
  • Figure 4: Real World: We deployed the learned controller in a real apartment with a novel layout. Each row shows a single trajectory (from left to right) corresponding to Opening Door (Pull), Opening Door (Pull), Opening Door (Push), and Cleaning Table, respectively.
  • Figure 5: Cleaning Table in Simulation
  • ...and 1 more figure