HomeRobot Open Vocabulary Mobile Manipulation Challenge 2023 Participant Report (Team KuzHum)
Volodymyr Kuzma, Vladyslav Humennyy, Ruslan Partsey
TL;DR
This work tackles the Open Vocabulary Mobile Manipulation (OVMM) challenge by enhancing the RL baseline with a retrained perception stack (YOLOv8, MobileSAM, Detic), a revised place skill reward, and a more robust high-level heuristic. The agent achieves substantial gains on validation and on the Test Standard split and takes 3rd place in the real-world phase, although absolute success rates remain modest. Key contributions include a dual-Detic + YOLOv8-SAM perception pipeline, a redesigned place skill reward, and a high-level policy tweak that loops until a pick succeeds, all validated in Habitat-based virtual experiments and real-world tests. The results highlight the critical role of perception quality and of reward shaping for place and navigation, and point to promising directions such as object tracking, smarter policy learning, and a world-representation framework to improve sim-to-real transfer and long-horizon planning.
Abstract
We report improvements to the reinforcement learning baseline of the NeurIPS 2023 HomeRobot: Open Vocabulary Mobile Manipulation (OVMM) Challenge. Specifically, we propose a more accurate semantic segmentation module, a better place skill policy, and a high-level heuristic. Together, these changes outperform the baseline by 2.4% in overall success rate (a sevenfold improvement) and by 8.2% in partial success rate (a 1.75-fold improvement) on the Test Standard split of the challenge dataset. With these enhancements incorporated, our agent placed 3rd in both the simulation and real-world stages of the challenge.
