HomeRobot Open Vocabulary Mobile Manipulation Challenge 2023 Participant Report (Team KuzHum)

Volodymyr Kuzma; Vladyslav Humennyy; Ruslan Partsey

TL;DR

This work tackles the Open Vocabulary Mobile Manipulation (OVMM) challenge by enhancing the RL baseline with a retrained perception stack (YOLOv8, MobileSAM, Detic), a revised place skill reward, and a more robust high-level heuristic. The agent achieves substantial gains on validation and the Test Standard split, culminating in 3rd place in the real-world phase, while still facing modest absolute success rates. Key contributions include a dual-Detic + YOLOv8-SAM perception pipeline, a place-skill reward redesign, and a high-level policy tweak that loops until a successful pick, all validated in Habitat-based virtual experiments and real-world tests. The results highlight the critical role of perception quality and reward shaping for place and navigation, and point to promising directions such as object tracking, smarter policy learning, and a world-representation framework to improve sim-to-real transfer and long-horizon planning.

Abstract

We report improvements to the reinforcement learning baseline of the NeurIPS 2023 HomeRobot: Open Vocabulary Mobile Manipulation (OVMM) Challenge. More specifically, we propose a more accurate semantic segmentation module, a better place skill policy, and a high-level heuristic. Together they outperform the baseline by 2.4 percentage points in overall success rate (a sevenfold improvement) and by 8.2 percentage points in partial success rate (a 1.75-fold improvement) on the Test Standard split of the challenge dataset. With these enhancements incorporated, our agent took 3rd place in the challenge in both the simulation and real-world stages.
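The abstract gives each gain twice, as an absolute difference (2.4% and 8.2%) and as a ratio (sevenfold and 1.75 times). The short sketch below shows how the two forms relate, assuming the absolute figures are percentage-point differences and the ratios mean improved rate divided by baseline rate. The function name `implied_rates` is hypothetical, and the rates it prints are implied by the rounded figures quoted above, not values reported here.

```python
from typing import Tuple


def implied_rates(absolute_gain: float, fold: float) -> Tuple[float, float]:
    """Return (baseline, improved) rates in percent.

    Assumes improved = fold * baseline and improved - baseline = absolute_gain,
    which gives baseline = absolute_gain / (fold - 1).
    """
    if fold <= 1.0:
        raise ValueError("fold must be greater than 1 for an improvement")
    baseline = absolute_gain / (fold - 1.0)
    return baseline, baseline + absolute_gain


# Figures quoted in the abstract (both are rounded, so results are approximate).
overall = implied_rates(absolute_gain=2.4, fold=7.0)
partial = implied_rates(absolute_gain=8.2, fold=1.75)

print(f"overall success: {overall[0]:.1f}% -> {overall[1]:.1f}%")
print(f"partial success: {partial[0]:.1f}% -> {partial[1]:.1f}%")
```

Under these assumptions the overall success rate rises from a baseline well below 1% while the partial success rate starts near 11%, which is consistent with the TL;DR's remark that absolute success rates remain modest.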

Paper Structure (33 sections, 6 figures, 3 tables)

This paper contains 33 sections, 6 figures, 3 tables.

Introduction
Motivation
Report Structure
OVMM Task
Navigation to object
Pick (Gaze)
Navigation to receptacle
Place
OVMM Challenge
Virtual Phase
Real-World Phase
Exploratory Analysis
Perception Impact
Place Skill Bottleneck
Our agent
...and 18 more sections

Figures (6)

Figure 1: Example of goal object (hat) on start receptacle (cabinet).
Figure 2: Skill relative success rates.
Figure 3: Original high-level policy.
Figure 4: Architecture of a low-level policy.
Figure 5: Semantic segmentation pipeline.
...and 1 more figure
