RoboCup@Home 2024 OPL Winner NimbRo: Anthropomorphic Service Robots using Foundation Models for Perception and Planning

Raphael Memmesheimer, Jan Nogga, Bastian Pätzold, Evgenii Kruzhkov, Simon Bultmann, Michael Schreiber, Jonas Bode, Bertan Karacora, Juhui Park, Alena Savinykh, Sven Behnke

TL;DR

The paper presents the NimbRo@Home system for RoboCup@Home 2024 in the Open Platform League, highlighting a shift toward open-vocabulary perception and foundation-model-driven task planning. The hardware combines an enhanced TIAGo++ platform with onboard high-performance compute, multiple sensors, and a two-robot setup for improved robustness and redundancy. The software integrates SLAM-based mapping, open- and closed-vocabulary object perception (mmGrounding-DINO, Grounding-DINO, MaskDINO), cuRobo-based 3D grasp planning, robust speech processing, and GPT-4o-driven task planning with function calling, enabling end-to-end execution of complex commands. The team achieved top performance in Stage 1 and Stage 2, a strong final demonstration of dinner preparation, and recognition for the best performance in restaurant tasks, demonstrating that open-vocabulary perception combined with LLM planning can enable robots to grasp unseen objects and operate effectively in dynamic domestic environments.
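The function-calling style of LLM task planning can be illustrated with a small dispatcher. This is a minimal sketch under stated assumptions, not NimbRo's implementation: the skill names (`navigate_to`, `pick`, `place`), their signatures, and the registry are hypothetical. It assumes the LLM returns tool calls in the common shape where `function.name` is a string and `function.arguments` is a JSON-encoded string, and it executes them against a fixed set of robot skills.

```python
import json
from typing import Any, Callable, Dict, List


# Hypothetical robot skills exposed to the LLM as callable functions.
# Names and signatures are illustrative, not NimbRo's actual interface.
def navigate_to(location: str) -> str:
    return f"arrived at {location}"


def pick(object_description: str) -> str:
    # In a real system this would trigger open-vocabulary segmentation and grasping.
    return f"grasped {object_description}"


def place(location: str) -> str:
    return f"placed object at {location}"


SKILLS: Dict[str, Callable[..., str]] = {
    "navigate_to": navigate_to,
    "pick": pick,
    "place": place,
}

# Tool description in the JSON-Schema style used by function-calling APIs; this is
# what the LLM would see when choosing a skill (only one shown for brevity).
NAVIGATE_SCHEMA: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "navigate_to",
        "description": "Drive the robot to a named location on the map.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}


def execute_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[str]:
    """Execute LLM-proposed tool calls in order; stop at the first unknown skill."""
    results: List[str] = []
    for call in tool_calls:
        name = call["function"]["name"]
        # Function-calling APIs deliver arguments as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        skill = SKILLS.get(name)
        if skill is None:
            # Later steps usually depend on earlier ones, so abort and let the planner replan.
            results.append(f"error: unknown skill {name!r}")
            break
        results.append(skill(**args))
    return results
```

For a command such as "bring the mustard from the shelf to the kitchen table", the LLM might emit three calls (`navigate_to`, `pick`, `place`). Restricting execution to a fixed skill registry means a hallucinated function name is caught before the robot acts on it.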

Abstract

We present the approaches and contributions of the winning team NimbRo@Home at the RoboCup@Home 2024 competition in the Open Platform League held in Eindhoven, NL. Further, we describe our hardware setup and give an overview of the results for the task stages and the final demonstration. For this year's competition, we put a special emphasis on open-vocabulary object segmentation and grasping approaches that overcome the labeling overhead of the supervised vision approaches commonly used in RoboCup@Home. We successfully demonstrated that we can segment and grasp unlabeled objects given only text descriptions. Further, we extensively employed LLMs for natural language understanding and task planning. Throughout the competition, our approaches showed robustness and generalization capabilities. A video of our performance can be found online.
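Grasping an object found by a text prompt requires lifting its 2D segment into 3D (the figure list below mentions projected object segments). The paper summary does not say how this is done; the sketch below assumes a standard pinhole camera with a depth image registered to the color image and known intrinsics `fx`, `fy`, `cx`, `cy`. The function names are hypothetical. Each masked pixel (u, v) with valid depth z maps to x = (u - cx) z / fx, y = (v - cy) z / fy in the camera frame.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_mask(
    mask: Sequence[Sequence[bool]],
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift every masked pixel (u, v) with valid depth z into a camera-frame 3D point
    using the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    points: List[Point3D] = []
    for v, row in enumerate(mask):
        for u, inside in enumerate(row):
            z = depth[v][u]
            # Skip pixels outside the segment and missing/invalid depth (0, negative, NaN).
            if not inside or not math.isfinite(z) or z <= 0.0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def centroid(points: Sequence[Point3D]) -> Point3D:
    """Mean of the segment's points, e.g. as a coarse target for grasp planning."""
    if not points:
        raise ValueError("segment contains no valid depth pixels")
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )
```

The resulting partial point cloud is the kind of input that model registration and grasp planning operate on; depth holes are dropped rather than interpolated to keep the sketch simple.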

Paper Structure

This paper contains 16 sections and 8 figures.

Figures (8)

  • Figure 1: The NimbRo@Home team at RoboCup 2024 in Eindhoven, NL.
  • Figure 2: Enhanced TIAGo++ omnidirectional robot platform.
  • Figure 3: Software modules used in the competition. (a) Map including location markers and annotated regions. (b) Person detection using YOLOv8, body pose estimation, action and face recognition. (c) Projected object segments using mmGrounding-DINO.
  • Figure 4: Object perception pipeline including annotation, curation, and backbone models. We annotate data semi-automatically using CVAT and Segment Anything. This data is then curated using FiftyOne and combined with external datasets. The curated data is used to fine-tune YOLO and MaskDINO models. Additionally, we employ NanoSAM to recover segments from YOLO detections.
  • Figure 5: 3D object perception and grasping. (a) If available, 3D models can be registered to partial point clouds of all detected objects. (b) The grasp proposals minimizing the grasping cost function for approaching a bottle of mustard lying on a surface.
  • ...and 3 more figures
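Figure 5(b) describes selecting grasp proposals that minimize a grasping cost function. The actual cost terms are not given here, so the sketch below uses a hypothetical cost: a weighted sum of the angle between a proposal's approach direction and a preferred direction (for example top-down), plus the distance of the grasp point from the object centroid. The weights `w_angle` and `w_dist` and all names are assumptions; a real planner (cuRobo-based, per the summary) would also check reachability and collisions.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class GraspProposal:
    position: Vec3  # gripper target point in the robot base frame (meters)
    approach: Vec3  # direction in which the gripper moves toward the object


def angle_between(a: Vec3, b: Vec3) -> float:
    """Angle in radians between two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("zero-length vector")
    cos = sum(x * y for x, y in zip(a, b)) / (na * nb)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))


def grasp_cost(
    g: GraspProposal,
    centroid: Vec3,
    preferred_approach: Vec3,
    w_angle: float = 1.0,
    w_dist: float = 2.0,
) -> float:
    """Hypothetical cost: penalize deviation from a preferred approach direction
    and distance of the grasp point from the object centroid."""
    angle_term = w_angle * angle_between(g.approach, preferred_approach)
    dist_term = w_dist * math.dist(g.position, centroid)
    return angle_term + dist_term


def best_grasp(
    proposals: Sequence[GraspProposal], centroid: Vec3, preferred_approach: Vec3
) -> GraspProposal:
    """Return the proposal with minimal cost."""
    if not proposals:
        raise ValueError("no grasp proposals")
    return min(proposals, key=lambda g: grasp_cost(g, centroid, preferred_approach))
```

With a preferred top-down approach `(0, 0, -1)`, a proposal centered on the object and approaching from above scores lower than an off-center side grasp; changing the weights shifts this trade-off.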