RoboCup@Home 2024 OPL Winner NimbRo: Anthropomorphic Service Robots using Foundation Models for Perception and Planning
Raphael Memmesheimer, Jan Nogga, Bastian Pätzold, Evgenii Kruzhkov, Simon Bultmann, Michael Schreiber, Jonas Bode, Bertan Karacora, Juhui Park, Alena Savinykh, Sven Behnke
TL;DR
The paper presents the NimbRo@Home system for the RoboCup@Home 2024 Open Platform League, highlighting a shift toward open-vocabulary perception and task planning driven by foundation models. The hardware combines an enhanced TIAGo++ platform with onboard high-performance compute and multiple sensors, and a two-robot setup adds robustness and redundancy. The software integrates SLAM-based mapping, open- and closed-vocabulary object perception (MM-Grounding-DINO, Grounding-DINO, MaskDINO), cuRobo-based 3D grasp planning, robust speech processing, and GPT-4o task planning with function calling, which together enable end-to-end execution of complex commands. The team achieved top performance in Stage 1 and Stage 2, gave a strong final demonstration on dinner preparation, and received recognition for best-in-restaurant tasks. These results demonstrate that open-vocabulary perception combined with LLM planning can enable robots to grasp unseen objects and operate effectively in dynamic domestic environments.
Abstract
We present the approaches and contributions of the winning team NimbRo@Home at the RoboCup@Home 2024 competition in the Open Platform League, held in Eindhoven, the Netherlands. We also describe our hardware setup and give an overview of the results for the task stages and the final demonstration. For this year's competition, we put special emphasis on open-vocabulary object segmentation and grasping approaches that overcome the labeling overhead of the supervised vision approaches commonly used in RoboCup@Home. We successfully demonstrated that we can segment and grasp unlabeled objects from text descriptions. In addition, we extensively employed LLMs for natural language understanding and task planning. Throughout the competition, our approaches showed robustness and generalization capabilities. A video of our performance can be found online.
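As an illustration of LLM task planning with function calling, the following minimal sketch shows how structured calls emitted by a planner could be dispatched to robot skills. It is not the NimbRo implementation: the skill names (navigate_to, grasp_object, say), their parameters, and the tool-call layout (a name plus a JSON-encoded argument string) are assumptions chosen for illustration, and the LLM output is hard-coded rather than requested from a model.

```python
import json

# Hypothetical robot skills; names and signatures are illustrative only.
def navigate_to(location: str) -> str:
    return f"arrived at {location}"

def grasp_object(description: str) -> str:
    # An open-vocabulary detector would be queried with the text description here.
    return f"grasped '{description}'"

def say(text: str) -> str:
    return f"said: {text}"

# Registry mapping the function names exposed to the planner onto Python callables.
SKILLS = {"navigate_to": navigate_to, "grasp_object": grasp_object, "say": say}

def dispatch(tool_call: dict) -> str:
    """Execute one tool call and return a textual result for the planner."""
    name = tool_call["name"]
    if name not in SKILLS:
        return f"error: unknown skill '{name}'"
    try:
        args = json.loads(tool_call["arguments"])
        return SKILLS[name](**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong parameters are reported back instead of crashing.
        return f"error: {exc}"

def execute_plan(tool_calls: list) -> list:
    """Run a sequence of tool calls in order and collect their results."""
    return [dispatch(call) for call in tool_calls]

# Assumed planner output for a command such as "bring me the red mug from the kitchen".
plan = [
    {"name": "navigate_to", "arguments": '{"location": "kitchen"}'},
    {"name": "grasp_object", "arguments": '{"description": "red mug"}'},
]
results = execute_plan(plan)
```

Returning errors as text rather than raising them is a design choice in this sketch: the result of each call can be fed back to the planner, which may then correct a malformed call.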
