From Vocal Instructions to Household Tasks: The Inria Tiago++ in the euROBIN Service Robots Coopetition

Fabio Amadio, Clemente Donoso, Dionis Totsila, Raphael Lorenzo, Quentin Rouxel, Olivier Rochel, Enrico Mingo Hoffman, Jean-Baptiste Mouret, Serena Ivaldi

TL;DR

The paper addresses the challenge of translating natural language voice instructions into executable household tasks by service robots. It introduces an integrated Tiago++ platform combining a whole-body control stack with an LLM-based instruction understanding and planning pipeline, enabling autonomous and teleoperated kitchen tasks. Key contributions include open-sourced system integration, custom teleoperation devices, and a JSON-to-FSM planning approach demonstrated within the euROBIN coopetition. The work shows practical viability of voice-driven service robotics while noting limitations in perception robustness and generalization, with planned improvements such as advanced object tracking and ROS2 migration to scale to more environments.

Abstract

This paper describes the Inria team's integrated robotics system used in the 1st euROBIN coopetition, during which service robots performed voice-activated household tasks in a kitchen setting. The team developed a modified Tiago++ platform that leverages a whole-body control stack for autonomous and teleoperated modes, and an LLM-based pipeline for instruction understanding and task planning. The key contributions (open-sourced) are the integration of these components and the design of custom teleoperation devices, addressing practical challenges in the deployment of service robots.

Paper Structure

This paper contains 11 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: System overview: our LLM-based planner understands voice commands and generates a plan (in JSON format) that is used to assemble a Finite State Machine (FSM) for carrying out the instruction. Teleoperation is used both to record expert demonstrations and to intervene in case of emergency or failure.
  • Figure 2: General overview of our system, based on a dual-arm Tiago++ robot with an omnidirectional mobile base. Each block includes its connected peripherals, positioned above it. The robot is equipped with three RGB-D cameras (one fixed and two mounted on the grippers) and three webcams for teleoperation. A laptop mounted on the robot manages the WBC stack (Sec. \ref{subsec:cartesio}), RGB-D camera processing (Sec. \ref{subsec:tags-detector}), plan execution (Sec. \ref{subsec:deployment}), navigation (Sec. \ref{subsec:navigation}) and human pose tracking (Sec. \ref{subsec:human-tracking}). Additionally, a Jetson Nano on the robot streams webcam footage to the teleoperation station and renders interactive visuals on a 7-inch screen mounted on the robot. Computationally intensive tasks, such as LLM-based plan generation (Sec. \ref{subsec:speech}), are offloaded to a remote machine equipped with two GPUs. Finally, the Teleoperation Station (Sec. \ref{subsec:teleop}) serves as a platform for (a) collecting demonstrations and (b) controlling the robot during failures or emergencies. We employed ROS Noetic middleware to integrate the different units (all running in dedicated Docker containers), connecting them to the Tiago++'s internal PC (where the roscore is running).
  • Figure 3: Scheme of our QP-based WBC (cf. Sec. \ref{subsec:cartesio}).
  • Figure 4: LLM-based plan generation overview (cf. Sec. \ref{subsec:speech}).
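
The JSON-to-FSM step described in the Figure 1 caption can be illustrated with a short sketch. It is not the authors' implementation. The plan schema (`action` and `target` keys), the action names, the state names, and the `execute` callback are all hypothetical, chosen only to show how a JSON plan could be turned into a linear state machine that hands over to teleoperation when a step fails.

```python
import json

# Hypothetical JSON plan, of the kind an LLM planner might emit.
# The schema ("action"/"target") is an assumption made for illustration.
PLAN_JSON = """
[
  {"action": "navigate", "target": "kitchen_counter"},
  {"action": "pick",     "target": "cup"},
  {"action": "navigate", "target": "table"},
  {"action": "place",    "target": "table"}
]
"""


def build_fsm(plan):
    """Assemble a linear FSM with one state per plan step.

    Each state moves to the next step on success and to the terminal
    state "TELEOP" on failure; the last step moves to "DONE".
    Returns the transition table and the name of the start state.
    """
    names = [f"S{i}_{step['action']}" for i, step in enumerate(plan)]
    fsm = {}
    for i, step in enumerate(plan):
        next_state = names[i + 1] if i + 1 < len(plan) else "DONE"
        fsm[names[i]] = {
            "step": step,
            "on_success": next_state,
            "on_failure": "TELEOP",  # a human operator takes over
        }
    start = names[0] if names else "DONE"
    return fsm, start


def run_fsm(fsm, start, execute):
    """Run the FSM until a terminal state; return the visited states.

    `execute` is a hypothetical callback that performs one plan step
    and returns True on success, False on failure.
    """
    state = start
    trace = []
    while state not in ("DONE", "TELEOP"):
        trace.append(state)
        node = fsm[state]
        succeeded = execute(node["step"])
        state = node["on_success"] if succeeded else node["on_failure"]
    trace.append(state)
    return trace


if __name__ == "__main__":
    plan = json.loads(PLAN_JSON)
    fsm, start = build_fsm(plan)
    # Simulated executor in which every step succeeds.
    print(run_fsm(fsm, start, lambda step: True))
```

Keeping the transition table as plain data separates plan generation from execution: the planner only has to produce valid JSON, while the failure transition reflects the caption's statement that teleoperation is used to intervene in case of emergency or failure.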