HIPer: A Human-Inspired Scene Perception Model for Multifunctional Mobile Robots

Florenz Graf, Jochen Lindermayr, Birgit Graf, Werner Kraus, Marco F. Huber

TL;DR

This paper presents a human-inspired scene perception model that aims to narrow the gap between human and robotic capabilities. It adopts fundamental neuroscience concepts, such as a triplet split of perception into recognition, knowledge representation, and knowledge interpretation.

Abstract

For a mobile service robot to take over arbitrary tasks in open-world settings as humans do, it needs holistic scene perception for decision-making and high-level control. This paper presents a human-inspired scene perception model to minimize the gap between human and robotic capabilities. The approach adopts fundamental neuroscience concepts, such as a triplet split of perception into recognition, knowledge representation, and knowledge interpretation. The recognition system separates background and foreground so that exchangeable image-based object detectors and SLAM can be integrated. A multi-layer knowledge base represents scene information in a hierarchical structure and offers interfaces for high-level control. Knowledge interpretation methods apply spatio-temporal scene analysis and perceptual learning for self-adjustment. A single-setting ablation study evaluates the impact of each component on overall performance for a fetch-and-carry scenario in two simulated environments and one real-world environment.
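The triplet split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the `merge_radius` parameter, the distance-based merging rule, and the counting analysis are all assumptions chosen only to show how per-frame observations might be aggregated into instances and then queried over time.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One per-frame detection (hypothetical fields, not from the paper)."""
    label: str
    position: tuple[float, float]
    stamp: float


@dataclass
class Instance:
    """An object instance aggregated from repeated observations."""
    label: str
    observations: list[Observation] = field(default_factory=list)

    def position(self) -> tuple[float, float]:
        # Mean of all observed positions of this instance.
        n = len(self.observations)
        x = sum(o.position[0] for o in self.observations) / n
        y = sum(o.position[1] for o in self.observations) / n
        return (x, y)


class KnowledgeBase:
    """Toy layered store: observations, instances, and a simple analysis."""

    def __init__(self, merge_radius: float = 0.5):
        # merge_radius is an assumed threshold in metres.
        self.merge_radius = merge_radius
        self.observations: list[Observation] = []
        self.instances: list[Instance] = []

    def add(self, obs: Observation) -> Instance:
        """Store an observation and merge it into a nearby same-label instance."""
        self.observations.append(obs)
        for inst in self.instances:
            px, py = inst.position()
            dist = math.hypot(px - obs.position[0], py - obs.position[1])
            if inst.label == obs.label and dist <= self.merge_radius:
                inst.observations.append(obs)
                return inst
        inst = Instance(obs.label, [obs])
        self.instances.append(inst)
        return inst

    def sightings_per_label(self) -> dict[str, int]:
        """Toy interpretation step: count observations per label over time."""
        counts: dict[str, int] = defaultdict(int)
        for obs in self.observations:
            counts[obs.label] += 1
        return dict(counts)


kb = KnowledgeBase()
kb.add(Observation("cup", (1.0, 1.0), stamp=0.0))
kb.add(Observation("cup", (1.2, 1.0), stamp=1.0))  # merged with the first cup
kb.add(Observation("cup", (5.0, 5.0), stamp=2.0))  # far away: a new instance
```

In this sketch a high-level controller would query `kb.instances` for object locations and `kb.sightings_per_label()` for a long-term summary, loosely mirroring the separation of recognition output, knowledge representation, and interpretation.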

Paper Structure (34 sections, 9 equations, 12 figures, 4 tables, 5 algorithms)


Figures (12)

  • Figure 1: The HIPer model provides holistic scene perception, empowering mobile robots to take over diverse tasks in open-world settings.
  • Figure 2: HIPer model design inspired by human perception consisting of a triplet split of observations $o$, based on sensory input $X$, its aggregation into instances $i$, and spatio-temporal analyses $a$. The obtained scene knowledge $Y$ is accessible for decision-making and planning.
  • Figure 3: Schematic overview of the HIPer model with its newly developed (blue) and reused (gray) components. A triplet split separates recognition, consisting of a background and foreground pipeline to obtain observations $o$ and instances $i$, representation of this scene knowledge in multiple layers, and its interpretation for long-term scene analyses $a$.
  • Figure 4: Publicly available virtual environments from Amazon AWS_Gazebo selected for evaluation. Walking people are spawned to add dynamics to the scenes.
  • Figure 5: Real-world office setup of the robot MobiKa (a), comprising a Kinect Azure RGB-D camera and notebook mounting. The experiments take place in an office building (b) comprising a central stairwell and a lab connected by corridors.
  • ...and 7 more figures